python - Can't read line from html page


I am trying to extract the time, in a specific format, from a particular site. The regex works (I tried it in a regex tester and it worked), but when I run this code in Python I get a different result:

import urllib, re
sock = urllib.urlopen("http://www.wolframalpha.com/input/?i=time")
htmlsource = sock.read()
sock.close()
ips = re.findall(r'([01]?[0-9]{1}|2[0-3]{1}):[0-5]{1}[0-9]{1}:[0-5]{1}[0-9]{1}', htmlsource)
print ips

The result:

['7', '4']

On regextester.com the time is highlighted in red, so the pattern does match there. I want to extract the time in the following format: xx:xx:xx (24h).

Why is this happening? Thank you!

You have redundant quantifiers in your regexp (those {1}). A character class already matches exactly one character, so you can remove them.
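To see that the {1} quantifiers change nothing, here is a small sketch comparing the two spellings of the minutes part of the pattern. The sample string and variable names are made up for illustration.

```python
import re

# Hypothetical sample input, not taken from the page in the question.
sample = "05 59 60 7"

# The same two-digit pattern, with and without the {1} quantifiers.
with_quantifiers = re.findall(r'[0-5]{1}[0-9]{1}', sample)
without_quantifiers = re.findall(r'[0-5][0-9]', sample)

# Both spellings find exactly the same matches.
print(with_quantifiers == without_quantifiers)
```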

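Another thing is that re.findall returns only the captured groups when the pattern contains any, which here means just the hours. Change the first capture to a non-capturing group (?: ... ), and capture the whole regex instead: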

((?:[01]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]) 

This should do what you want, I think.
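A minimal sketch of the difference, run against a made-up string rather than the live page (the sample text and variable names are assumptions for illustration, and the page itself may no longer contain the time in its raw HTML):

```python
import re

# Hypothetical sample text standing in for the downloaded page source.
sample = "Current time: 7:42:05 pm, server time 14:03:59"

# Original shape: the only group wraps the hours, so findall returns
# just that group for each match.
hours_only = re.findall(r'([01]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]', sample)

# Fixed shape: the hours alternation is non-capturing and the single
# capturing group wraps the whole time.
full_times = re.findall(r'((?:[01]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9])', sample)

print(hours_only)   # ['7', '14']
print(full_times)   # ['7:42:05', '14:03:59']
```

If you do not need any group at all, dropping the outer parentheses and keeping only the (?: ... ) around the hours gives the same list, because findall returns whole matches when the pattern has no capturing groups.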

