python - Can't read line from html page
I'm trying to extract a time in a specific format from a site. The regex works (I tried it in a regex tester and it matched), but this is what happens when I run the code in Python:
import urllib, re
sock = urllib.urlopen("http://www.wolframalpha.com/input/?i=time")
htmlsource = sock.read()
sock.close()
ips = re.findall(r'([01]?[0-9]{1}|2[0-3]{1}):[0-5]{1}[0-9]{1}:[0-5]{1}[0-9]{1}', htmlsource)
print ips
The result:
['7', '4']
On regextester.com the whole time is highlighted in red as a match. I want to extract the time in the following format: xx:xx:xx (24h).
Why is this happening? Thank you!
You have redundant quantifiers in your regexp (those {1}). You can remove them.
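A quick way to convince yourself that {1} changes nothing is to run both versions on the same text. This is only a sketch: the sample string is made up for illustration, and it uses Python 3 syntax rather than the Python 2 code from the question.

```python
import re

# The pattern from the question, with and without the {1} quantifiers.
with_quantifiers = r'([01]?[0-9]{1}|2[0-3]{1}):[0-5]{1}[0-9]{1}:[0-5]{1}[0-9]{1}'
without_quantifiers = r'([01]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]'

# Hypothetical sample text, not taken from the real page.
sample = "Current time: 17:45:09 and 7:04:33"

# Both patterns return exactly the same list of matches.
same = re.findall(with_quantifiers, sample) == re.findall(without_quantifiers, sample)
print(same)
```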
Another thing: when the pattern contains a capturing group, re.findall returns only the captures, which here are the hours. Change the first group to a non-capturing group (?: ... ) and capture the whole regex:
((?:[01]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9])
This should do it, I think.
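Here is a minimal sketch of the difference, run against a made-up HTML snippet instead of the real page (sample_html and its contents are assumptions for illustration, and the code is written for Python 3):

```python
import re

# Hypothetical stand-in for the downloaded page source.
sample_html = "<span>Current time: 17:45:09</span> <span>04:07:30</span>"

# Original grouping: the only capturing group is the hours part,
# so findall returns just the hours.
hours_only = re.findall(r'([01]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]', sample_html)

# Non-capturing group for the hours, one capturing group around everything:
# findall now returns the full time strings.
full_times = re.findall(r'((?:[01]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9])', sample_html)

print(hours_only)  # ['17', '04']
print(full_times)  # ['17:45:09', '04:07:30']
```

If the pattern had no groups at all, re.findall would also return the whole match, so dropping the outer parentheses and keeping only (?: ... ) should behave the same way.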