Python - How to extract the URL from this HTML tag?
I'm trying to extract the URL from the tag with id='revsar' in the HTML below, using a Python regex:
<a id='revsar' href='http://www.amazon.com/altec-lansing-inmotion-mobile-speaker/product-reviews/b000edkp8u/ref=cm_cr_dp_see_all_summary?ie=utf8&showviewpoints=1&sortby=byrankdescending' class='txtsmall notextdecoration'> see 136 customer reviews </a>
I tried the code below, but it's not working (it prints nothing):
    regex = b'<a id="revsar" href="(.+?)" class="txtsmall notextdecoration">(.+?)</a>'
    pattern = re.compile(regex)
    rev_url = re.findall(pattern, txt)
    print('reviews url: ' + str(rev_url))
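You can try something like this:

    import re
    # txt holds the HTML from the question; \S+ matches the non-whitespace URL
    (_, url), = re.findall(r'href=([\'"]*)(\S+)\1', txt)
    print(url)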
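As for why the pattern in the question finds nothing: it expects double quotes around the attribute values, while the tag uses single quotes, and it is a bytes pattern (b'...'), which only works if txt is bytes as well. A minimal sketch of a corrected version follows, assuming txt is a str holding the tag from the question; the backreferences \1 and \2 are my own addition so that either quote style is accepted.

```python
import re

# The tag from the question, assumed here to be held in a str named txt.
txt = (
    "<a id='revsar' "
    "href='http://www.amazon.com/altec-lansing-inmotion-mobile-speaker/"
    "product-reviews/b000edkp8u/ref=cm_cr_dp_see_all_summary"
    "?ie=utf8&showviewpoints=1&sortby=byrankdescending' "
    "class='txtsmall notextdecoration'> see 136 customer reviews </a>"
)

# (['"]) captures whichever quote opens the value; \1 and \2 require the same
# quote to close it, so both id='revsar' and id="revsar" are matched.
pattern = re.compile(r"""<a\s+id=(['"])revsar\1\s+href=(['"])(.+?)\2""")

match = pattern.search(txt)
rev_url = match.group(3) if match else None
print('reviews url: ' + str(rev_url))
```

This still assumes that id comes directly before href inside the tag, which is one reason a regex is fragile here.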
However, I'd rather use an HTML parsing library such as BeautifulSoup for this task.
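To illustrate the parser-based approach without requiring a third-party install, here is a rough sketch using the standard library's html.parser instead of BeautifulSoup. The class name HrefFinder and the variable txt are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class HrefFinder(HTMLParser):
    """Collects the href of every <a> tag whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; quotes are already stripped.
        attrs = dict(attrs)
        if tag == 'a' and attrs.get('id') == self.target_id and 'href' in attrs:
            self.urls.append(attrs['href'])


# The tag from the question, assumed to be held in a str named txt.
txt = (
    "<a id='revsar' "
    "href='http://www.amazon.com/altec-lansing-inmotion-mobile-speaker/"
    "product-reviews/b000edkp8u/ref=cm_cr_dp_see_all_summary"
    "?ie=utf8&showviewpoints=1&sortby=byrankdescending' "
    "class='txtsmall notextdecoration'> see 136 customer reviews </a>"
)

finder = HrefFinder('revsar')
finder.feed(txt)
print(finder.urls)

# With BeautifulSoup (bs4) installed, the rough equivalent would be:
#   BeautifulSoup(txt, 'html.parser').find('a', id='revsar')['href']
```

Unlike the regex, this does not depend on the quote style or on the order of the attributes inside the tag.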