python - How to extract the URL from this HTML tag?


I'm trying to extract the URL from the id='revsar' HTML tag below, using a Python regex:

<a id='revsar' href='http://www.amazon.com/altec-lansing-inmotion-mobile-speaker/product-reviews/b000edkp8u/ref=cm_cr_dp_see_all_summary?ie=utf8&showviewpoints=1&sortby=byrankdescending' class='txtsmall notextdecoration'>   see 136 customer reviews </a> 

I tried the code below, but it's not working (it prints nothing):

    import re

    regex = b'<a id="revsar" href="(.+?)" class="txtsmall notextdecoration">(.+?)</a>'
    pattern = re.compile(regex)
    rev_url = re.findall(pattern, txt)
    print('reviews url: ' + str(rev_url))
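Two details in that snippet are worth checking. The pattern writes the attributes with double quotes (id="revsar"), while the tag above uses single quotes (id='revsar'), so it can never match. Also, a b'...' bytes pattern only works when txt is bytes (for example the raw result of urlopen(...).read()); with a str, Python 3 raises a TypeError. A minimal sketch, assuming txt has already been decoded to str and that id comes before href as in the sample, which accepts either quote style:

```python
import re

# The sample tag from the question; note it uses single quotes around attributes.
html = ("<a id='revsar' href='http://www.amazon.com/altec-lansing-inmotion-mobile-speaker/"
        "product-reviews/b000edkp8u/ref=cm_cr_dp_see_all_summary?ie=utf8&showviewpoints=1"
        "&sortby=byrankdescending' class='txtsmall notextdecoration'>   see 136 customer reviews </a>")

# ['"] accepts either quote style; \1 and \2 require the same quote to close
# the id value and the href value respectively. Group 3 is the URL itself.
pattern = re.compile(r"""<a id=(['"])revsar\1 href=(['"])(.+?)\2""")

match = pattern.search(html)
rev_url = match.group(3) if match else None
print('reviews url: ' + str(rev_url))
```

Using search() with one capture group of interest returns a single string rather than the list of tuples that findall() produces for a multi-group pattern.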

You could try something like:

    import re
    # html is the tag text; \S+ (not \s+) matches the URL, and \1 repeats the opening quote.
    (_, url), = re.findall(r'href=([\'"]?)(\S+)\1', html)
    print(url)
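The (_, url), = ... form unpacks exactly one match, so it raises ValueError when the text contains zero or several href attributes. A sketch of a more forgiving variant, using a hypothetical helper named find_hrefs and assuming every href value is quoted:

```python
import re

# Matches href values wrapped in either ' or "; \1 forces the closing quote
# to be the same character as the opening one. Assumes the value is quoted.
HREF_RE = re.compile(r"""href=(['"])(.*?)\1""")


def find_hrefs(html):
    """Hypothetical helper: return all quoted href values in document order."""
    # findall returns (quote, url) tuples because the pattern has two groups.
    return [url for _quote, url in HREF_RE.findall(html)]


sample = "<a href='/one'>1</a> <a href=\"/two\">2</a>"
urls = find_hrefs(sample)
print(urls)  # ['/one', '/two']

# Single-match unpacking, as in the one-liner, fails when there are two matches.
try:
    (_, only_url), = HREF_RE.findall(sample)
except ValueError as exc:
    print('not exactly one match:', exc)
```

The non-greedy (.*?) stops at the first matching closing quote, so one attribute value cannot run on into the next.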

However, I'd rather use an HTML parsing library such as BeautifulSoup for this task.
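With BeautifulSoup the lookup is roughly one call, e.g. BeautifulSoup(txt, 'html.parser').find('a', id='revsar')['href']. Because BeautifulSoup is a third-party package, the sketch below instead uses Python's standard-library html.parser to show the same parser-based idea without extra dependencies; the class name RevsarLinkParser is hypothetical.

```python
from html.parser import HTMLParser


class RevsarLinkParser(HTMLParser):
    """Hypothetical parser: records the href of the <a> element whose id is 'revsar'."""

    def __init__(self):
        super().__init__()
        self.url = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with the quotes already removed,
        # so single vs double quotes and attribute order no longer matter.
        attributes = dict(attrs)
        if tag == 'a' and attributes.get('id') == 'revsar':
            self.url = attributes.get('href')


html = ("<a id='revsar' href='http://www.amazon.com/altec-lansing-inmotion-mobile-speaker/"
        "product-reviews/b000edkp8u/ref=cm_cr_dp_see_all_summary?ie=utf8&showviewpoints=1"
        "&sortby=byrankdescending' class='txtsmall notextdecoration'>   see 136 customer reviews </a>")

parser = RevsarLinkParser()
parser.feed(html)
print(parser.url)
```

A parser copes with quote style, attribute order, and extra whitespace on its own, which is exactly where the regex approaches are fragile.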

