python - How to extract the URL from this HTML tag?

- September 15, 2012

I'm trying to extract the URL from the HTML tag with id='revsar' below, using a Python regex:

<a id='revsar' href='http://www.amazon.com/altec-lansing-inmotion-mobile-speaker/product-reviews/b000edkp8u/ref=cm_cr_dp_see_all_summary?ie=utf8&showviewpoints=1&sortby=byrankdescending' class='txtsmall notextdecoration'>   see 136 customer reviews </a>

I tried the code below, but it's not working (it prints nothing):

regex = b'<a id="revsar" href="(.+?)" class="txtsmall notextdecoration">(.+?)</a>'
pattern = re.compile(regex)
rev_url = re.findall(pattern, txt)
print('reviews url: ' + str(rev_url))

You could try something like:

(_, url), = re.findall(r'href=([\'"]*)(\S+)\1', input)
print(url)
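Judging from the snippets as posted, one likely reason the original attempt finds nothing is that the pattern expects double quotes around the attribute values while the tag uses single quotes. Below is a minimal sketch of a quote-agnostic pattern, assuming the page has already been decoded to a str; the variable names html and match are only for illustration.

```python
import re

# The tag from the question, as a str (assumption: the page is already decoded).
html = (
    "<a id='revsar' "
    "href='http://www.amazon.com/altec-lansing-inmotion-mobile-speaker/"
    "product-reviews/b000edkp8u/ref=cm_cr_dp_see_all_summary"
    "?ie=utf8&showviewpoints=1&sortby=byrankdescending' "
    "class='txtsmall notextdecoration'>   see 136 customer reviews </a>"
)

# Accept either quote style around each attribute value. The backreferences
# \1 and \2 require the closing quote to be the same as the opening one.
pattern = re.compile(r"""<a\s+id=(['"])revsar\1\s+href=(['"])(.+?)\2""")

match = pattern.search(html)
rev_url = match.group(3) if match else None
print('reviews url: ' + str(rev_url))
```

If txt is a bytes object (for example, read straight from a network response), either decode it first or write the pattern as a bytes literal too; mixing a str pattern with bytes input raises a TypeError.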

However, I'd rather use an HTML parsing library such as BeautifulSoup for this task.
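To illustrate the parser-based approach without a third-party install, here is a rough sketch using the standard library's html.parser instead of BeautifulSoup (a swapped-in substitute, not what the answer names). The class name RevsarLinkParser is hypothetical. With BeautifulSoup available, something like BeautifulSoup(html, 'html.parser').find('a', id='revsar')['href'] should do the same job in one line.

```python
from html.parser import HTMLParser


class RevsarLinkParser(HTMLParser):
    """Hypothetical helper: remembers the href of the <a> tag with id 'revsar'."""

    def __init__(self):
        super().__init__()
        self.url = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with the quotes already removed,
        # so single- and double-quoted attributes are handled the same way.
        attributes = dict(attrs)
        if tag == 'a' and attributes.get('id') == 'revsar':
            self.url = attributes.get('href')


html = (
    "<a id='revsar' "
    "href='http://www.amazon.com/altec-lansing-inmotion-mobile-speaker/"
    "product-reviews/b000edkp8u/ref=cm_cr_dp_see_all_summary"
    "?ie=utf8&showviewpoints=1&sortby=byrankdescending' "
    "class='txtsmall notextdecoration'>   see 136 customer reviews </a>"
)

parser = RevsarLinkParser()
parser.feed(html)
print('reviews url: ' + str(parser.url))
```

Unlike the regex, this does not depend on the order of the attributes or on which quote character the page uses.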
