ruby - Scraping data based on the text of other neighboring elements?


I have this HTML:

    <div id="left">
      <div id="leftnav">
        <div id="leftnavcontainer">
          <div id="refinements">
            <h2>department</h2>

            <ul id="ref_2975312011">
              <li>
                <a href="#">
                  <span class="expand">pet supplies</span>
                </a>
              </li>

              <li>
                <strong>dogs</strong>
              </li>

              <li>
                <a>
                  <span class="refinementlink">carriers &amp; travel products</span>
                  <span class="narrowvalue">&nbsp;(5,570)</span>
                </a>
              </li>
    (etc...)

which I'm scraping like this:

    require 'nokogiri'

    html       = file  # `file` is the path to the saved page
    data       = Nokogiri::HTML(open(html))
    categories = data.css('#ref_2975312011')

    @categories_hash = {}
    # Skip the first two <li> items (the "pet supplies" link and the "dogs" heading).
    categories.css('li').drop(2).each do |category|
      categories_title = category.css('.refinementlink').text
      # Pull the digits out of text like "(5,570)" and convert them to an Integer.
      categories_count = category.css('.narrowvalue').text[/[\d,]+/].delete(",").to_i
      @categories_hash[:categories] ||= {}
      @categories_hash[:categories]["dogs"] ||= {}
      @categories_hash[:categories]["dogs"][categories_title] = categories_count
    end
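To make the count extraction in that loop easier to follow, here is a minimal sketch of just that step in plain Ruby. The helper name `parse_count` is made up for illustration, and the `.to_s` guard (so that text without digits yields 0 instead of raising) is an addition that is not in the code above.

```ruby
# Hypothetical helper: turn text like "\u00A0(5,570)" into the Integer 5570.
def parse_count(text)
  # String#[] with a regex returns the first match, or nil if nothing matches.
  digits = text[/[\d,]+/]
  # nil.to_s is "", and "".to_i is 0, so text without digits gives 0.
  digits.to_s.delete(',').to_i
end

# The .narrowvalue text from the HTML above: a non-breaking space, then "(5,570)".
puts parse_count("\u00A0(5,570)")
```

The regex `[\d,]+` grabs the first run of digits and commas, `delete(",")` strips the thousands separator, and `to_i` converts what is left.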

Now I want to do the same thing without hard-coding #ref_2975312011 and "dogs".

So I was thinking of telling Nokogiri the following:

Scrape the li elements (starting from the third one) that come right after the li element whose text, pet supplies, is enclosed in a link and a span tag.

Any ideas on how to accomplish that?

The pet supplies li would be:

puts doc.at('li:has(a span[text()="pet supplies"])') 

And the following sibling li's (skipping the first one):

puts doc.search('li:has(a span[text()="pet supplies"]) ~ li:gt(1)') 
