ruby - Scraping data based on the text of other neighboring elements?
I have this HTML:
    <div id="left">
      <div id="leftnav">
        <div id="leftnavcontainer">
          <div id="refinements">
            <h2>department</h2>
            <ul id="ref_2975312011">
              <li>
                <a href="#"> <span class="expand">pet supplies</span> </a>
              </li>
              <li>
                <strong>dogs</strong>
              </li>
              <li>
                <a>
                  <span class="refinementlink">carriers & travel products</span>
                  <span class="narrowvalue"> (5,570)</span>
                </a>
              </li>
              (etc...)
which I'm scraping with this:
    require 'nokogiri'

    html = file # path to the HTML source
    data = Nokogiri::HTML(open(html))
    categories = data.css('#ref_2975312011')
    @categories_hash = {}
    # Skip the first two <li> elements ("pet supplies" and "dogs")
    categories.css('li').drop(2).each do |category|
      categories_title = category.css('.refinementlink').text
      # " (5,570)" -> "5,570" -> 5570
      categories_count = category.css('.narrowvalue').text[/[\d,]+/].delete(",").to_i
      @categories_hash[:categories] ||= {}
      @categories_hash[:categories]["dogs"] ||= {}
      @categories_hash[:categories]["dogs"][categories_title] = categories_count
    end
Now I want to do the same without hard-coding #ref_2975312011 and "dogs".
So I'm thinking of telling Nokogiri the following: scrape the li elements (starting from the third one) right below the li element that has the text "pet supplies" enclosed in a link and a span tag.
Any ideas on how to accomplish that?
The "pet supplies" li would be:

    puts doc.at('li:has(a span[text()="pet supplies"])')
The following sibling li's (skipping the first one):

    puts doc.search('li:has(a span[text()="pet supplies"]) ~ li:gt(1)')