html - Using readHTMLTable with multiple tbody


Suppose I have an HTML table with multiple <tbody> elements, which we know is perfectly legal HTML, and I attempt to read it with readHTMLTable as follows:

    library(XML)

    # A table with one <thead> and two <tbody> sections
    table.text <- '<table>
      <thead>
        <tr><th>col1</th><th>col2</th>
      </thead>
      <tbody>
        <tr><td>1a</td><td>2a</td></tr>
      </tbody>
      <tbody>
        <tr><td>1b</td><td>2b</td></tr>
      </tbody>
    </table>'

    readHTMLTable(table.text)

The output contains only the first <tbody> element:

    $`NULL`
      col1 col2
    1   1a   2a

and ignores the rest. Is this expected behavior? (I can't find any mention of it in the documentation.) And what are the most flexible and robust ways to access the entire table?

I'm currently using

    # Merge adjacent <tbody> sections by deleting the closing/opening tag pair
    table.text <- gsub('</tbody>[[:space:]]*<tbody>', '', table.text)
    readHTMLTable(table.text)

which prevents me from using readHTMLTable directly on the URL of a table like this, and doesn't feel very robust.

If you look at the source of readHTMLTable with getMethod(readHTMLTable, "XMLInternalElementNode"), it contains the line

    if (length(tbody))
        node = tbody[[1]]

so it is purposefully designed to select the content of the first tbody only. Also, ?readHTMLTable describes the function as providing

"somewhat robust methods for extracting data from HTML tables in an HTML document"

It is designed to be a utility function. It is great when it works, but otherwise you may need to hack around it.
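One possible way to hack around it, as a minimal sketch rather than a definitive solution: parse the document yourself with htmlParse and collect the rows of every <tbody> with XPath, instead of editing the HTML text with gsub. The helper name read_all_tbodies is hypothetical (it is not part of the XML package), and the sketch assumes the table has a <thead> with one <th> per column and that every body row has the same number of <td> cells.

```r
library(XML)

# Hypothetical helper (not part of the XML package): build a data frame from
# the rows of every <tbody> inside a single <table> node.
read_all_tbodies <- function(table_node) {
  # Column names from the header cells; assumes a <thead> is present.
  header <- xpathSApply(table_node, "./thead/tr/th", xmlValue)

  # All body rows, regardless of which <tbody> they belong to.
  rows <- getNodeSet(table_node, "./tbody/tr")

  # One character vector of cell text per row.
  cells <- lapply(rows, function(tr) xpathSApply(tr, "./td", xmlValue))

  # Stack the rows; assumes each row has length(header) cells.
  out <- as.data.frame(do.call(rbind, cells), stringsAsFactors = FALSE)
  names(out) <- header
  out
}

table.text <- '<table>
  <thead>
    <tr><th>col1</th><th>col2</th>
  </thead>
  <tbody>
    <tr><td>1a</td><td>2a</td></tr>
  </tbody>
  <tbody>
    <tr><td>1b</td><td>2b</td></tr>
  </tbody>
</table>'

# Parse the text once, then apply the helper to each <table> in the document.
doc <- htmlParse(table.text, asText = TRUE)
tables <- lapply(getNodeSet(doc, "//table"), read_all_tbodies)
tables[[1]]
```

Because the helper works on a parsed node rather than on raw text, the same call should apply to a document that htmlParse has read from a file or URL, not only to an in-memory string. All columns come back as character vectors, so any type conversion is left to the caller.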

