html - Using readHTMLTable with multiple tbody

June 15, 2010

Suppose I have an HTML table with multiple <tbody> elements, which we know is legal HTML, and I attempt to read it with readHTMLTable as follows:

    require(XML)
    table.text <- '<table>
      <thead>
        <tr><th>col1</th><th>col2</th></tr>
      </thead>
      <tbody>
        <tr><td>1a</td><td>2a</td></tr>
      </tbody>
      <tbody>
        <tr><td>1b</td><td>2b</td></tr>
      </tbody>
    </table>'
    readHTMLTable(table.text)

The output contains only the first <tbody> element:

    $`NULL`
      col1 col2
    1   1a   2a

and ignores the rest. Is this expected behavior? (I can't find any mention of it in the documentation.) And what are some flexible and robust ways to access the entire table?

I'm currently using

    table.text <- gsub('</tbody>[[:space:]]*<tbody>', '', table.text)
    readHTMLTable(table.text)

which prevents me from using readHTMLTable directly on the URL of a page containing a table like this, and doesn't feel robust.
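A slightly more tolerant version of this string-level workaround can be sketched as follows. It is a minimal sketch, not part of the XML package: `merge_tbody` is a hypothetical helper name, and it assumes the only thing separating consecutive bodies is whitespace. Unlike the exact pattern above, it also accepts attributes on the opening `<tbody>` and upper-case tags. For a web page, the page would first have to be read into a single string (for example with `readLines()`), which is still not as convenient as passing the URL straight to readHTMLTable.

```r
library(XML)

# Hypothetical helper: delete every "</tbody> <tbody ...>" boundary so the
# table is left with a single <tbody>. Tolerates whitespace between the tags,
# attributes on the opening tag, and upper-case tag names.
merge_tbody <- function(html) {
  gsub("</tbody>[[:space:]]*<tbody[^>]*>", "", html, ignore.case = TRUE)
}

table.text <- '<table>
  <thead><tr><th>col1</th><th>col2</th></tr></thead>
  <tbody class="a"><tr><td>1a</td><td>2a</td></tr></tbody>
  <TBODY><tr><td>1b</td><td>2b</td></tr></TBODY>
</table>'

# For a web page, read it into one string first, e.g.
# page <- paste(readLines(url, warn = FALSE), collapse = "\n")
tabs <- readHTMLTable(merge_tbody(table.text))
tabs[[1]]
```

Because it still works on raw text, this remains a regex hack: comments or unusual markup between the body tags would defeat it.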

If you look at the source of readHTMLTable, getMethod(readHTMLTable, "XMLInternalElementNode") contains the line

    if (length(tbody))
        node = tbody[[1]]

so it is purposefully designed to select the content of the first tbody only. ?readHTMLTable describes the function as providing

somewhat robust methods for extracting data from HTML tables in an HTML document

It is designed as a utility function: great when it works, but you may need to hack around it.
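One way to hack around it without touching the raw text is to parse the document yourself and select the rows of every body with XPath. The sketch below assumes a simple table: one header row of `<th>` cells in `<thead>`, and every body row having the same number of `<td>` cells with no `colspan`/`rowspan`. `read_all_tbody` is a hypothetical helper, not a function from the XML package; it uses only `htmlParse`, `getNodeSet`, `xpathSApply` and `xmlValue`.

```r
library(XML)

# Hypothetical helper: build a data frame from the header cells and the rows
# of *every* <tbody> of the first table in a parsed document, instead of only
# the first <tbody> as readHTMLTable does.
read_all_tbody <- function(doc) {
  tbl <- getNodeSet(doc, "//table")[[1]]
  # Header names from <thead>
  header <- xpathSApply(tbl, "./thead/tr/th", xmlValue)
  # Rows from all <tbody> children of the table
  rows <- getNodeSet(tbl, "./tbody/tr")
  # One character vector of <td> texts per row
  cells <- lapply(rows, function(r) xpathSApply(r, "./td", xmlValue))
  df <- as.data.frame(do.call(rbind, cells), stringsAsFactors = FALSE)
  names(df) <- header
  df
}

table.text <- '<table>
  <thead><tr><th>col1</th><th>col2</th></tr></thead>
  <tbody><tr><td>1a</td><td>2a</td></tr></tbody>
  <tbody><tr><td>1b</td><td>2b</td></tr></tbody>
</table>'

doc <- htmlParse(table.text, asText = TRUE)
df <- read_all_tbody(doc)
df
```

Since the rows are selected on the parsed tree, whitespace, attributes or comments between the bodies do not matter. htmlParse should also accept a file name or URL in place of the string, which addresses the original concern about reading a page directly.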
