How to read an HTML table using RSelenium?

I'm using RSelenium to navigate to a web page with the code below. I can't share the URL because it is an internal company page that is only reachable over VPN:

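library(RSelenium)
# startServer() is defunct in newer RSelenium releases, where rsDriver() replaces it
RSelenium::startServer()
remDr <- remoteDriver()
remDr$open()  # open a browser session before navigating
remDr$navigate("some url")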

After I navigate to the page, its HTML source contains the following table:

<font size="2">
<table border="1">
<tbody>
<tr>
<td> item1 </td>
<td> 0 </td>
<td> 0.05 </td>
<td> 2.43 </td>
<td align="center"> Pct </td>
<td align="center"> 1 </td>
</tr>
</tbody>
</table>

How can I extract the contents of this table? Please assume the URL cannot be fetched directly. Otherwise I could use a function from the XML package, readHTMLTable(remDr$getCurrentUrl()), but that does not work for some reason. I need to use only the remoteDriver handle (remDr). Thanks so much for your time.

Koerlin answered 29/4, 2015 at 0:57

Something like:

library(XML)
doc <- htmlParse(remDr$getPageSource()[[1]])
readHTMLTable(doc)

should allow you to access the page's HTML and process the tables it contains.
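
To try this without a Selenium session, here is a minimal sketch that runs the same calls on a trimmed copy of the table from the question. The `html` string is an assumption standing in for `remDr$getPageSource()[[1]]`, and the `which`, `header` and `stringsAsFactors` arguments are illustrative choices, not part of the answer above.

```r
library(XML)

# Stand-in for remDr$getPageSource()[[1]] (assumption: a trimmed copy of the question's table)
html <- paste0(
  '<html><body><table border="1"><tbody><tr>',
  '<td> item1 </td><td> 0 </td><td> 0.05 </td><td> 2.43 </td>',
  '<td align="center"> Pct </td><td align="center"> 1 </td>',
  '</tr></tbody></table></body></html>'
)

# asText = TRUE tells htmlParse the input is markup, not a file name or URL
doc <- htmlParse(html, asText = TRUE)

# which = 1 returns only the first table as a data frame instead of a list of tables;
# header = FALSE because this table has no header row
tbl <- readHTMLTable(doc, which = 1, header = FALSE, stringsAsFactors = FALSE)
print(tbl)
```

With a live session, replacing `html` by `remDr$getPageSource()[[1]]` should give the same kind of result for the real page.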

Bedevil answered 29/4, 2015 at 1:53 Comment(1)
Thanks very much for your response, @Bedevil. Since my table is very big, it's taking too much time. Is it possible to read only some of the cells? For example, how can I read the number 2.43 in my HTML code above? Thanks again. - Koerlin
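
One possible way to read a single cell, sketched here under assumptions, is to query the parsed document with XPath instead of converting every table. The `html` string stands in for `remDr$getPageSource()[[1]]`, and the XPath expression assumes the value sits in the fourth cell of the first row, as in the snippet from the question.

```r
library(XML)

# Stand-in for remDr$getPageSource()[[1]] (assumption: a trimmed copy of the question's table)
html <- paste0(
  '<html><body><table border="1"><tbody><tr>',
  '<td> item1 </td><td> 0 </td><td> 0.05 </td><td> 2.43 </td>',
  '<td align="center"> Pct </td><td align="center"> 1 </td>',
  '</tr></tbody></table></body></html>'
)

doc <- htmlParse(html, asText = TRUE)

# Select only the 4th <td> of the first row; xmlValue extracts its text
cell <- xpathSApply(doc, "//table//tr[1]/td[4]", xmlValue)

# The cell text carries surrounding spaces, so trim before converting
value <- as.numeric(trimws(cell))
print(value)
```

With a live session, `remDr$findElement(using = "xpath", value = "//table//tr[1]/td[4]")` followed by `getElementText()` on the returned element should reach the same cell without pulling the full page source, though whether that is faster for a very large table is not something this sketch measures.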

I prefer using rvest, so what I did was:

# Importing libraries
library(RSelenium)
library(rvest)

# Extracting table
remDr$getPageSource()[[1]] %>% 
  read_html() %>%
  html_table()

Possessory answered 18/5, 2020 at 12:6
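
For reference, a small self-contained sketch of the rvest route on a trimmed copy of the question's table. The `html` string is an assumption standing in for `remDr$getPageSource()[[1]]`, and the CSS selector is an illustrative way to pick one cell rather than the whole table.

```r
library(rvest)

# Stand-in for remDr$getPageSource()[[1]] (assumption: a trimmed copy of the question's table)
html <- paste0(
  '<html><body><table border="1"><tbody><tr>',
  '<td> item1 </td><td> 0 </td><td> 0.05 </td><td> 2.43 </td>',
  '<td align="center"> Pct </td><td align="center"> 1 </td>',
  '</tr></tbody></table></body></html>'
)

page <- read_html(html)

# Text of every cell in the table, with surrounding whitespace trimmed
cells <- page %>%
  html_nodes("table td") %>%
  html_text(trim = TRUE)

# Only the 4th cell of the first row
value <- page %>%
  html_nodes("tr:nth-child(1) td:nth-child(4)") %>%
  html_text(trim = TRUE) %>%
  as.numeric()
print(value)
```

Selecting nodes before extracting text avoids building a data frame for the entire table when only a few cells are needed.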