I'm trying to implement exception handling in RSelenium and need help. Please note that I have checked permission to crawl this page with the robotstxt package.
library(RSelenium)
library(XML)
library(janitor)
library(lubridate)
library(magrittr)
library(dplyr)
remDr <- remoteDriver(
  remoteServerAddr = "192.168.99.100",
  port = 4445L
)
remDr$open()
# Open TightVNC to follow along as RSelenium drives the browser
# navigate to the main page
remDr$navigate("https://docs.google.com/spreadsheets/d/1o1PlLIQS8v-XSuEz1eqZB80kcJk9xg5lsbueB7mTg1U/pub?output=html&widget=true#gid=690408156")
# look for table element
tableElem <- remDr$findElement(using = "id", "pageswitcher-content")
# switch to table
remDr$switchToFrame(tableElem)
# parse html for first table
doc <- htmlParse(remDr$getPageSource()[[1]])
table_tmp <- readHTMLTable(doc)
table_tmp <- table_tmp[[1]][-2, -1]
table_tmp <- table_tmp[-1, ]
colnames(table_tmp) <- c("team_name", "team_size", "start_time", "end_time", "total_time", "puzzels_solved")
table_tmp$city <- rep("Montreal", nrow(table_tmp))
table_tmp$date <- rep(Sys.Date() - 5, nrow(table_tmp))
# switch back to the main/outer frame
remDr$switchToFrame(NULL)
# I found the elements I want to manipulate with Inspector mode in a browser
webElems <- remDr$findElements(using = "css", ".switcherItem") # Month/Year tabs at the bottom
arrowElems <- remDr$findElements(using = "css", ".switcherArrows") # Arrows to scroll left and right at the bottom
# Create NULL object to be used in for loop
big_df <- NULL
for (i in seq_along(webElems)) {
  # choose the i'th Month/Year tab
  webElem <- webElems[[i]]
  webElem$clickElement()
  tableElem <- remDr$findElement(using = "id", "pageswitcher-content") # The inner table frame
  # switch to table frame
  remDr$switchToFrame(tableElem)
  Sys.sleep(3)
  # parse html with XML package
  doc <- htmlParse(remDr$getPageSource()[[1]])
  Sys.sleep(3)
  # Extract data from HTML table in HTML document
  table_tmp <- readHTMLTable(doc)
  Sys.sleep(3)
  # put this into a format you can use
  table <- table_tmp[[1]][-2, -1]
  table <- table[-1, ]
  # rename the columns
  colnames(table) <- c("team_name", "team_size", "start_time", "end_time", "total_time", "puzzels_solved")
  # add city name to a column
  table$city <- rep("Montreal", nrow(table))
  # add the Month/Year this table was extracted from
  today <- Sys.Date() %m-% months(i + 1)
  table$date <- today
  # concatenate each table together
  big_df <- dplyr::bind_rows(big_df, table)
  # Switch back to main frame
  remDr$switchToFrame(NULL)
  ################################################
  ### I should use exception handling here ###
  ################################################
}
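One generic pattern for the spot marked in the loop is to run each iteration inside tryCatch() so that a failing tab yields NULL instead of aborting the whole loop, then bind the surviving tables once at the end. This is only a sketch: it does not scroll or retry, it just keeps the tables scraped so far. scrape_all() and fake_scrape() are hypothetical, with fake_scrape() standing in for the body of the loop above.

```r
# Hypothetical helper: call scrape_tab(i) for each tab; an error in one tab is
# reported with message() and turned into NULL so the remaining tabs still run.
scrape_all <- function(n_tabs, scrape_tab) {
  results <- lapply(seq_len(n_tabs), function(i) {
    tryCatch(
      scrape_tab(i),
      error = function(e) {
        message("tab ", i, " skipped: ", conditionMessage(e))
        NULL
      }
    )
  })
  # Drop the failed (NULL) entries and bind the remaining data frames once
  do.call(rbind, Filter(Negate(is.null), results))
}

# Stand-in for the loop body: tab 3 fails the way the real click does
fake_scrape <- function(i) {
  if (i == 3) stop("ElementNotVisible")
  data.frame(tab = i, team_name = paste0("team_", i))
}

scraped <- scrape_all(4, fake_scrape)
nrow(scraped)  # 3: tabs 1, 2 and 4
```

Collecting into a list and binding once also avoids growing a data frame inside the loop.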
When the browser gets to the January 2018 table, it can no longer find the next webElems element and throws an error:
Selenium message:Element is not currently visible and so may not be interacted with Build info: version: '2.53.1', revision: 'a36b8b1', time: '2016-06-30 17:37:03' System info: host: '617e51cbea11', ip: '172.17.0.2', os.name: 'Linux', os.arch: 'amd64', os.version: '4.14.79-boot2docker', java.version: '1.8.0_91' Driver info: driver.version: unknown
Error: Summary: ElementNotVisible Detail: An element command could not be completed because the element is not visible on the page. class: org.openqa.selenium.ElementNotVisibleException Further Details: run errorDetails method In addition: There were 50 or more warnings (use warnings() to see the first 50)
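The error text above contains "ElementNotVisible". A minimal sketch, assuming RSelenium passes that text through to conditionMessage(), of a handler that reacts only to this error and re-raises anything else. is_not_visible_error() and try_click() are hypothetical helpers, and the three *_click functions are stand-ins for webElem$clickElement() so the sketch runs without a browser.

```r
# Hypothetical helper: TRUE if an error looks like Selenium's "element not
# visible" failure (assumes the text seen above reaches conditionMessage()).
is_not_visible_error <- function(e) {
  msg <- conditionMessage(e)
  grepl("ElementNotVisible", msg, fixed = TRUE) ||
    grepl("not currently visible", msg, fixed = TRUE)
}

# Hypothetical helper: run a click; return TRUE on success, FALSE on a
# not-visible error, and re-raise any other error unchanged.
try_click <- function(click) {
  tryCatch(
    {
      click()
      TRUE
    },
    error = function(e) {
      if (is_not_visible_error(e)) FALSE else stop(e)
    }
  )
}

# Stand-ins for webElem$clickElement() so the sketch runs without a browser
ok_click     <- function() invisible(NULL)
hidden_click <- function() stop("Summary: ElementNotVisible Detail: element is not visible")
broken_click <- function() stop("Summary: NoSuchElement")

try_click(ok_click)      # TRUE
try_click(hidden_click)  # FALSE
```

In the loop, a FALSE result would be the cue to click the right arrow and try the same tab again, while unrelated errors still stop the script.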
I've been dealing with this error rather naively by including the following code at the end of the for loop. This is not a good idea for two reasons: (1) the scrolling speed was finicky to tune and would fail on other (longer) Google pages, and (2) the loop still fails at the end, when it tries to click the right arrow after the strip has already scrolled all the way, so it never downloads the last few tables.
# click the right arrow to scroll right
arrowElem <- arrowElems[[1]]
# once you "click" the element it is "held down"; there is no way to "unclick" it to stop it from scrolling too far
# I currently make sure it only scrolls a short distance by calling Sys.sleep() before switching to the outer frame
arrowElem$clickElement()
# give it "just enough time" to scroll right
Sys.sleep(0.3)
# switch back to outer frame to re-start the loop
remDr$switchToFrame(NULL)
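A polling approach might avoid tuning Sys.sleep(0.3) by hand: click the arrow only while the target tab is still hidden, and give up after a fixed number of clicks. Because it never clicks when the tab is already visible, it should not try to scroll past the end of the strip. This is a sketch under assumptions: scroll_until_visible() and make_fake_strip() are hypothetical, and whether it helps depends on how the Google Sheets arrow behaves when clicked repeatedly. With RSelenium, is_visible might wrap webElem$isElementDisplayed() and click_arrow might wrap arrowElem$clickElement().

```r
# Hypothetical helper: click the right arrow only while the tab is hidden,
# pausing briefly after each click, and give up after max_clicks.
# Returns the number of arrow clicks used. With RSelenium (assumption):
#   is_visible  = function() isTRUE(unlist(webElem$isElementDisplayed()))
#   click_arrow = function() arrowElem$clickElement()
scroll_until_visible <- function(is_visible, click_arrow, max_clicks = 20, pause = 0.3) {
  clicks <- 0
  while (!is_visible()) {
    if (clicks >= max_clicks) {
      stop("tab still not visible after ", max_clicks, " arrow clicks")
    }
    click_arrow()
    clicks <- clicks + 1
    Sys.sleep(pause)
  }
  clicks
}

# Stand-in for the tab strip: the tab becomes visible after `clicks_needed` clicks
make_fake_strip <- function(clicks_needed) {
  clicks <- 0
  list(
    is_visible  = function() clicks >= clicks_needed,
    click_arrow = function() clicks <<- clicks + 1
  )
}

strip <- make_fake_strip(3)
scroll_until_visible(strip$is_visible, strip$click_arrow, pause = 0)  # 3
```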
What I would like is to handle this exception by executing arrowElem$clickElement() whenever this error occurs. I think one would typically use tryCatch(); however, this is my first time learning about exception handling. I thought I could include this in the remDr$switchToFrame(tableElem) part of the for loop, but it doesn't work:
tryCatch(
  {
    suppressMessages({
      remDr$switchToFrame(tableElem)
    })
  },
  error = function(e) {
    arrowElem <- arrowElems[[1]]
    arrowElem$clickElement()
    Sys.sleep(0.3)
    remDr$switchToFrame(NULL)
  }
)
Or should I use the try() function after # do stuff? I'm guessing this didn't work here because clicking the right arrow auto-scrolls, even though I tried to stop it via:
arrowElem$clickElement()
Sys.sleep(0.3)
remDr$switchToFrame(NULL)
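The Selenium message ("Element is not currently visible and so may not be interacted with") appears to come from webElem$clickElement() on the tab rather than from remDr$switchToFrame(tableElem). tryCatch() only handles conditions signalled while evaluating the expression it wraps, so a handler around switchToFrame() would never see this error. A sketch, assuming that diagnosis, that wraps the click itself and runs a recovery step before retrying. click_with_retry(), fake_click() and fake_recover() are hypothetical; in the loop, click might be function() webElems[[i]]$clickElement() and recover might be function() arrowElems[[1]]$clickElement().

```r
# Hypothetical helper: try the click; on any error, report it, run the recovery
# step (e.g. scroll the tab strip), and retry up to max_attempts times.
# Returns the attempt number that succeeded.
click_with_retry <- function(click, recover, max_attempts = 5) {
  for (attempt in seq_len(max_attempts)) {
    ok <- tryCatch(
      {
        click()
        TRUE
      },
      error = function(e) {
        message("attempt ", attempt, " failed: ", conditionMessage(e))
        FALSE
      }
    )
    if (ok) return(attempt)
    recover()
  }
  stop("giving up after ", max_attempts, " attempts")
}

# Stand-in for the tab strip: the tab becomes clickable after two scrolls
scrolls <- 0
fake_click   <- function() if (scrolls < 2) stop("ElementNotVisible") else invisible(NULL)
fake_recover <- function() scrolls <<- scrolls + 1

click_with_retry(fake_click, fake_recover)  # 3 (two failures, then success)
```

This version retries on every error; restricting the handler to the not-visible error would be a stricter variant. Base R's try(expr, silent = TRUE) is the lighter alternative: on failure it returns an object of class "try-error", which can be tested with inherits(res, "try-error"), but tryCatch() makes it easier to run a recovery action and choose what to return.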
– Dwelt