I am using Selenium within R.
I have the following script, which searches Google Maps for all pizza restaurants around a given geographical coordinate and then keeps scrolling until all restaurants are loaded.
First, I navigate to the starting page:
library(RSelenium)
library(wdman)
library(netstat)
selenium_object <- selenium(retcommand = TRUE, check = FALSE)
remote_driver <- rsDriver(browser = "chrome", chromever = "114.0.5735.90", verbose = FALSE, port = free_port())
remDr <- remote_driver$client
lat <- 40.7484
lon <- -73.9857
# Build the search URL with paste0()
URL <- paste0("https://www.google.com/maps/search/pizza/@", lat, ",", lon, ",17z/data=!3m1!4b1!4m6!2m5!3m4!2s", lat, ",", lon, "!4m2!1d", lon, "!2d", lat, "?entry=ttu")
# Navigate to the URL
remDr$navigate(URL)
Then, I use the following code to keep scrolling until all entries have been loaded:
# Wait 10 seconds for the initial set of results to load before scrolling
Sys.sleep(10)

while (TRUE) {
  # Re-query the result cards on every pass so we always scroll to the newest one
  elements <- remDr$findElements(using = "css selector", "div.qjESne")

  # Pick the last element in the list - this is the one we want to scroll to
  last_element <- elements[[length(elements)]]

  # Scroll to the last element so the next batch of results loads
  remDr$executeScript("arguments[0].scrollIntoView(true);", list(last_element))
  Sys.sleep(10)

  # Stop once the "You've reached the end of the list." message appears
  if (length(remDr$findElements(using = "css selector", "span.HlvSq")) > 0) {
    print("No more elements")
    break
  }
}
Finally, I use this code to extract the names and addresses of all restaurants:
titles <- c()
addresses <- c()

# Only parse once the "You've reached the end of the list." message is present,
# i.e., all results have loaded
if (length(remDr$findElements(using = "css selector", "span.HlvSq")) > 0) {
  for (data in remDr$findElements(using = "css selector", "div.lI9IFe")) {
    title <- data$findElement(using = "css selector", "div.qBF1Pd.fontHeadlineSmall")$getElementText()[[1]]
    address <- data$findElement(using = "css selector", ".W4Efsd > span:nth-of-type(2)")$getElementText()[[1]]
    titles <- c(titles, title)
    addresses <- c(addresses, address)
  }

  # Combine the titles and addresses into a data frame
  df <- data.frame(title = titles, address = addresses)
  print(df)
}
Instead of using Sys.sleep() in R, I would like to change my code so that it only scrolls (i.e., performs the next action) once the previous action has completed. My existing code often freezes halfway through, and I suspect this is because I am trying to load new content while the existing page is not yet fully loaded. I think it would be better to wait for the page to finish loading before proceeding, along the lines of the sketch below.
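To make the idea concrete, this is roughly what I have in mind (just a sketch; wait_for_elements is a helper name I made up, and the timeout and polling interval are arbitrary): poll for a condition instead of sleeping for a fixed time.

# Sketch of a polling wait: re-check for matching elements every `poll`
# seconds instead of sleeping for a fixed time, and give up after `timeout`
# seconds. The name and timings are placeholders, not an existing API.
wait_for_elements <- function(remDr, selector, timeout = 20, poll = 0.5) {
  start <- Sys.time()
  while (as.numeric(difftime(Sys.time(), start, units = "secs")) < timeout) {
    elements <- remDr$findElements(using = "css selector", selector)
    if (length(elements) > 0) {
      return(elements)  # condition met: at least one element is present
    }
    Sys.sleep(poll)  # brief pause before re-checking
  }
  stop(paste("Timed out waiting for:", selector))
}

I could then call something like elements <- wait_for_elements(remDr, "div.qjESne") before entering the scroll loop, but I am not sure whether "elements are present" is the right condition to wait on, or whether I should instead be polling document.readyState via executeScript().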
How might I be able to delay my script and force it to wait for the existing page to load before loading a new page? (e.g., R - Waiting for page to load in RSelenium with PhantomJS)
Note: I am also open to a Python solution.