Loop for Reverse Geocoding in R

I am trying to reverse geocode a large data set (around 100k rows). I used the revgeocode function from the ggmap package and got the following result for one entry:

48 Grand View Terrace, San Francisco, 
CA 94114, USA            
48 Grand View Terrace Eureka Valley San Francisco        
San Francisco County                  California United States
postal_code postal_code_suffix

However, I need to automate the process and apply it to the entire data set.

I tried

r <- lapply(revgeocode(location = (c(z$lon),c(z$lat)),
             output = "more",
            messaging = FALSE, sensor = FALSE, override_limit = FALSE,
            client = "", signature = ""))

and got an "unexpected ','" error at each step.

I tried to write the following loop too

r <- for(i in 1:10){
  revgeocode(location = ("z$lon", "z$lat"),output = "more", messaging =      FALSE, sensor = FALSE, override_limit = FALSE,client = "", signature = "")}

and got similar errors.

Please point me to some material or helpful links that will help me write the loop for reverse geocoding. Also, how can I verify that the returned data is correct?

Auricle answered 9/5, 2016 at 13:48 Comment(2)
The Google Maps API limits you to 2500 queries per day, so you may want to take that into account. - Rundown
@RickArko Yes, I am aware of that! Is there any way to call the API with credentials using this ggmap code? I think there is a charge of 0.5 USD for every 1000 queries. - Auricle

Based on this answer, you could create new variables in your data.frame.

We use mapply() to process your coordinates and return the results in a list res.

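library(ggmap)

res <- mapply(FUN = function(lon, lat) {
  # revgeocode() expects a numeric vector c(longitude, latitude)
  revgeocode(c(lon, lat), output = "more")
  },
  df$lon, df$lat,
  SIMPLIFY = FALSE  # always return a list, even if all results have the same shape
  )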

Then, we use rbindlist() from data.table to combine the list into a single table (with fill = TRUE, since not all elements of res have the same length, i.e. some results do not return a street_number or a postal_code) and cbind() it to the original data:

cbind(df, data.table::rbindlist(res, fill = TRUE))
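With around 100k rows, a single failed request should not abort the whole run. Below is a minimal sketch, assuming a hypothetical wrapper `safe_revgeocode_all()` (not part of ggmap) that takes the geocoding function as an argument, catches errors row by row, and can pause between requests. The `fake_geocoder()` stand-in is only there so the sketch can be tried offline.

```r
# Hypothetical helper, not part of ggmap: apply a reverse-geocoding function
# row by row, keep going on errors, and optionally pause between requests.
safe_revgeocode_all <- function(lon, lat, geocode_fun, pause = 0) {
  stopifnot(length(lon) == length(lat))
  res <- vector("list", length(lon))
  for (i in seq_along(lon)) {
    res[[i]] <- tryCatch(
      geocode_fun(c(lon[i], lat[i])),
      # On failure, store a one-row placeholder so row i stays aligned with df
      error = function(e) data.frame(address = NA_character_)
    )
    if (pause > 0) Sys.sleep(pause)
  }
  res
}

# Stand-in geocoder for illustration only; in practice this would be
# something like function(loc) ggmap::revgeocode(loc, output = "more")
fake_geocoder <- function(loc) {
  if (anyNA(loc)) stop("missing coordinate")
  data.frame(address = paste(loc, collapse = ", "))
}

out <- safe_revgeocode_all(c(119.827, NA), c(32.31, 32.19), fake_geocoder)
```

The resulting list has one element per input row, so it can be passed to data.table::rbindlist(out, fill = TRUE) and bound to the original data in the same way as res.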

Update

Following up on your comment: if you want to process more than 2500 queries per day, you could subscribe to the Google Maps APIs Premium Plan to unlock higher quotas. You can then pass your credentials to revgeocode() using the client and signature parameters.

As mentioned in the documentation:

Upon purchasing your Google Maps APIs Premium Plan license, you will receive a welcome email from Google that contains your client ID and your private cryptographic key.

Your client ID is used to access the special features of Google Maps APIs Premium Plan. All client IDs begin with a gme- prefix. Pass your client ID as the value of the client parameter. A unique digital signature is generated using your private cryptographic key. Pass this signature as the value of the signature parameter.

You can see how it works under the hood by examining the revgeocode() source and how the URL is constructed:

sensor4url <- paste("&sensor=", sensor, sep = "")
client4url <- paste("&client=", client, sep = "")
signature4url <- paste("&signature=", signature, sep = "")
url_string <- paste("http://maps.googleapis.com/maps/api/geocode/json?latlng=", 
        loc4url, sensor4url, sep = "")
    if (userType == "business") {
        url_string <- paste(url_string, client4url, signature4url, 
            sep = "")
    }
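To make the excerpt easier to follow, here is a self-contained sketch of the same URL assembly. The helper name `build_revgeocode_url()` is hypothetical and not a ggmap function, and the sketch assumes that "business" use simply means both client and signature were supplied. The signature value in the example call is a placeholder, not a real signature.

```r
# Hypothetical helper mirroring the excerpt above; not part of ggmap.
build_revgeocode_url <- function(lon, lat, client = "", signature = "",
                                 sensor = FALSE) {
  # The Geocoding API's latlng parameter expects latitude first
  loc4url <- paste(lat, lon, sep = ",")
  url_string <- paste0(
    "http://maps.googleapis.com/maps/api/geocode/json?latlng=", loc4url,
    "&sensor=", tolower(as.character(sensor))
  )
  # Assumption: credentials are appended only when both are supplied
  if (nzchar(client) && nzchar(signature)) {
    url_string <- paste0(url_string, "&client=", client,
                         "&signature=", signature)
  }
  url_string
}

# Without credentials
build_revgeocode_url(119.827, 32.31)
# With placeholder credentials
build_revgeocode_url(119.827, 32.31, client = "gme-example", signature = "abc")
```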

Data

df <- structure(list(lat = c(32.31, 32.19, 34.75, 35.09, 35.35, 34.74 ), lon = 
c(119.827, 119.637, 119.381, 119.364, 119.534, 119.421 )), .Names = 
c("lat", "lon"), row.names = c(21L, 32L, 37L, 48L, 50L, 89L), class = "data.frame") 
Chem answered 9/5, 2016 at 13:53 Comment(7)
I tried it on the example you provided and it works fine! Thank you for the help! However, how do I get my large data set into the format you used? I am getting the error `Error: is.numeric(location) && length(location) == 2 is not TRUE`. I tried to create the list with `lst <-list(ll4$lontat)`, where ll4 is the name of my data set and lonlat is a column containing entries like (119.08,39.24). Thank you! - Auricle
@AmitR.Pathak Please provide dput(head(ll4)). - Quillon
structure(list(lat = c(32.31, 32.19, 34.75, 35.09, 35.35, 34.74 ), lon = c(119.827, 119.637, 119.381, 119.364, 119.534, 119.421 )), .Names = c("lat", "lon"), row.names = c(21L, 32L, 37L, 48L, 50L, 89L), class = "data.frame") - Auricle
The code is perfect! Thanks a lot! Does the limit from the Google APIs still apply? Will the code stop working after 2500 queries? Many thanks! - Auricle
Is there some way to do this by calling the API and paying for the reverse geocoding? I need it as early as possible. - Auricle
Thank you @Steven! I checked the Google API page. What does the client ID correspond to: the API key or the OAuth client ID? I am not sure what signature means in this case. I am sorry, but I am pretty new to this. - Auricle
I have posted on the Google discussion forum but have received no answer. Meanwhile, I have subscribed to the Google Cloud Platform free trial and received $300 of credit for experimenting. I edited the code a bit and set override_limit = TRUE. This allows me to go beyond 2500 queries per day, but do you have any idea how the billing will be processed? I ran the code for a sample of 500 queries beyond the 2500 and did not get any payment e-mail from Google. I cannot get the Premium Plan, as it is meant only for enterprises. - Auricle

I had a similar problem integrating the API key. Basically, it is a matter of including the API key in the URL that R calls. If this doesn't help you, you need to change the core code (look it up on GitHub) so that it accepts an argument for the key.

Alissaalistair answered 11/5, 2016 at 12:58 Comment(2)
getGeoData <- function(latlng, api_key){
  geo_data <- getURL(paste("https://maps.googleapis.com/maps/api/geocode/json?","latlng=",latlng,"&key=",sep=""))
  geo_data <- fromJSON(geo_data)
  return(geo_data$results[[1]])}

As suggested in the post, I wrote the function above for reverse geocoding. I also tried the following loop so that many queries can be processed:

for (i in 1:10) { geo_data[i] = getGeoData(unique(y1[i,4]))}

I get the error "number of items to replace is not a multiple of replacement length". Any suggestions to solve this? - Auricle
I am quite an R newbie, but shouldn't you write geo_data[[i]] instead? I'm very busy at work at the moment, but I could have a look soon. - Alissaalistair
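A minimal sketch of the [[i]] suggestion, with two assumptions that are not in the comments above: the hypothetical `get_geo_data()` appends `api_key` to the URL (the function in the comment accepts `api_key` but never uses it), and the HTTP call is passed in as a `fetch` argument, defaulting to jsonlite::fromJSON(), so the loop can be tried offline with a stand-in.

```r
# Hypothetical rewrite of getGeoData(); `fetch` is injectable so the sketch
# can be tried without network access. The default assumes jsonlite is installed.
get_geo_data <- function(latlng, api_key,
                         fetch = function(u) jsonlite::fromJSON(u, simplifyVector = FALSE)) {
  url <- paste0("https://maps.googleapis.com/maps/api/geocode/json?",
                "latlng=", latlng, "&key=", api_key)
  # Return only the first result of the response
  fetch(url)$results[[1]]
}

# Stand-in for the HTTP call, returning a response-shaped list
fake_fetch <- function(u) list(results = list(list(formatted_address = u)))

coords <- c("32.31,119.827", "32.19,119.637")

# Pre-allocate a list with one slot per query
geo_data <- vector("list", length(coords))
for (i in seq_along(coords)) {
  # [[i]] stores the whole result as element i; [i] expects a replacement
  # of length one and complains when the result has several components
  geo_data[[i]] <- get_geo_data(coords[i], api_key = "YOUR_KEY", fetch = fake_fetch)
}
```

Dropping the `fetch = fake_fetch` argument and supplying a real key would send the requests to the API, subject to the quota discussed above.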
