the code below works fine in interactive mode but fails when used in a function. it is simply two authentication POST
commands followed by the data download. my goal is to get this working inside a function, not just in interactive mode.
this question is sort of a sequel to this question. icpsr recently updated their website. the minimal reproducible example below requires a free account, available at
i tried adding Sys.sleep(1) and various httr::GET / httr::POST calls but nothing worked.
my_download <-
    function( your_email , your_password ){

        values <-
            list(
                agree = "yes" ,
                path = "ICPSR" ,
                study = "21600" ,
                ds = "" ,
                bundle = "rdata" ,
                dups = "yes" ,
                email = your_email ,
                password = your_password
            )

        httr::POST( "https://www.icpsr.umich.edu/cgi-bin/terms" , body = values )
        httr::POST( "https://www.icpsr.umich.edu/rpxlogin" , body = values )

        tf <- tempfile()

        httr::GET(
            "https://www.icpsr.umich.edu/cgi-bin/bob/zipcart2" ,
            query = values ,
            httr::write_disk( tf , overwrite = TRUE ) ,
            httr::progress()
        )

    }

# fails
my_download( "[email protected]" , "some_password" )
# stepping through works
debug( my_download )
my_download( "[email protected]" , "some_password" )
EDIT: the failure simply downloads this page as if not logged in (rather than the dataset), so the authentication is being lost for some reason. if you are logged in to icpsr, use private browsing to see the page:
https://www.icpsr.umich.edu/cgi-bin/bob/zipcart2?study=21600&ds=1&bundle=rdata&path=ICPSR
thanks!
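
One way to detect the failure described in the edit is to inspect the first bytes of the downloaded file rather than eyeballing it. The sketch below assumes the successful download is a zip archive (suggested by the `zipcart2` endpoint, but not confirmed here) and that the failure case is an HTML page. The helper name `is_zip_file` is hypothetical and not part of httr.

```r
# hypothetical helper: TRUE when `path` starts with the zip local-file-header
# signature "PK\003\004", FALSE for anything else (e.g. an HTML login page)
is_zip_file <-
    function( path ){

        # a missing or tiny file cannot be a zip archive
        if ( !file.exists( path ) || file.size( path ) < 4 ) return( FALSE )

        # read the first four bytes as raw values
        magic <- readBin( path , what = "raw" , n = 4 )

        identical( magic , as.raw( c( 0x50 , 0x4b , 0x03 , 0x04 ) ) )
    }
```

Inside the function from the question, something like `if ( !is_zip_file( tf ) ) stop( "download does not look like a zip archive" )` after the `httr::GET` call would turn the silent failure into an error.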
robots.txt is currently a bona fide technical control upheld in civil courts, at least in the U.S.). Unless one has written permission to automate access, it's not a good idea to pursue this. – Dressage
POST and GET commands twice triggers the download within the function. happy to award the bounty if you want to make that an answer. thanks very much! – Unicorn
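
The last comment reports that running the POST and GET commands twice triggers the download inside the function. A minimal sketch of that workaround follows. It reuses the URLs and form values from the question. The function name `my_download_twice`, the `attempts` argument, and returning the temporary file path are assumptions added for illustration. Why the second pass succeeds is not explained in the thread, so treat this as a workaround rather than a diagnosis.

```r
# requires the httr package to be installed; calls are namespaced as in the question
my_download_twice <-
    function( your_email , your_password , attempts = 2 ){

        # same form values as the original function
        values <-
            list(
                agree = "yes" ,
                path = "ICPSR" ,
                study = "21600" ,
                ds = "" ,
                bundle = "rdata" ,
                dups = "yes" ,
                email = your_email ,
                password = your_password
            )

        tf <- tempfile()

        # repeat the whole terms / login / download sequence `attempts` times;
        # each pass overwrites the file written by the previous one
        for ( i in seq_len( attempts ) ){

            httr::POST( "https://www.icpsr.umich.edu/cgi-bin/terms" , body = values )
            httr::POST( "https://www.icpsr.umich.edu/rpxlogin" , body = values )

            httr::GET(
                "https://www.icpsr.umich.edu/cgi-bin/bob/zipcart2" ,
                query = values ,
                httr::write_disk( tf , overwrite = TRUE ) ,
                httr::progress()
            )
        }

        # assumption: return the file path (the original returned the GET response)
        tf
    }
```

It would be called the same way as the original, for example `tf <- my_download_twice( your_email , your_password )`, subject to the site's terms of use mentioned in the first comment.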