Download file with R given a JavaScript Statement

Asked 21/7, 2014 at 2:22 Answered 21/2, 2019 at 13:57

javascript r csv web-scraping export-to-csv

I want to create an R script that, among other things, downloads baseball player projection data from http://www.fangraphs.com/projections.aspx?pos=all&stats=bat&type=zips. Near the top right corner of the data table there is a link to export the data to .csv, but it appears to be a JavaScript command (javascript:__doPostBack('ProjectionBoard1$cmdCSV','')). I am familiar with calling download.file() on a direct link to a .csv file, but I am not sure how to approach this.

How can I use R to extract this data?

John answered 21/7, 2014 at 2:22 Comment(2)

It looks like you can just click "export data -> save" then use read.csv. – Dreadfully 21/7, 2014 at 2:24

Thanks, that will be what I do if need be. I was hoping there was a way I could have R download the file directly. I'm looking to run the script periodically as some of the data changes and would like to automate as much as possible. – John 21/7, 2014 at 2:43

The download isn't a simple response that can be retrieved with download.file. The web page constructs a FORM with some huge parameters that store the state of the page, then passes these (and a load of cookies too) to the server to get the CSV response.

To make this work in R (or any other programming language) you need to construct that request, which you can usually only do by first getting the web page, scraping the FORM parameters (and cookies), then constructing the precise POST request your browser sent when you clicked on the link.

This might be possible with RCurl, and it can sometimes be easier if your browser's developer tools let you save the POST request parameters, so that you can then get RCurl to send them.
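The steps above could be sketched roughly as follows, using the httr and rvest packages instead of RCurl (a swap made for brevity, not something tested against this site). The hidden-field handling assumes a standard ASP.NET WebForms page. The names build_postback_body and download_projections and the file name projections.csv are made up for illustration; the event target comes from the __doPostBack call in the question.

```r
# Sketch only: needs the httr and rvest packages and assumes an ASP.NET
# WebForms page. Function names and destfile are hypothetical.
page_url <- "http://www.fangraphs.com/projections.aspx?pos=all&stats=bat&type=zips"

# Combine the scraped hidden fields with the postback target taken from
# javascript:__doPostBack('ProjectionBoard1$cmdCSV','').
build_postback_body <- function(hidden_fields, event_target, event_argument = "") {
  body <- as.list(hidden_fields)
  body[["__EVENTTARGET"]] <- event_target
  body[["__EVENTARGUMENT"]] <- event_argument
  body
}

download_projections <- function(url = page_url, destfile = "projections.csv") {
  # 1. GET the page. httr keeps cookies between requests to the same host.
  page <- httr::GET(url)
  httr::stop_for_status(page)

  # 2. Scrape every hidden <input> (e.g. __VIEWSTATE, __EVENTVALIDATION).
  html <- rvest::read_html(httr::content(page, as = "text", encoding = "UTF-8"))
  inputs <- rvest::html_elements(html, "input[type='hidden']")
  values <- rvest::html_attr(inputs, "value")
  values[is.na(values)] <- ""
  names(values) <- rvest::html_attr(inputs, "name")
  values <- values[!is.na(names(values))]

  # 3. POST the form back, as the browser does when the link is clicked,
  #    and write the response body straight to disk.
  body <- build_postback_body(values, "ProjectionBoard1$cmdCSV")
  resp <- httr::POST(url, body = body, encode = "form",
                     httr::write_disk(destfile, overwrite = TRUE))
  httr::stop_for_status(resp)

  read.csv(destfile, stringsAsFactors = FALSE)
}
```

Calling download_projections() would then write projections.csv and return it as a data frame, provided the site still accepts this kind of postback. Some sites also check request headers such as the user agent, in which case the request needs more work.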

Another common technique in web scraping is to essentially run a browser that can be automated by a scripting language. There's an R package that leverages Selenium that might be able to do this:

http://cran.r-project.org/web/packages/RSelenium/index.html
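A rough idea of what the RSelenium route might look like is below. It assumes Firefox and a matching driver are available, that the export link is rendered as an <a> element whose href contains cmdCSV (inferred from the __doPostBack call in the question, not checked against the page), and that the browser is configured to save CSV files without prompting. The function names and the port are made up for illustration.

```r
# Sketch only: needs the RSelenium package plus a working Firefox setup.
# csv_link_selector and export_csv_with_browser are hypothetical names.

# CSS selector matching any link whose href contains the given fragment.
csv_link_selector <- function(fragment = "cmdCSV") {
  sprintf("a[href*='%s']", fragment)
}

export_csv_with_browser <- function(url, port = 4567L) {
  # Start a Selenium server and a browser session.
  drv <- RSelenium::rsDriver(browser = "firefox", port = port, verbose = FALSE)
  remDr <- drv$client
  # Always shut the browser and server down, even if a step fails.
  on.exit({
    remDr$close()
    drv$server$stop()
  })

  # Load the page, find the export link and click it like a user would.
  remDr$navigate(url)
  link <- remDr$findElement(using = "css selector", value = csv_link_selector())
  link$clickElement()

  # Give the browser a moment to finish the download (arbitrary wait).
  Sys.sleep(5)
  invisible(TRUE)
}
```

The file ends up in the browser's download directory, so the script still has to locate it there and read it with read.csv.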

There are some related (but not duplicate) Q's here, such as:

How to use R to download a zipped file from a SSL page that requires cookies

An R-help posting from a couple of years ago has some suggestions too:

https://stat.ethz.ch/pipermail/r-help//2012-September/335769.html

Lex answered 21/7, 2014 at 7:0 Comment(1)

As @Lex noted, this is an ASP form and the POST is complicated. You may be able to replicate it using Curl. You can use Selenium, though the process is slightly involved, as Selenium is not usually used for downloading files; see #21944516 – Delamination 21/7, 2014 at 7:9

I had a similar problem trying to download several .pdf files. The solution I found is the following:

[1] Get all the .pdf links, like this one:

link <- "http://www.biblioteca.presidencia.gov.br/presidencia/ex-presidentes/luiz-inacio-lula-da-silva/discursos/1o-mandato/2003/01-01-pronun-do-presidente-da-republica-luiz-inacio-lula-da-silva-na-sessao-solene-de-posse-no-cn.pdf"

[2] Instead of using the download.file() function, use browseURL(), like this:

browseURL(link, browser = getOption("browser"),
        encodeIfNeeded = FALSE)

[3] The browseURL() function makes your browser open the file, and the browser can be set to save the .pdf automatically in your computer's download directory. If you are using Google Chrome, you can follow these steps:

https://www.computerhope.com/issues/ch001114.htm

Northrop answered 21/2, 2019 at 13:57 Comment(0)
