Read a CSV from github into R
O

10

70

I am trying to read a CSV from GitHub into R:

latent.growth.data <- read.csv("https://github.com/aronlindberg/latent_growth_classes/blob/master/LGC_data.csv")

However, this gives me:

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : unsupported URL scheme

I tried ?read.csv, ?download.file, getURL (which only returned strange HTML), as well as the data import manual, but still cannot understand how to make it work.

What am I doing wrong?

Orgiastic answered 21/1, 2013 at 15:20 Comment(0)
C
115

Try this:

library(RCurl)
x <- getURL("https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv")
y <- read.csv(text = x)

You have two problems:

  1. You're linking to GitHub's display version of the file, not the "raw" text file (visit the URL for https://raw.github.com....csv to see the difference between the raw version and the display version).
  2. https is a problem for R in many cases, so you need a package like RCurl to get around it. In some cases (though not with GitHub) you can simply replace https with http and things work, so you can always try that first, but I find RCurl reliable and not much extra typing.
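
To illustrate the first point, here is a minimal sketch of turning a display ("blob") URL into a raw one. The helper name to_raw_url is hypothetical, and the raw.githubusercontent.com host is taken from the other answers on this page; treat it as an assumption about how GitHub lays out raw URLs, not an official API.

```r
# Hypothetical helper (not part of any package): rewrite a GitHub "blob"
# page URL into the matching raw-file URL by plain string substitution.
to_raw_url <- function(blob_url) {
  # Swap the host for the raw-content host used elsewhere on this page.
  raw <- sub("^https://github\\.com/", "https://raw.githubusercontent.com/", blob_url)
  # Drop the first "/blob/" path segment, which only exists on display pages.
  sub("/blob/", "/", raw, fixed = TRUE)
}

to_raw_url("https://github.com/aronlindberg/latent_growth_classes/blob/master/LGC_data.csv")
```

The result can then be passed to getURL() as in the code above.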
Choroid answered 21/1, 2013 at 15:25 Comment(4)
How do you resolve Error in function (type, msg, asError = TRUE) : SSL certificate problem: unable to get local issuer certificate? - Cottle
Can also be written as one line for memory/space purposes: y <- read.csv(text=getURL("https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv")) - Azazel
I tried this but it did not work: x <- getURL("https://github.com/eparker12/nCoV_tracker/blob/master/input_data/coronavirus_today.csv"); y <- read.csv(text = x) - Jerk
@Ben10, you're not using the raw URL. Can you try with that and see if it works? - Choroid
H
26

From the documentation of url:

Note that ‘https://’ connections are not supported (with some exceptions on Windows).

So the problem is that R does not allow connections to https URLs.

You can use download.file with curl:

download.file("https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv", 
    destfile = "/tmp/test.csv", method = "curl")
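
If you want the data in R right away, the download and the read can be wrapped together. This is only a sketch: read_csv_via_download is a made-up name, and it assumes download.file() returns 0 on success (as its documentation describes) and that the chosen method is available on your machine.

```r
# Hypothetical helper (not from any package): download a CSV to a temporary
# file with download.file(), then parse it with read.csv().
read_csv_via_download <- function(url, method = "auto") {
  dest <- tempfile(fileext = ".csv")
  on.exit(unlink(dest), add = TRUE)  # remove the temporary file afterwards
  status <- download.file(url, destfile = dest, method = method, quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)
  read.csv(dest)
}

# Usage (needs network access and, for method = "curl", the curl binary):
# dat <- read_csv_via_download(
#   "https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv",
#   method = "curl")
```

Using tempfile() instead of a fixed /tmp path keeps the sketch portable across platforms.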
Hydnocarpate answered 21/1, 2013 at 15:25 Comment(1)
@DirkEddelbuettel although it does depend on having Curl installed - Gentilesse
H
23

I am using R 3.0.2 and this code does the job.

urlfile<-'https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv'
dsin<-read.csv(urlfile)

and this as well

urlfile<-'https://raw.github.com/aronlindberg/latent_growth_classes/master/LGC_data.csv'
dsin<-read.csv(url(urlfile))

edit (sessionInfo)

R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Polish_Poland.1250  LC_CTYPE=Polish_Poland.1250   
[3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C                  
[5] LC_TIME=Polish_Poland.1250    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_3.0.2
Hoedown answered 3/3, 2014 at 7:36 Comment(1)
Quote from ?url: Note that the ‘https://’ URL scheme is not supported except on Windows. - Panties
K
15

In similar style to akhmed, I thought I would update the answer, since now you can just use Hadley's readr package. Just one thing to note: you'll need the url to be the raw content (see the //raw.git... below). Here's an example:

library(readr)
data <- read_csv("https://raw.githubusercontent.com/RobertMyles/Bayesian-Ideal-Point-IRT-Models/master/Senate_Example.csv")

Voilà!

Kos answered 14/5, 2016 at 0:28 Comment(0)
W
8

I realize the question is very old, but Google still reports it as a top result (at least for me), so I decided to provide an answer for 2015.

People are now generally migrating to the curl package (as is the well-known httr), as described on r-bloggers, which offers the following very simple solution:

library(curl)

x <- read.csv( curl("https://raw.githubusercontent.com/trinker/dummy/master/data/gcircles.csv") )
Whiteman answered 2/7, 2015 at 7:57 Comment(0)
R
4

This is what I've been helping develop rio for. It's basically a universal data import/export package that supports HTTPS/SSL and infers the file type from its extension, thus allowing you to read basically anything using one import function:

library("rio")

If you grab the "raw" URL for your CSV from GitHub, you can load it in one line with import:

import("https://raw.githubusercontent.com/aronlindberg/latent_growth_classes/master/LGC_data.csv")

The result is a data.frame:

     top100_repository_name   month monthly_increase monthly_begin_at monthly_end_with
1                    Bukkit 2012-03                9              431              440
2                    Bukkit 2012-04               19              438              457
3                    Bukkit 2012-05               19              455              474
4                    Bukkit 2012-06               18              475              493
5                    Bukkit 2012-07               15              492              507
6                    Bukkit 2012-08               50              506              556
...
Reimburse answered 25/2, 2015 at 19:23 Comment(3)
I tried this and got: get_ext(file) : file has no extension - Bryce
@Bryce There was a small typo in the most recent GitHub version. Either install the older version from CRAN or reinstall from GitHub and it should work for you. - Reimburse
Thanks - problem is fixed. Your solution is the only one that worked for me (Windows 8.1) - Bryce
O
2

Seems nowadays GitHub wants you to go through their API to fetch content. I used the gh package as follows:

require(gh)

tmp = tempfile()
qurl = 'https://raw.githubusercontent.com/aronlindberg/latent_growth_classes/master/LGC_data.csv'
# download
gh(paste0('GET ', qurl), .destfile = tmp, .overwrite = TRUE)
# read
read.csv(tmp)

The important part is that you provide a personal access token (PAT), either through the gh(.token = ) argument or, as I did, by setting the PAT globally in an ~/.Renviron file [1]. Of course, you first have to create the PAT in your GitHub account.

[1] I believe ~/.Renviron is checked first by all r-lib packages, of which gh is one. The token entry in it should look like this:

GITHUB_PAT = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"

You could also use the usethis package to set up the PAT.
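
To confirm the token is actually visible to your session before calling gh(), a quick check along these lines may help. has_github_pat is a hypothetical helper name; the sketch only looks at the GITHUB_PAT environment variable mentioned above and does not validate the token against GitHub.

```r
# Hypothetical helper: TRUE if a non-empty GITHUB_PAT is set in this session.
has_github_pat <- function() {
  nzchar(Sys.getenv("GITHUB_PAT", unset = ""))
}

if (!has_github_pat()) {
  message("GITHUB_PAT is not set; add it to ~/.Renviron and restart R.")
}
```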

Osseous answered 11/11, 2020 at 16:25 Comment(0)
I
0

curl might not work on Windows; at least it did not for me.

This is what worked for me on Windows:

download.file("https://raw.githubusercontent.com/aronlindberg/latent_growth_classes/master/LGC_data.csv", 
    destfile = "/tmp/test.csv", method = "wininet")

On Linux:

download.file("https://raw.githubusercontent.com/aronlindberg/latent_growth_classes/master/LGC_data.csv", 
    destfile = "/tmp/test.csv", method = "curl")
Impolitic answered 26/8, 2015 at 1:44 Comment(0)
M
0

A rather crude way: copy the data and read it from the clipboard.

x <- read.table(file = "clipboard", sep = "\t", header = TRUE)
Millymilman answered 26/3, 2019 at 14:13 Comment(0)
P
0

As other answers mention, just use the link to the raw file on GitHub.

For example:

x <- read.csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2018/2018-04-23/week4_australian_salary.csv")
Pantograph answered 29/5, 2021 at 17:9 Comment(0)
