Successfully coercing paginated JSON object to R dataframe

I am trying to convert JSON pulled from an API into a data frame in R, so that I can use and analyze the data.

# Load the needed packages
library(RJSONIO)
library(httr)

# Request a list of companies currently fundraising using httr
r <- GET("https://api.angel.co/1/startups?filter=raising")
# Convert the response to a text object using httr
raise <- content(r, as = "text")
# Convert to a list using RJSONIO
new <- fromJSON(raise)

Once I get this object, new, I am having a really difficult time parsing the list into a dataframe. The json has this structure:

{
  "startups": [
    {
      "id": 6702,
      "name": "AngelList",
      "quality": 10,
      "...": "...",
      "fundraising": {
        "round_opened_at": "2013-07-30",
        "raising_amount": 1000000,
        "pre_money_valuation": 2000000,
        "discount": null,
        "equity_basis": "equity",
        "updated_at": "2013-07-30T08:14:40Z",
        "raised_amount": 0.0
      }
    }
  ],
  "total": 4268,
  "per_page": 50,
  "page": 1,
  "last_page": 86
}

I can look at individual elements within new, for example the raised_amount of the first startup listed:

new$startups[[1]]$fundraising$raised_amount

However, I don't know how to apply this to the whole list of 4268 startups. In particular, I can't figure out how to deal with the pagination: I only ever seem to get one page of startups (i.e. at most 50 of them).

I tried using a for loop to get the list of startups and just put each value into a row of a dataframe one by one. The example below shows this for just one column, but of course I could do it for all of them just by expanding the for loop. However, I can't get any content on any of the other pages.

# One row per startup on the current page
df1 <- data.frame(index = seq_along(new$startups))
df1$raiseamnt <- 0

for (i in seq_along(new$startups)) {
  df1$raiseamnt[i] <- new$startups[[i]]$fundraising$raised_amount
}

Edit: Thank you for the mention of pagination. I will look through the documentation more carefully and see if I can figure out how to structure the API calls correctly to get different pages. I will update this question if/when I figure that out!

Cymatium answered 2/5, 2015 at 1:19 Comment(2)
From their API docs: "You may supply the page and per_page parameters to control pagination". What fields do you need? The data frame you've found can be simplified if you don't need the columns that have data.frames or lists in them. - Carcassonne
I do need the columns with data frames and lists. I think misreading the API docs was my biggest problem. I'll look into how to get pages correctly. - Cymatium

You may find the jsonlite package useful. Below is a quick example.

library(jsonlite)
library(httr)
#request a list of companies currently fundraising using httr
r <- GET("https://api.angel.co/1/startups?filter=raising")
#convert to text object using httr
raise <- content(r, as="text")
#parse JSON
new <- fromJSON(raise)

head(new$startups$id)
[1] 229734 296470 237516 305916 184460 147385

Note, however, that while this package or the one in the question can help parse a JSON string, the target structure still has to be created appropriately so that each element of the string can be added without a problem. That part is up to the developer.
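
As a rough sketch of what "creating the structure appropriately" could look like with the list-of-lists output from the question, the base R snippet below builds one row per startup and turns missing (NULL) values into NA. The startups list is a hypothetical stand-in for new$startups (its second entry is invented), and or_na and startup_row are illustrative helper names, not part of any package.

```r
# Hypothetical stand-in for new$startups as parsed by RJSONIO::fromJSON
# (a list of nested lists). The second entry is invented for illustration.
startups <- list(
  list(id = 6702, name = "AngelList",
       fundraising = list(raising_amount = 1000000, discount = NULL,
                          raised_amount = 0)),
  list(id = 1, name = "Example Co",
       fundraising = list(raising_amount = 250000, discount = NULL,
                          raised_amount = NULL))
)

# Return NA instead of NULL so every startup contributes exactly one value.
or_na <- function(x) if (is.null(x)) NA else x

# Build a single-row data frame for one startup.
startup_row <- function(s) {
  data.frame(
    id             = or_na(s$id),
    name           = or_na(s$name),
    raising_amount = or_na(s$fundraising$raising_amount),
    raised_amount  = or_na(s$fundraising$raised_amount),
    stringsAsFactors = FALSE
  )
}

# Stack the per-startup rows into one data frame.
startups_df <- do.call(rbind, lapply(startups, startup_row))
print(startups_df)
```

Guarding against NULL matters here because assigning a NULL field into a data frame cell fails, which is a likely stumbling block for a plain for loop over the list.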

For pagination, the API seems to be a REST API, so filtering conditions are normally added to the URL as query parameters (e.g. https://api.angel.co/1/startups?filter=raising&variable=value). I guess the exact parameter names can be found somewhere in the API docs.
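
A minimal sketch of a paging loop, assuming the page query parameter mentioned in the comments on the question and the last_page field shown in the question's JSON. The function names fetch_page_http, fetch_all_pages and mock_fetch are made up for illustration. The HTTP fetcher is only defined, not called, and the loop is exercised with an offline mock that returns invented data.

```r
# Fetch one page with httr and jsonlite. Assumes the API accepts a `page`
# query parameter. Defined but not called here, since it needs network access.
fetch_page_http <- function(page) {
  r <- httr::GET("https://api.angel.co/1/startups",
                 query = list(filter = "raising", page = page))
  httr::stop_for_status(r)
  jsonlite::fromJSON(httr::content(r, as = "text", encoding = "UTF-8"))
}

# Request page 1, read last_page from the response, then request the
# remaining pages. Returns a list with the startups element of each page.
fetch_all_pages <- function(fetch_page) {
  first <- fetch_page(1)
  pages <- list(first$startups)
  if (first$last_page > 1) {
    for (p in 2:first$last_page) {
      pages[[p]] <- fetch_page(p)$startups
    }
  }
  pages
}

# Offline mock (invented data): 3 pages with 2 startups each.
mock_fetch <- function(page) {
  list(startups = data.frame(id = (page - 1) * 2 + 1:2),
       page = page, last_page = 3)
}

all_pages <- fetch_all_pages(mock_fetch)
all_startups <- do.call(rbind, all_pages)
print(all_startups)
```

Against the real API you would pass fetch_page_http instead of mock_fetch. Because jsonlite returns nested data frame columns for this kind of JSON, jsonlite::rbind_pages(all_pages) is probably a better fit than do.call(rbind, ...) for combining the pages.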

Gainful answered 2/5, 2015 at 5:22 Comment(1)
What about jsonlite::fromJSON(url)? - Barbee

The httr package already imports jsonlite (see the httr documentation). A more elegant way, with better-formatted output, is:

library(httr)
resp <- httr::GET("https://api.angel.co/1/startups?filter=raising", accept_json())
cont <- content(resp, as = "parsed", type = "application/json")
# explicit conversion to data frame
dataFrame <- data.frame(cont)
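
One caveat, as far as I can tell: content(..., as = "parsed") hands JSON back as nested lists by default, so data.frame(cont) may not give one row per startup. A possible alternative, sketched below, is to parse the response text with jsonlite::fromJSON and flatten the nested fundraising column. The inline JSON string is trimmed from the structure shown in the question so that the example runs without network access.

```r
library(jsonlite)

# Inline JSON trimmed from the structure shown in the question.
txt <- '{
  "startups": [
    {"id": 6702, "name": "AngelList",
     "fundraising": {"raising_amount": 1000000, "discount": null,
                     "raised_amount": 0.0}}
  ],
  "total": 4268, "per_page": 50, "page": 1, "last_page": 86
}'

# With the default simplification, the startups array becomes a data frame
# and the nested fundraising objects become a data frame column inside it.
parsed <- fromJSON(txt)
print(class(parsed$startups$fundraising))

# flatten() expands nested data frame columns into regular columns
# named like fundraising.raised_amount.
flat <- jsonlite::flatten(parsed$startups)
print(names(flat))
```

With a live response, txt would be content(resp, as = "text") instead of the inline string.
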
Placia answered 21/9, 2016 at 12:55 Comment(0)
