Processing JSON using rjson

I'm trying to process some data in JSON format. rjson::fromJSON imports the data successfully and places it into a rather unwieldy nested list.

library(rjson)
y <- fromJSON(file="http://api.lmiforall.org.uk/api/v1/wf/predict/breakdown/region?soc=6145&minYear=2014&maxYear=2020")
str(y)
List of 3
 $ soc                : num 6145
 $ breakdown          : chr "region"
 $ predictedEmployment:List of 7
  ..$ :List of 2
  .. ..$ year     : num 2014
  .. ..$ breakdown:List of 12
  .. .. ..$ :List of 3
  .. .. .. ..$ code      : num 1
  .. .. .. ..$ name      : chr "London"
  .. .. .. ..$ employment: num 74910
  .. .. ..$ :List of 3
  .. .. .. ..$ code      : num 7
  .. .. .. ..$ name      : chr "Yorkshire and the Humber"
  .. .. .. ..$ employment: num 61132
  ...
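The API call needs network access, and the endpoint may no longer respond. A hand-built list with the same nesting is handy for trying out the approaches in this thread. The sketch below is a hypothetical stand-in, not real API data. It keeps only two years with two regions each and reuses a few figures that appear in the outputs in this thread. The 2015 London figure is made up.

```r
# Hypothetical offline stand-in for the parsed API response: same nesting
# as str(y) above, but only 2 years x 2 regions, with toy values.
y <- list(
  soc = 6145,
  breakdown = "region",
  predictedEmployment = list(
    list(year = 2014, breakdown = list(
      list(code = 1, name = "London", employment = 74909.59),
      list(code = 7, name = "Yorkshire and the Humber", employment = 61131.62))),
    list(year = 2015, breakdown = list(
      list(code = 5, name = "West Midlands (England)", employment = 69132.34),
      list(code = 1, name = "London", employment = 75000)))  # made-up value
  )
)

# Show the top two levels of the nesting.
str(y, max.level = 2)
```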

However, as this is essentially tabular data, I would like it in a compact data.frame. After much trial and error I arrived at this:

y.p <- do.call(rbind, lapply(y[[3]], function(p)
  # for each year: one row per region, with the year bound on as the first column
  cbind(p$year, do.call(rbind, lapply(p$breakdown, function(q)
    data.frame(q$name, q$employment, stringsAsFactors = FALSE))))))
head(y.p)
  p$year                   q.name q.employment
1   2014                   London     74909.59
2   2014 Yorkshire and the Humber     61131.62
3   2014     South West (England)     65833.57
4   2014                    Wales     33002.64
5   2014  West Midlands (England)     68695.34
6   2014     South East (England)     98407.36

But the command seems overly fiddly and complex. Is there a simpler way of doing this?
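The one-liner above does two jobs at once. It turns each year's entry into a table, and then it stacks those tables. The sketch below pulls the first job out into a named helper and runs it on a small hypothetical toy stand-in for y. The helper name year_to_df is illustrative and does not come from the question. The columns are also named explicitly, so the headers come out as year, name and employment instead of p$year, q.name and q.employment.

```r
# Hypothetical toy stand-in for y (same nesting, toy values).
y <- list(
  soc = 6145,
  breakdown = "region",
  predictedEmployment = list(
    list(year = 2014, breakdown = list(
      list(code = 1, name = "London", employment = 74909.59),
      list(code = 7, name = "Yorkshire and the Humber", employment = 61131.62))),
    list(year = 2015, breakdown = list(
      list(code = 5, name = "West Midlands (England)", employment = 69132.34),
      list(code = 1, name = "London", employment = 75000)))  # made-up value
  )
)

# Hypothetical helper: one year's entry -> one row per region,
# with that entry's year repeated on every row.
year_to_df <- function(p) {
  rows <- lapply(p$breakdown, function(q)
    data.frame(name = q$name, employment = q$employment,
               stringsAsFactors = FALSE))
  data.frame(year = p$year, do.call(rbind, rows))
}

# Apply the helper to every year and stack the per-year tables.
y.p <- do.call(rbind, lapply(y[[3]], year_to_df))
y.p
```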

Dissection asked 16/7, 2013 at 10:51

I am not sure it is simpler, but the result is more complete (it also keeps the region code) and, I think, easier to read. The idea is to use Map over each (year, breakdown) pair: stack the breakdown records into a single table, then combine that table with the year.

dat <- y[[3]]
res <- Map(function(x,y)data.frame(year=y,
                                   do.call(rbind,lapply(x,as.data.frame))),
        lapply(dat,'[[','breakdown'),
        lapply(dat,'[[','year'))
## transform the list to a big data.frame
do.call(rbind,res)
   year code                     name employment
1  2014    1                   London   74909.59
2  2014    7 Yorkshire and the Humber   61131.62
3  2014    4     South West (England)   65833.57
4  2014   10                    Wales   33002.64
5  2014    5  West Midlands (England)   68695.34
6  2014    2     South East (England)   98407.36
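The sketch below runs the same idea on a small hypothetical toy stand-in for y, to show what Map is pairing up. It is an illustration, not a replacement for the answer. The anonymous function's arguments are renamed bd and yr (illustrative names), so an inner y no longer shadows the data object. The two lapply calls are also pulled out into named variables.

```r
# Hypothetical toy stand-in for y (same nesting, toy values).
y <- list(
  soc = 6145,
  breakdown = "region",
  predictedEmployment = list(
    list(year = 2014, breakdown = list(
      list(code = 1, name = "London", employment = 74909.59),
      list(code = 7, name = "Yorkshire and the Humber", employment = 61131.62))),
    list(year = 2015, breakdown = list(
      list(code = 5, name = "West Midlands (England)", employment = 69132.34),
      list(code = 1, name = "London", employment = 75000)))  # made-up value
  )
)
dat <- y[[3]]

# Map(f, A, B) calls f(A[[1]], B[[1]]), f(A[[2]], B[[2]]), ...,
# so each year's breakdown is paired with that same year.
breakdowns <- lapply(dat, "[[", "breakdown")
years      <- lapply(dat, "[[", "year")
res <- Map(function(bd, yr)
             data.frame(year = yr, do.call(rbind, lapply(bd, as.data.frame))),
           breakdowns, years)

out <- do.call(rbind, res)   # stack the per-year tables
out
```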
Zincograph answered 16/7, 2013 at 11:28
I've accepted this answer, as it is easier to read what is going on. – Dissection
When I run this, e.g. lapply(dat, "[[", "breakdown") with dat = y, I get "Error in FUN(X[[1L]], ...) : subscript out of bounds". – Pompon
@MartinMorgan Good catch. I have edited my answer; I forgot to mention that dat <- y[[3]]. – Zincograph
@Zincograph Thanks for your excellent answer! May I ask: I have never seen lapply used the way you use it here. Can you explain what's happening? I am used to lapply having the form lapply(<object>, <function>), but you seem to be pulling out some portion of the object. How does this work? The object dat does not even have 'breakdown' or 'year' at its top level. I cannot find anything in the documentation that explains this use case. Thanks so much! – Expiry
@MikeWilliamson As ?lapply shows, the signature is lapply(X, FUN, ...), where ... are optional arguments passed on to FUN. Here FUN is [[, and I pass the element name ("breakdown" or "year") as that optional argument. – Zincograph
@Zincograph Thanks for your reply! Yes, I understand that you can pass a function to lapply. I just don't understand how [[ is captured as a function. For instance, I couldn't write [[(dat, 'breakdown') and expect any response. I see what it's doing: it's grabbing the subgroup within dat called 'breakdown'. In effect, it's doing dat$breakdown, or dat[["breakdown"]]. But I've never seen this usage and still don't quite get how or why it works. – Expiry
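The question in this last comment can be checked directly. Written without backticks, [[(dat, 'breakdown') is indeed a syntax error. However, [[ is an ordinary (primitive) function in R, and x[["a"]] is just syntax for calling it. Quoting its name with backticks makes it callable directly. Passing it as the string "[[", as the answer does, also works, because lapply resolves FUN with match.fun. A minimal sketch with hypothetical toy records:

```r
# x[["a"]] is syntax for the call `[[`(x, "a"); backticks let you call it directly.
rec <- list(year = 2014, name = "London")
a <- rec[["year"]]
b <- `[[`(rec, "year")
identical(a, b)  # TRUE

# lapply(X, FUN, ...) resolves FUN (here given by name) and forwards the extra
# argument, so each call below is `[[`(recs[[i]], "year"), i.e. recs[[i]][["year"]].
recs <- list(list(year = 2014, name = "London"),
             list(year = 2015, name = "Wales"))
yrs1 <- lapply(recs, "[[", "year")
yrs2 <- lapply(recs, function(r) r[["year"]])
identical(yrs1, yrs2)  # TRUE
```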

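Here I recover the geometry of the nested list. This assumes every year lists as many regions as the first one, since nj is taken from y[[c(3, 1, 2)]]: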

ni <- seq_along(y[[3]])
nj <- seq_along(y[[c(3, 1, 2)]])
nij <- as.matrix(expand.grid(3, ni=ni, 2, nj=nj))
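This answer relies on two tricks. The first is recursive indexing: a numeric vector passed to [[ walks down one level per element. The second is the row order produced by expand.grid, which varies earlier arguments fastest. Both can be checked on a small hypothetical toy stand-in for y:

```r
# Hypothetical toy stand-in for y (same nesting, toy values).
y <- list(
  soc = 6145,
  breakdown = "region",
  predictedEmployment = list(
    list(year = 2014, breakdown = list(
      list(code = 1, name = "London", employment = 74909.59),
      list(code = 7, name = "Yorkshire and the Humber", employment = 61131.62))),
    list(year = 2015, breakdown = list(
      list(code = 5, name = "West Midlands (England)", employment = 69132.34),
      list(code = 1, name = "London", employment = 75000)))  # made-up value
  )
)

# A vector index makes [[ recurse: y[[c(3, 1, 2)]] is y[[3]][[1]][[2]],
# i.e. the breakdown list of the first year.
identical(y[[c(3, 1, 2)]], y[[3]][[1]][[2]])  # TRUE

ni <- seq_along(y[[3]])           # one index per year
nj <- seq_along(y[[c(3, 1, 2)]])  # one index per region, taken from the first year
nij <- as.matrix(expand.grid(3, ni = ni, 2, nj = nj))

# ni cycles before nj, so each row (3, i, 2, j) addresses region j of
# year i, and consecutive rows step through the years.
nij
y[[nij[2, ]]]$name  # first region of the second year in the toy data
```

This row order is why the output below runs through 2014, 2015, 2016, ... from one row to the next.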

then extract the relevant values, using each row of nij as a recursive index into the nested list:

data <- apply(nij, 1, function(ij) y[[ij]])
year <- apply(cbind(nij[,1:2], 1), 1, function(ij) y[[ij]])
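The sketch below runs both apply calls on a small hypothetical toy stand-in for y. Each result of the first call is itself a list (one region record), so apply returns a plain list rather than trying to build a matrix. The second call replaces the last two index columns with a single 1, so each row becomes (3, i, 1), the year field of entry i.

```r
# Hypothetical toy stand-in for y (same nesting, toy values).
y <- list(
  soc = 6145,
  breakdown = "region",
  predictedEmployment = list(
    list(year = 2014, breakdown = list(
      list(code = 1, name = "London", employment = 74909.59),
      list(code = 7, name = "Yorkshire and the Humber", employment = 61131.62))),
    list(year = 2015, breakdown = list(
      list(code = 5, name = "West Midlands (England)", employment = 69132.34),
      list(code = 1, name = "London", employment = 75000)))  # made-up value
  )
)
ni <- seq_along(y[[3]])
nj <- seq_along(y[[c(3, 1, 2)]])
nij <- as.matrix(expand.grid(3, ni = ni, 2, nj = nj))

# One region record (a list of code, name, employment) per row of nij.
data <- apply(nij, 1, function(ij) y[[ij]])

# (3, i, 1) indexes y[[3]][[i]][[1]], the year of entry i.
year <- apply(cbind(nij[, 1:2], 1), 1, function(ij) y[[ij]])

str(data[[1]])
year  # 2014 2015 2014 2015
```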

and turn the result into a friendlier structure:

> data.frame(year, do.call(rbind, data))
   year code                     name employment
1  2014    1                   London   74909.59
2  2015    5  West Midlands (England)   69132.34
3  2016   12         Northern Ireland   24313.94
4  2017    5  West Midlands (England)    71723.4
5  2018    9     North East (England)   27199.99
6  2019    4     South West (England)   71219.51
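One detail of this result may matter later. The sketch below demonstrates it on a small hypothetical toy stand-in for y. Because do.call(rbind, data) binds lists, it produces a matrix of mode list, so code, name and employment arrive in the data.frame as list columns. This is probably also why 71723.4 above prints with one decimal fewer than its neighbours. The sketch flattens those columns with unlist and then sorts the rows by year and region code, since the expand.grid index leaves them cycling through the years.

```r
# Hypothetical toy stand-in for y (same nesting, toy values).
y <- list(
  soc = 6145,
  breakdown = "region",
  predictedEmployment = list(
    list(year = 2014, breakdown = list(
      list(code = 1, name = "London", employment = 74909.59),
      list(code = 7, name = "Yorkshire and the Humber", employment = 61131.62))),
    list(year = 2015, breakdown = list(
      list(code = 5, name = "West Midlands (England)", employment = 69132.34),
      list(code = 1, name = "London", employment = 75000)))  # made-up value
  )
)
ni <- seq_along(y[[3]])
nj <- seq_along(y[[c(3, 1, 2)]])
nij <- as.matrix(expand.grid(3, ni = ni, 2, nj = nj))
data <- apply(nij, 1, function(ij) y[[ij]])
year <- apply(cbind(nij[, 1:2], 1), 1, function(ij) y[[ij]])

df <- data.frame(year, do.call(rbind, data))
sapply(df, is.list)   # code, name and employment are list columns

# Flatten each list column to an ordinary vector (one value per cell) ...
df[] <- lapply(df, unlist)
# ... then sort by year and region code, and renumber the rows.
df <- df[order(df$year, df$code), ]
rownames(df) <- NULL
df
```

The unlist step assumes every record has all three fields. A missing (NULL) cell would be silently dropped and the column lengths would no longer match.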
Pompon answered 16/7, 2013 at 12:19
Hi @agstudy, I tried to apply this method to my case but still failed; my case is #27227708. I hope someone can help me. – Roche
