How to import JSON into R and convert it to table?
Asked Answered
A

3

10

I want to play with data that is now saved in JSON format. But I am very new to R and have little clue of how to play with data. You can see below what I managed to achieve. But first, my code:

library(rjson)
json_file <- "C:\\Users\\Saonkfas\\Desktop\\WOWPAPI\\wowpfinaljson.json"
json_data <- fromJSON(paste(readLines(json_file), collapse=""))

I was able to the data:

for (x in json_data){print (x)}

Although output looks pretty raw:

[[1]]
[[1]]$wins
[1] "118"

[[1]]$losses
[1] "40"
# And so on

Note that the JSON is somewhat nested. I could create tables with Python, but R seems much more complicated.

Edit:

My JSON:

{
"play1": [
    {
        "wins": "118",
        "losses": "40",
        "max_killed": "7",
        "battles": "158",
        "plane_id": "4401",
        "max_ground_object_destroyed": "3"
    },
    {
        "wins": "100",
        "losses": "58",
        "max_killed": "7",
        "battles": "158",
        "plane_id": "2401",
        "max_ground_object_destroyed": "3"
    },
    {
        "wins": "120",
        "losses": "38",
        "max_killed": "7",
        "battles": "158",
        "plane_id": "2403",
        "max_ground_object_destroyed": "3"
    }
],

"play2": [
    {
        "wins": "12",
        "losses": "450",
        "max_killed": "7",
        "battles": "158",
        "plane_id": "4401",
        "max_ground_object_destroyed": "3"
    },
    {
        "wins": "150",
        "losses": "8",
        "max_killed": "7",
        "battles": "158",
        "plane_id": "2401",
        "max_ground_object_destroyed": "3"
    },
    {
        "wins": "120",
        "losses": "328",
        "max_killed": "7",
        "battles": "158",
        "plane_id": "2403",
        "max_ground_object_destroyed": "3"
    }
],
Agaric answered 4/1, 2014 at 19:12 Comment(2)
Post a portion of your json data. Also, try the RJSONIO package.Roobbie
and/or have a look at the ever so slightly newer rjsonlite package described here: public.opencpu.org/posts/jsonlite-a-smarter-json-encoderFatten
D
18

fromJSON returns a list, you can use the *apply functions to go through each element. It's fairly straightforward (once you know what to do!) to convert it to a "table" (data frame is the correct R terminology).

library(rjson)

# You can pass directly the filename
my.JSON <- fromJSON(file="test.json")

df <- lapply(my.JSON, function(play) # Loop through each "play"
  {
  # Convert each group to a data frame.
  # This assumes you have 6 elements each time
  data.frame(matrix(unlist(play), ncol=6, byrow=T))
  })

# Now you have a list of data frames, connect them together in
# one single dataframe
df <- do.call(rbind, df)

# Make column names nicer, remove row names
colnames(df) <- names(my.JSON[[1]][[1]])
rownames(df) <- NULL

df
  wins losses max_killed battles plane_id max_ground_object_destroyed
1  118     40          7     158     4401                           3
2  100     58          7     158     2401                           3
3  120     38          7     158     2403                           3
4   12    450          7     158     4401                           3
5  150      8          7     158     2401                           3
6  120    328          7     158     2403                           3
Dehart answered 4/1, 2014 at 20:31 Comment(0)
V
11

I find jsonlite to be a little more user friendly for this task. Here is a comparison of three JSON parsing packages (biased in favor of jsonlite)

library(jsonlite)
data <- fromJSON('path/to/file.json')

data
#> $play1
#   wins losses max_killed battles plane_id max_ground_object_destroyed
# 1  118     40          7     158     4401                           3
# 2  100     58          7     158     2401                           3
# 3  120     38          7     158     2403                           3
# 
# $play2
#   wins losses max_killed battles plane_id max_ground_object_destroyed
# 1   12    450          7     158     4401                           3
# 2  150      8          7     158     2401                           3
# 3  120    328          7     158     2403                           3

If you want to collapse those list names into a new column, I recommend dplyr::bind_rows rather than do.call(rbind, data)

library(dplyr)
data <- bind_rows(data, .id = 'play')

# Source: local data frame [6 x 7]

#    play  wins losses max_killed battles plane_id max_ground_object_destroyed
#   (chr) (chr)  (chr)      (chr)   (chr)    (chr)                       (chr)
# 1 play1   118     40          7     158     4401                           3
# 2 play1   100     58          7     158     2401                           3
# 3 play1   120     38          7     158     2403                           3
# 4 play2    12    450          7     158     4401                           3
# 5 play2   150      8          7     158     2401                           3
# 6 play2   120    328          7     158     2403                           3

Beware that the columns may not have the type you expect (notice the columns are all characters since all of the numbers were quoted in the provided JSON data)!

Edit Nov. 2017: One approach to type conversion would be to use mutate_if to guess the intended type of character columns.

data <- mutate_if(data, is.character, type.convert, as.is = TRUE)
Val answered 27/1, 2015 at 17:21 Comment(4)
Instead of loading reshape2 you could use rbind.pages from jsonlite, which may be able to go into one call with fromJSONUnassuming
@RichardScriven Neat! I hadn't noticed that function. Though in this case it does something similar in effect to do.call(rbind, data) in that it doesn't preserve the play# as a new variable (if that is important to the OP)Val
@Val what is the best way to get the correct column types? If I convert the json file to csv and import to R I get the correct types but importing straight with jsonlite gives me the wrong types.Charlet
@Charlet unfortunately I can't answer your question without knowing what your data looks like and what types are wrong. Though, it is probably easiest just to convert everything to a data frame and then coerce columns to the desired types if necessary.Val
B
4

I prefer tidyjson over rjson and jsonlite as it has a easy workflow for converting multilevel nested json objects to 2 dimensional tables. Your problem can be easily solved using this package from github.

devtools::install_github("sailthru/tidyjson")

library(tidyjson)
library(dplyr)

> json %>%  as.tbl_json %>% gather_keys %>% gather_array %>%  
+   spread_values(
+     wins = jstring("wins"),
+     losses = jstring("losses"),
+     max_killed = jstring("max_killed"),
+     battles = jstring("battles"),
+     plane_id = jstring("plane_id"),
+     max_ground_object_destroyed = jstring("max_ground_object_destroyed")
+    )

Output

  document.id   key array.index wins losses max_killed battles plane_id max_ground_object_destroyed
1           1 play1           1  118     40          7     158     4401                           3
2           1 play1           2  100     58          7     158     2401                           3
3           1 play1           3  120     38          7     158     2403                           3
4           1 play2           1   12    450          7     158     4401                           3
5           1 play2           2  150      8          7     158     2401                           3
6           1 play2           3  120    328          7     158     2403                           3
Broadloom answered 3/12, 2015 at 11:0 Comment(1)
How is variable json defined in your code? In the other examples this is loaded from disk.Desmonddesmoulins

© 2022 - 2024 — McMap. All rights reserved.