How to create a data frame from a list, selecting which sublist to focus on

Asked 16/8, 2024 at 22:38 Answered 17/8, 2024 at 8:11

Say that I have this list:

listexample = list(books = list(list(
                    title="Book 1",
                    entry = "entry 1",
                    publisher = "Books Unlimited",
                    authors = list(
                                list(name="bob", location="north dakota"),
                                list(name="susan", location="california"),
                                list(name="tim")),
                    isbn = "1358",
                    universities = list(
                                list(univ="univ1"),
                                list(univ="univ2"))
                    ),
                    list(
                        title="Book 2",
                        entry = "entry 2",
                        publisher = "Books Unified",
                        authors = list(
                            list(name="tom", location="north dakota"),
                            list(name="sally", location="california"),
                            list(name="erica", location="berlin")),
                        isbn = "1258",
                        universities = list(
                            list(univ="univ5"),
                            list(univ="univ2"),
                            list(univ="univ99"),
                            list(univ="univ2"),
                            list(univ="univ3"))
                    )   
                ),
     misc = list(name="Jim Smith", location="Alaska"))

How can I create a data frame (a tibble is also fine) where each row is an author? I want to ignore the second element of the main list (misc) entirely, as well as universities, isbn, and publisher. I want to keep title, name, and location, along with books (the name of the first element of the main list).

I know that rrapply can be used to iteratively do things, but I am not sure if it is appropriate in this case.

library(rrapply)
rrapply(listexample, how = "bind")

Handoff answered 16/8, 2024 at 22:38 Comment(1)

This looks ok as long as the list is not massive? You might want to run listexample[[2L]] = NULL first. And then it is just a matter of renaming and deleting rows I guess. – Rhythm 16/8, 2024 at 23:1

You can use unnest_longer and unnest_wider from tidyr.

listexample |> 
  tibble::enframe() |> 
  dplyr::filter(name == "books") |> 
  tidyr::unnest_longer(value) |> 
  tidyr::unnest_wider(value) |> 
  dplyr::select(title, authors) |> 
  tidyr::unnest_longer(authors) |> 
  tidyr::unnest_wider(authors)

You can run the code adding one line at a time to see what each step does. In short, we turn the list into a two-row tibble (row one is books, row two is misc), keep only the books row, then expand the nested information.

Read the tidyr "rectangling" vignette for more information. In fact, you can probably reduce the code here by using tidyr::hoist().
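
For what it's worth, here is a rough sketch of how the hoist() idea might look. It is not a tested drop-in replacement, and it assumes R >= 4.1 (for the native pipe) plus reasonably recent tidyr, tibble and dplyr. The column names source and book are my own choices, and the list is a shortened copy of the one in the question so that the snippet runs on its own. Renaming the enframe() columns avoids a clash with the authors' own name field, which lets the books label stay in the result.

```r
# Shortened copy of the question's list (fewer universities), for a self-contained run
listexample <- list(
  books = list(
    list(title = "Book 1", entry = "entry 1", publisher = "Books Unlimited",
         authors = list(list(name = "bob", location = "north dakota"),
                        list(name = "susan", location = "california"),
                        list(name = "tim")),
         isbn = "1358",
         universities = list(list(univ = "univ1"), list(univ = "univ2"))),
    list(title = "Book 2", entry = "entry 2", publisher = "Books Unified",
         authors = list(list(name = "tom", location = "north dakota"),
                        list(name = "sally", location = "california"),
                        list(name = "erica", location = "berlin")),
         isbn = "1258",
         universities = list(list(univ = "univ5"), list(univ = "univ2")))
  ),
  misc = list(name = "Jim Smith", location = "Alaska")
)

res <- listexample |>
  # "source" and "book" are illustrative column names, not from the question
  tibble::enframe(name = "source", value = "book") |>
  dplyr::filter(source == "books") |>
  # one row per book
  tidyr::unnest_longer(book) |>
  # pull out only the components we care about
  tidyr::hoist(book, title = "title", authors = "authors") |>
  dplyr::select(source, title, authors) |>
  # one row per author, then one column per author field
  tidyr::unnest_longer(authors) |>
  tidyr::unnest_wider(authors)

res
```

This saves only one step compared with the pipeline above, so whether it counts as a reduction is a matter of taste. The main benefit is that the fields to keep are named in a single hoist() call.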

Spindrift answered 16/8, 2024 at 23:9 Comment(1)

Thanks for flagging the vignette and for your answer. – Handoff 18/8, 2024 at 4:20

1) Use tibblify to create a tibble and from that select the title and authors columns. The latter is a list column so unnest it.

library(dplyr)
library(tidyr)
library(tibblify)

listexample %>%
  .$books %>%
  tibblify %>%
  select(title, authors) %>%
  unnest(authors)

giving

# A tibble: 6 × 3
  title  name  location    
  <chr>  <chr> <chr>       
1 Book 1 bob   north dakota
2 Book 1 susan california  
3 Book 1 tim   <NA>        
4 Book 2 tom   north dakota
5 Book 2 sally california  
6 Book 2 erica berlin

2) A variation of the above is to use tibblify with a specification as shown below. The specification can be created by running guess_tspec_df(listexample$books) and then editing that down to what is wanted.

spec <- tspec_df(
  tib_chr("title"),
  tib_df(
    "authors",
    tib_chr("name"),
    tib_chr("location", required = FALSE),
  )
)
tibblify(listexample$books, spec) %>% unnest(authors)

Sikhism answered 17/8, 2024 at 6:27 Comment(0)

In base R, using lapply and do.call you might do

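lapply(listexample[[1L]], \(i) {
  # keep only the two components of interest
  tmp = i[names(i) %in% c("authors", "title")]
  # pad every author to the same length, then stack the authors as rows
  l = tmp[["authors"]]
  tmp2 = do.call("rbind", lapply(l, `length<-`, max(lengths(l))))
  # repeat the book title once per author
  cbind.data.frame("title" = rep(tmp[["title"]], nrow(tmp2)), tmp2)
}) |> do.call(what = "rbind")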

   title  name     location
1 Book 1   bob north dakota
2 Book 1 susan   california
3 Book 1   tim         NULL
4 Book 2   tom north dakota
5 Book 2 sally   california
6 Book 2 erica       berlin
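
Note that tim's missing location prints as NULL rather than NA. Padding a list with `length<-` fills it with NULL, so name and location end up as list columns. If plain character columns with NA are wanted, one possible base R variant is sketched below. The helper get_field and the object names are hypothetical, and the list is a shortened copy of the one in the question so that the snippet runs on its own.

```r
# Shortened copy of the question's list (fewer universities), for a self-contained run
listexample <- list(
  books = list(
    list(title = "Book 1", entry = "entry 1", publisher = "Books Unlimited",
         authors = list(list(name = "bob", location = "north dakota"),
                        list(name = "susan", location = "california"),
                        list(name = "tim")),
         isbn = "1358",
         universities = list(list(univ = "univ1"), list(univ = "univ2"))),
    list(title = "Book 2", entry = "entry 2", publisher = "Books Unified",
         authors = list(list(name = "tom", location = "north dakota"),
                        list(name = "sally", location = "california"),
                        list(name = "erica", location = "berlin")),
         isbn = "1258",
         universities = list(list(univ = "univ5"), list(univ = "univ2")))
  ),
  misc = list(name = "Jim Smith", location = "Alaska")
)

# Hypothetical helper: return one field of an author, or NA if it is absent
get_field <- function(x, field) {
  if (is.null(x[[field]])) NA_character_ else x[[field]]
}

# Build one small data frame per book; the title is recycled across its authors
per_book <- lapply(listexample$books, function(book) {
  data.frame(
    title    = book$title,
    name     = vapply(book$authors, get_field, character(1), field = "name"),
    location = vapply(book$authors, get_field, character(1), field = "location")
  )
})

# Stack the per-book data frames into one
result <- do.call(rbind, per_book)
result
```

The trade-off is that the fields to extract are spelled out by hand, which is fine for two fields but gets tedious for many.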

Rhythm answered 17/8, 2024 at 8:11 Comment(0)
