how to "spread" a list-column?

Asked 12/8, 2018 at 22:42 Answered 15/12, 2023 at 17:16

Consider this simple example

mydf <- data_frame(regular_col = c(1,2),
                   normal_col = c('a','b'),
                   weird_col = list(list('hakuna', 'matata'),
                                 list('squash', 'banana')))

> mydf
# A tibble: 2 x 3
  regular_col normal_col weird_col 
        <dbl> <chr>      <list>    
1           1 a          <list [2]>
2           2 b          <list [2]>

I would like to extract the elements of weird_col (programmatically, the number of elements may change) so that each element is placed on a different column. That is, I expect the following output

> data_frame(regular_col = c(1,2),
+           normal_col = c('a','b'),
+           weirdo_one = c('hakuna', 'squash'),
+           weirdo_two = c('matata', 'banana'))
# A tibble: 2 x 4
  regular_col normal_col weirdo_one weirdo_two
        <dbl> <chr>      <chr>      <chr>     
1           1 a          hakuna     matata
2           2 b          squash     banana

However, I am unable to do so in simple terms. For instance, using the classic unnest fails here, as it expands the dataframe instead of placing each element of the list in a different column.

> mydf %>% unnest(weird_col)
# A tibble: 4 x 3
  regular_col normal_col weird_col
        <dbl> <chr>      <list>   
1           1 a          <chr [1]>
2           1 a          <chr [1]>
3           2 b          <chr [1]>
4           2 b          <chr [1]>

Is there any solution in the tidyverse for that?

Gladysglagolitic answered 12/8, 2018 at 22:42 Comment(8)

you know Swahili? – Polyhydric 12/8, 2018 at 22:45

I know the Lion King haha :) – Blakley 12/8, 2018 at 22:46

how deeply nested is the weird col? 2 levels or more? – Polyhydric 12/8, 2018 at 22:49

mydf%>%group_by(regular_col)%>%mutate(weird_col = invoke(paste,weird_col,collapse=","))%>%separate(weird_col,c("col1","col2"))

– Polyhydric 12/8, 2018 at 23:5

@Polyhydric pretty cool as well. what is the purpose of invoke here? – Blakley 12/8, 2018 at 23:36

invoke is exactly the same as do.call – Polyhydric 13/8, 2018 at 3:23

@Polyhydric asante sana :D – Blakley 13/8, 2018 at 11:53

invoke is similar to do.call (and it's a simple wrapper round it, if you look at the code), the main difference is that it has an additional ... argument, that @Polyhydric uses here to specify collapse="," – Southwest 13/8, 2018 at 13:7

You can extract the values from the output of unnest, process a little to make your column names, and then spread back out. Note that I use flatten_chr because of your depth-one list-column, but if it is nested you can use flatten and spread works just as well on list-cols.

library(tidyverse)
#> Warning: package 'dplyr' was built under R version 3.5.1
mydf <- data_frame(
  regular_col = c(1, 2),
  normal_col = c("a", "b"),
  weird_col = list(
    list("hakuna", "matata"),
    list("squash", "banana")
  )
)
mydf %>%
  unnest(weird_col) %>%
  group_by(regular_col, normal_col) %>%
  mutate(
    weird_col = flatten_chr(weird_col),
    weird_colname = str_c("weirdo_", row_number())
    ) %>% # or just as.character
  spread(weird_colname, weird_col)
#> # A tibble: 2 x 4
#> # Groups:   regular_col, normal_col [2]
#>   regular_col normal_col weirdo_1 weirdo_2
#>         <dbl> <chr>      <chr>    <chr>   
#> 1           1 a          hakuna   matata  
#> 2           2 b          squash   banana

Created on 2018-08-12 by the reprex package (v0.2.0).

Bimolecular answered 12/8, 2018 at 22:50 Comment(0)

unnest develops lists and vectors vertically, and one row data frames horizontally. So what we can do is change your lists into data frames (with adequate column names) and unnest afterwards.

mydf %>% mutate(weird_col = map(weird_col,~ as_data_frame(
  setNames(.,paste0("weirdo_",1:length(.)))
  ))) %>% 
  unnest

# # A tibble: 2 x 4
#   regular_col normal_col weirdo_1 weirdo_2
#         <dbl>      <chr>    <chr>    <chr>
# 1           1          a   hakuna   matata
# 2           2          b   squash   banana

Southwest answered 13/8, 2018 at 13:0 Comment(4)

pretty nice and concise! – Blakley 13/8, 2018 at 13:6

It's similar to what @Polyhydric suggested in the comments, except that he first transforms the list into a comma separated string, that will be spread horizontally with separate – Southwest 13/8, 2018 at 13:12

would this solution work if the number of elements in the list could vary in different rows? – Blakley 13/8, 2018 at 13:13

yes it would, try removing 'matata' from the first list, you would getNA in column weirdo_2 – Southwest 13/8, 2018 at 13:15

tidyr 1.0 introduced unnest_wider() for exactly this purpose.

library(dplyr)
library(tidyr)

mydf <- data_frame(regular_col = c(1,2),
                   normal_col = c('a','b'),
                   weird_col = list(list('hakuna', 'matata'),
                                    list('squash', 'banana')))

mydf %>% unnest_wider(weird_col, names_sep = '_')
#> # A tibble: 2 × 4
#>   regular_col normal_col weird_col_1 weird_col_2
#>         <dbl> <chr>      <chr>       <chr>      
#> 1           1 a          hakuna      matata     
#> 2           2 b          squash      banana

^{Created on 2023-12-15 with reprex v2.0.2}

Consecution answered 15/12, 2023 at 17:16 Comment(0)

Recommended topics

Hot tags