how to "spread" a list-column?
Asked Answered
G

3

11

Consider this simple example

mydf <- data_frame(regular_col = c(1,2),
                   normal_col = c('a','b'),
                   weird_col = list(list('hakuna', 'matata'),
                                 list('squash', 'banana')))

> mydf
# A tibble: 2 x 3
  regular_col normal_col weird_col 
        <dbl> <chr>      <list>    
1           1 a          <list [2]>
2           2 b          <list [2]>

I would like to extract the elements of weird_col (programmatically, the number of elements may change) so that each element is placed on a different column. That is, I expect the following output

> data_frame(regular_col = c(1,2),
+           normal_col = c('a','b'),
+           weirdo_one = c('hakuna', 'squash'),
+           weirdo_two = c('matata', 'banana'))
# A tibble: 2 x 4
  regular_col normal_col weirdo_one weirdo_two
        <dbl> <chr>      <chr>      <chr>     
1           1 a          hakuna     matata
2           2 b          squash     banana    

However, I am unable to do so in simple terms. For instance, using the classic unnest fails here, as it expands the dataframe instead of placing each element of the list in a different column.

> mydf %>% unnest(weird_col)
# A tibble: 4 x 3
  regular_col normal_col weird_col
        <dbl> <chr>      <list>   
1           1 a          <chr [1]>
2           1 a          <chr [1]>
3           2 b          <chr [1]>
4           2 b          <chr [1]>

Is there any solution in the tidyverse for that?

Gladysglagolitic answered 12/8, 2018 at 22:42 Comment(8)
you know Swahili?Polyhydric
I know the Lion King haha :)Blakley
how deeply nested is the weird col? 2 levels or more?Polyhydric
mydf%>%group_by(regular_col)%>%mutate(weird_col = invoke(paste,weird_col,collapse=","))%>%separate(weird_col,c("col1","col2"))Polyhydric
@Polyhydric pretty cool as well. what is the purpose of invoke here?Blakley
invoke is exactly the same as do.callPolyhydric
@Polyhydric asante sana :DBlakley
invoke is similar to do.call (and it's a simple wrapper round it, if you look at the code), the main difference is that it has an additional ... argument, that @Polyhydric uses here to specify collapse=","Southwest
B
11

You can extract the values from the output of unnest, process a little to make your column names, and then spread back out. Note that I use flatten_chr because of your depth-one list-column, but if it is nested you can use flatten and spread works just as well on list-cols.

library(tidyverse)
#> Warning: package 'dplyr' was built under R version 3.5.1
mydf <- data_frame(
  regular_col = c(1, 2),
  normal_col = c("a", "b"),
  weird_col = list(
    list("hakuna", "matata"),
    list("squash", "banana")
  )
)
mydf %>%
  unnest(weird_col) %>%
  group_by(regular_col, normal_col) %>%
  mutate(
    weird_col = flatten_chr(weird_col),
    weird_colname = str_c("weirdo_", row_number())
    ) %>% # or just as.character
  spread(weird_colname, weird_col)
#> # A tibble: 2 x 4
#> # Groups:   regular_col, normal_col [2]
#>   regular_col normal_col weirdo_1 weirdo_2
#>         <dbl> <chr>      <chr>    <chr>   
#> 1           1 a          hakuna   matata  
#> 2           2 b          squash   banana

Created on 2018-08-12 by the reprex package (v0.2.0).

Bimolecular answered 12/8, 2018 at 22:50 Comment(0)
S
5

unnest develops lists and vectors vertically, and one row data frames horizontally. So what we can do is change your lists into data frames (with adequate column names) and unnest afterwards.

mydf %>% mutate(weird_col = map(weird_col,~ as_data_frame(
  setNames(.,paste0("weirdo_",1:length(.)))
  ))) %>% 
  unnest

# # A tibble: 2 x 4
#   regular_col normal_col weirdo_1 weirdo_2
#         <dbl>      <chr>    <chr>    <chr>
# 1           1          a   hakuna   matata
# 2           2          b   squash   banana
Southwest answered 13/8, 2018 at 13:0 Comment(4)
pretty nice and concise!Blakley
It's similar to what @Polyhydric suggested in the comments, except that he first transforms the list into a comma separated string, that will be spread horizontally with separateSouthwest
would this solution work if the number of elements in the list could vary in different rows?Blakley
yes it would, try removing 'matata' from the first list, you would getNA in column weirdo_2Southwest
C
1

tidyr 1.0 introduced unnest_wider() for exactly this purpose.

library(dplyr)
library(tidyr)

mydf <- data_frame(regular_col = c(1,2),
                   normal_col = c('a','b'),
                   weird_col = list(list('hakuna', 'matata'),
                                    list('squash', 'banana')))

mydf %>% unnest_wider(weird_col, names_sep = '_')
#> # A tibble: 2 × 4
#>   regular_col normal_col weird_col_1 weird_col_2
#>         <dbl> <chr>      <chr>       <chr>      
#> 1           1 a          hakuna      matata     
#> 2           2 b          squash      banana

Created on 2023-12-15 with reprex v2.0.2

Consecution answered 15/12, 2023 at 17:16 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.