Converting grouped tibble to named list

I feel like there is probably a better way to do this in the tidyverse than a for-loop. Start with a standard tibble/data frame and build a list in which the element names are the unique values of one column (group_by?) and each element holds all the values of another column that belong to that name.

  library(tidyverse)
  my_data <- tibble(list_names = c("Ford", "Chevy", "Ford", "Dodge", "Dodge", "Ford"),
                    list_values = c("Ranger", "Equinox", "F150", "Caravan", "Ram", "Explorer"))
  
# A tibble: 6 × 2
  list_names list_values
  <chr>      <chr>      
1 Ford       Ranger     
2 Chevy      Equinox    
3 Ford       F150       
4 Dodge      Caravan    
5 Dodge      Ram        
6 Ford       Explorer

This is the desired output:

  desired_output <- list(Ford = c("Ranger", "F150", "Explorer"),
       Chevy = c("Equinox"),
       Dodge = c("Caravan", "Ram"))

$Ford
[1] "Ranger"   "F150"     "Explorer"

$Chevy
[1] "Equinox"

$Dodge
[1] "Caravan" "Ram" 

It can be accomplished with a for-loop, but I bet there is a tidyverse function that makes it simpler and faster.

  desired_output <- list()
  for(i in seq_along(my_data$list_names)) {
    entry <- my_data %>%
      filter(list_names == my_data$list_names[i]) %>%
      pull(list_values)
    desired_output[[my_data$list_names[i]]] <- entry
  }
Emotionalize answered 13/1, 2022 at 17:15 Comment(0)
V
6

We can use split

with(my_data, split(list_values,
     factor(list_names, levels = unique(list_names))))
$Ford
[1] "Ranger"   "F150"     "Explorer"

$Chevy
[1] "Equinox"

$Dodge
[1] "Caravan" "Ram"   

Or with unstack

unstack(my_data, list_values ~ list_names)
$Chevy
[1] "Equinox"

$Dodge
[1] "Caravan" "Ram"    

$Ford
[1] "Ranger"   "F150"     "Explorer"
Vella answered 13/1, 2022 at 17:16 Comment(5)
my_data %>% unstack(list_values ~ list_names). Least verbose and most straightforward response. - Emotionalize
@JeffParker the split is compact as well, if you don't want to order the values with factor. - Vella
Good point. In my case, order doesn't matter. Great reference for others. - Emotionalize
It does look like split is a little faster than unstack (just did a benchmark out of curiosity). - Harbinger
@AndrewGillreath-Brown split is a fast function, whereas stack/unstack is slow. - Vella
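The compact form mentioned in the comments can be sketched as follows. This is only an illustration on the question's data; the variable names `compact` and `by_appearance` are made up for the example. Without the explicit factor, split() builds the factor itself, so the elements come back in sorted order rather than in order of first appearance.

```r
library(tibble)

my_data <- tibble(
  list_names  = c("Ford", "Chevy", "Ford", "Dodge", "Dodge", "Ford"),
  list_values = c("Ranger", "Equinox", "F150", "Caravan", "Ram", "Explorer")
)

# Compact form: split() coerces the character vector to a factor,
# so the list elements follow the sorted factor levels.
compact <- split(my_data$list_values, my_data$list_names)
names(compact)
#> [1] "Chevy" "Dodge" "Ford"

# Explicit levels keep the order in which each name first appears.
by_appearance <- with(my_data, split(list_values,
                                     factor(list_names, levels = unique(list_names))))
names(by_appearance)
#> [1] "Ford"  "Chevy" "Dodge"
```

If element order does not matter, the compact form is probably enough.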
H
7

Here is another option (though a little more verbose) using group_modify and deframe from tidyverse.

library(tidyverse)

my_data |>
  group_by(list_names)  |>
  group_modify(\(x, ...) tibble(res = list(deframe(x)))) |>
  deframe()

Or another option could be to use summarise then again use deframe:

my_data %>%
  group_by(list_names) %>%
  summarise(named_vec = list(list_values)) %>%
  deframe()

Output

$Chevy
[1] "Equinox"

$Dodge
[1] "Caravan" "Ram"    

$Ford
[1] "Ranger"   "F150"     "Explorer"
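To see why the summarise/deframe route yields a named list, it may help to look at the intermediate result. The sketch below (with the hypothetical names `nested` and `res`) splits the pipeline in two: summarise() produces one row per group with a list-column, and deframe() then uses the first column as names and the second as values.

```r
library(dplyr)
library(tibble)

my_data <- tibble(
  list_names  = c("Ford", "Chevy", "Ford", "Dodge", "Dodge", "Ford"),
  list_values = c("Ranger", "Equinox", "F150", "Caravan", "Ram", "Explorer")
)

# Step 1: one row per group; named_vec is a list-column of character vectors.
nested <- my_data %>%
  group_by(list_names) %>%
  summarise(named_vec = list(list_values))

# Step 2: deframe() on a two-column tibble takes names from column 1
# and values from column 2, which gives a named list here.
res <- deframe(nested)
str(res)
```

Because group_by() sorts the groups, the elements come out as Chevy, Dodge, Ford, matching the output above.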

Benchmark

I was curious which of the answers here is fastest. It definitely looks like split by @akrun is by far the fastest, followed by unstack.

bm <- microbenchmark::microbenchmark(
  akrun_split = with(my_data, split(list_values,
                                    factor(list_names, levels = unique(list_names)))),
  akrun_unstack = unstack(my_data, list_values ~ list_names),
  andrew_deframe1 = my_data |>
    group_by(list_names)  |>
    group_modify(\(x, ...) tibble(res = list(deframe(x)))) |>
    deframe(),
  andrew_deframe2 = my_data %>%
    group_by(list_names) %>%
    summarise(named_vec = list(list_values)) %>%
    deframe(),
  paulsmith = my_data %>% 
    group_by(list_names) %>% 
    summarise(list_values = list(list_values)) %>% 
    {set_names(.$list_values, .$list_names)}, 
  times=1000L
)
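Before comparing timings, it can be worth confirming that the candidates return the same content, since they differ in element order. The following is a minimal sketch for two of the approaches; `via_split`, `via_deframe`, `nm`, and `same_content` are names invented for this check.

```r
library(dplyr)
library(tibble)

my_data <- tibble(
  list_names  = c("Ford", "Chevy", "Ford", "Dodge", "Dodge", "Ford"),
  list_values = c("Ranger", "Equinox", "F150", "Caravan", "Ram", "Explorer")
)

via_split <- with(my_data, split(list_values,
                                 factor(list_names, levels = unique(list_names))))

via_deframe <- my_data %>%
  group_by(list_names) %>%
  summarise(named_vec = list(list_values)) %>%
  deframe()

# The two lists order their elements differently, so compare them name by name.
nm <- sort(names(via_split))
same_content <- identical(nm, sort(names(via_deframe))) &&
  all(mapply(identical, via_split[nm], via_deframe[nm]))
same_content
```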
Harbinger answered 13/1, 2022 at 17:24 Comment(1)
One nice thing about the last solution is that it is easy to encode a third column as the names of the values. Suppose we add a lets column as that third column. Then: my_data$lets <- head(letters); my_data %>% summarise(named_vec = list(setNames(list_values, lets)), .by = list_names) %>% deframe() - Somali
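The idea in the comment above, written out as a runnable sketch. It assumes dplyr 1.1.0 or later, which introduced the `.by` argument; the `lets` column comes from the comment, while `res` is just a name chosen for the example.

```r
library(dplyr)
library(tibble)

my_data <- tibble(
  list_names  = c("Ford", "Chevy", "Ford", "Dodge", "Dodge", "Ford"),
  list_values = c("Ranger", "Equinox", "F150", "Caravan", "Ram", "Explorer")
)

# Third column whose entries become the names of the values: "a" through "f".
my_data$lets <- head(letters)

# Per-operation grouping with .by; setNames() attaches lets to each group's values.
res <- my_data %>%
  summarise(named_vec = list(setNames(list_values, lets)), .by = list_names) %>%
  deframe()

res$Ford
```

Unlike group_by(), `.by` keeps the groups in order of first appearance, so the list here starts with Ford rather than Chevy.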
E
4

Another possible solution:

library(tidyverse)

my_data <- tibble(list_names = c("Ford", "Chevy", "Ford", "Dodge", "Dodge", "Ford"),
                  list_values = c("Ranger", "Equinox", "F150", "Caravan", "Ram", "Explorer"))

my_data %>% 
  group_by(list_names) %>% 
  summarise(list_values = list(list_values)) %>% 
  {set_names(.$list_values, .$list_names)}

#> $Chevy
#> [1] "Equinox"
#> 
#> $Dodge
#> [1] "Caravan" "Ram"    
#> 
#> $Ford
#> [1] "Ranger"   "F150"     "Explorer"
Ezaria answered 13/1, 2022 at 17:34 Comment(0)
S
1

Update II: Changed the tibble output to vector output. Thanks to Jeff Parker!

Update following Jeff Parker's comment (please see the comments): I have now updated the code. The issue was that the names were being set in unsorted order; after applying sort, setNames assigns them correctly. I then added map to apply dplyr's select and remove the first column of each data frame:

library(dplyr)
library(purrr)
library(stringr)  # str_split() is used below

my_data %>% 
  group_by(list_names) %>% 
  mutate(list_values= paste(list_values, collapse = ", ")) %>% 
  slice(1) %>% 
  group_split() %>% 
  setNames(sort(unique(my_data$list_names))) %>% 
  map(., dplyr::pull, -list_names) %>%
  map(., ~str_split(.x, ", ")[[1]] )
$Chevy
[1] "Equinox"

$Dodge
[1] "Caravan" "Ram"    

$Ford
[1] "Ranger"   "F150"     "Explorer"
Supportive answered 13/1, 2022 at 17:26 Comment(4)
The list elements should be vectors, not data frames. Also, you are mixing Chevy with Dodge, and Dodge with Ford. - Emotionalize
@JeffParker. Thank you very much for your note. I will clear that. - Supportive
@JeffParker. Please see my update! - Supportive
The list elements are still tibbles, but see my latest answer for a fix using dplyr::pull and map(., str_split). - Emotionalize
E
1

Just for fun, I added benchmarks with my for-loop and TarJae's answer.

bm <- microbenchmark::microbenchmark(
  akrun_split = with(my_data, split(list_values,
                                    factor(list_names, levels = unique(list_names)))),
  akrun_unstack = unstack(my_data, list_values ~ list_names),
  andrew_deframe1 = my_data |>
    group_by(list_names)  |>
    group_modify(\(x, ...) tibble(res = list(deframe(x)))) |>
    deframe(),
  andrew_deframe2 = my_data %>%
    group_by(list_names) %>%
    summarise(named_vec = list(list_values)) %>%
    deframe(),
  paulsmith = my_data %>% 
    group_by(list_names) %>% 
    summarise(list_values = list(list_values)) %>% 
    {set_names(.$list_values, .$list_names)},
  jeffs = {
    desired_output <- list()
    for(i in seq_along(my_data$list_names)) {
      entry <- my_data %>%
        filter(list_names == my_data$list_names[i]) %>%
        pull(list_values)
      desired_output[[my_data$list_names[i]]] <- entry
    }
    desired_output},
  TarJae = my_data %>% 
    group_by(list_names) %>% 
    mutate(list_values= paste(list_values, collapse = ", ")) %>% 
    slice(1) %>% 
    group_split() %>% 
    setNames(sort(unique(my_data$list_names))) %>% 
    map(., dplyr::pull, -list_names) %>%
    map(., ~str_split(.x, ", ")[[1]] ), 
  times=100L
)

I also ran benchmarks with a larger data set on the two fastest options (from Akrun).

library(nycflights13)
my_data <- nycflights13::flights %>%
  select(list_names = carrier, list_values = flight)

Emotionalize answered 13/1, 2022 at 21:18 Comment(0)
L
0

Another way of doing it:

library(tidyverse)

my_data %>%
  group_split(list_names) %>%
  map(~ lst(!!unique(pull(.x, list_names)) := unique(pull(.x, list_values)))) %>%
  flatten()
#> $Chevy
#> [1] "Equinox"
#> 
#> $Dodge
#> [1] "Caravan" "Ram"    
#> 
#> $Ford
#> [1] "Ranger"   "F150"     "Explorer"

Created on 2022-01-14 by the reprex package (v2.0.1)

Data:

my_data <- tibble(
  list_names = c("Ford", "Chevy", "Ford", "Dodge", "Dodge", "Ford"),
  list_values = c("Ranger", "Equinox", "F150", "Caravan", "Ram", "Explorer")
)
Logography answered 14/1, 2022 at 4:47 Comment(0)
