Converting grouped tibble to named list

Asked 13/1, 2022 at 17:15 Answered 14/1, 2022 at 4:47

I suspect there is a better way to do this in the tidyverse than a for-loop. Start with a standard tibble/data frame and build a list in which the names of the elements are the unique values of one column (group_by?) and each element holds all the corresponding values of another column.

  library(tidyverse)

  my_data <- tibble(list_names = c("Ford", "Chevy", "Ford", "Dodge", "Dodge", "Ford"),
                    list_values = c("Ranger", "Equinox", "F150", "Caravan", "Ram", "Explorer"))
  
# A tibble: 6 × 2
  list_names list_values
  <chr>      <chr>      
1 Ford       Ranger     
2 Chevy      Equinox    
3 Ford       F150       
4 Dodge      Caravan    
5 Dodge      Ram        
6 Ford       Explorer

This is the desired output:

  desired_output <- list(Ford = c("Ranger", "F150", "Explorer"),
       Chevy = c("Equinox"),
       Dodge = c("Caravan", "Ram"))

$Ford
[1] "Ranger"   "F150"     "Explorer"

$Chevy
[1] "Equinox"

$Dodge
[1] "Caravan" "Ram"

It can be accomplished with a for-loop, but I bet there is a tidyverse function that makes it simpler and faster.

  desired_output <- list()
  for(i in seq_along(my_data$list_names)) {
    entry <- my_data %>%
      filter(list_names == my_data$list_names[i]) %>%
      pull(list_values)
    desired_output[[my_data$list_names[i]]] <- entry
  }

Emotionalize answered 13/1, 2022 at 17:15 Comment(0)

We can use split

with(my_data, split(list_values,
     factor(list_names, levels = unique(list_names))))
$Ford
[1] "Ranger"   "F150"     "Explorer"

$Chevy
[1] "Equinox"

$Dodge
[1] "Caravan" "Ram"
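As a point of comparison, here is a minimal sketch of what the factor() wrapper changes. It rebuilds the question's data as a base data.frame (an assumption made only to keep the sketch free of package dependencies); the names sorted_groups and ordered_groups are illustrative.

```r
# Same data as in the question, as a base data.frame (assumption: no tibble needed here).
my_data <- data.frame(
  list_names  = c("Ford", "Chevy", "Ford", "Dodge", "Dodge", "Ford"),
  list_values = c("Ranger", "Equinox", "F150", "Caravan", "Ram", "Explorer"),
  stringsAsFactors = FALSE
)

# Plain split(): the grouping vector is coerced to a factor with sorted levels,
# so the list elements come out in sorted name order.
sorted_groups <- split(my_data$list_values, my_data$list_names)

# split() with explicit levels: the list elements follow order of first appearance.
ordered_groups <- split(
  my_data$list_values,
  factor(my_data$list_names, levels = unique(my_data$list_names))
)

names(sorted_groups)
names(ordered_groups)
```

If the order of the list elements does not matter, the shorter split(list_values, list_names) form should be enough.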

Or with unstack

unstack(my_data, list_values ~ list_names)
$Chevy
[1] "Equinox"

$Dodge
[1] "Caravan" "Ram"    

$Ford
[1] "Ranger"   "F150"     "Explorer"
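One caveat worth keeping in mind: according to the unstack documentation, when every group has the same length the result is coerced to a data frame rather than left as a list. The sketch below uses a hypothetical balanced data set (the "Tahoe" row is made up for illustration) to show how that differs from split.

```r
# Hypothetical data in which both groups have exactly two values.
balanced <- data.frame(
  list_names  = c("Ford", "Chevy", "Ford", "Chevy"),
  list_values = c("Ranger", "Equinox", "F150", "Tahoe"),
  stringsAsFactors = FALSE
)

# unstack() coerces equal-length groups to a data frame.
by_unstack <- unstack(balanced, list_values ~ list_names)

# split() always returns a plain list, whatever the group sizes are.
by_split <- split(balanced$list_values, balanced$list_names)

class(by_unstack)
class(by_split)
```

If a plain list is required in every case, split looks like the safer choice; as.list(by_unstack) is one way to convert the data frame back.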

Vella answered 13/1, 2022 at 17:16 Comment(5)

my_data %>% unstack(list_values ~ list_names). Least verbose and most straightforward response. – Emotionalize 13/1, 2022 at 17:57

@JeffParker the split is compact as well, if you don't want to order the values with factor – Vella 13/1, 2022 at 17:58

Good point. In my case, order doesn't matter. Great reference for others. – Emotionalize 13/1, 2022 at 18:0

It does look like split is a little faster than unstack (just did a benchmark out of curiosity). – Harbinger 13/1, 2022 at 18:15

@AndrewGillreath-Brown split is a fast function whereas stack/unstack is slow – Vella 13/1, 2022 at 18:16

Here is another option (though a little more verbose) using group_modify and deframe from tidyverse.

library(tidyverse)

my_data |>
  group_by(list_names)  |>
  group_modify(\(x, ...) tibble(res = list(deframe(x)))) |>
  deframe()

Another option is to use summarise and then deframe again:

my_data %>%
  group_by(list_names) %>%
  summarise(named_vec = list(list_values)) %>%
  deframe()

Output

$Chevy
[1] "Equinox"

$Dodge
[1] "Caravan" "Ram"    

$Ford
[1] "Ranger"   "F150"     "Explorer"
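To make the role of deframe clearer, here is a minimal sketch of the last step in isolation. The tibble grouped is built by hand to mimic what the summarise step is expected to return; that hand-built shape is an assumption made for illustration.

```r
library(tibble)

# A two-column tibble: names in the first column, a list-column of values in the second.
grouped <- tibble(
  list_names = c("Chevy", "Dodge", "Ford"),
  named_vec  = list("Equinox", c("Caravan", "Ram"), c("Ranger", "F150", "Explorer"))
)

# deframe() uses the first column as names and the second column as values,
# so a list-column becomes a named list.
result <- deframe(grouped)

str(result)
```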

Benchmark

I was curious which of the answers here is fastest. It definitely looks like split by @akrun is by far the fastest, followed by unstack.

bm <- microbenchmark::microbenchmark(
  akrun_split = with(my_data, split(list_values,
                                    factor(list_names, levels = unique(list_names)))),
  akrun_unstack = unstack(my_data, list_values ~ list_names),
  andrew_deframe1 = my_data |>
    group_by(list_names)  |>
    group_modify(\(x, ...) tibble(res = list(deframe(x)))) |>
    deframe(),
  andrew_deframe2 = my_data %>%
    group_by(list_names) %>%
    summarise(named_vec = list(list_values)) %>%
    deframe(),
  paulsmith = my_data %>% 
    group_by(list_names) %>% 
    summarise(list_values = list(list_values)) %>% 
    {set_names(.$list_values, .$list_names)}, 
  times=1000L
)

Harbinger answered 13/1, 2022 at 17:24 Comment(1)

One nice thing about the last solution is that it is easy to encode a third column as the names of the values. Suppose we add a lets column as that third column. Then

my_data$lets <- head(letters)

my_data %>%
  summarise(named_vec = list(setNames(list_values, lets)), .by = list_names) %>%
  deframe()

– Somali 24/9, 2023 at 22:58

Another possible solution:

library(tidyverse)

my_data <- tibble(list_names = c("Ford", "Chevy", "Ford", "Dodge", "Dodge", "Ford"),
                  list_values = c("Ranger", "Equinox", "F150", "Caravan", "Ram", "Explorer"))

my_data %>% 
  group_by(list_names) %>% 
  summarise(list_values = list(list_values)) %>% 
  {set_names(.$list_values, .$list_names)}

#> $Chevy
#> [1] "Equinox"
#> 
#> $Dodge
#> [1] "Caravan" "Ram"    
#> 
#> $Ford
#> [1] "Ranger"   "F150"     "Explorer"

Ezaria answered 13/1, 2022 at 17:34 Comment(0)

Update II: Changed tibble output to vector output. Thanks to Jeff Parker!

Update in response to Jeff Parker's comment (please see the comments): I have updated the code. The issue was that the names were assigned in unsorted order; after sorting them with sort, setNames assigns them correctly. I then added map to apply dplyr's pull, which drops the first column of each data frame and returns the values as a vector:

library(dplyr)
library(purrr)
library(stringr)  # needed for str_split()

my_data %>% 
  group_by(list_names) %>% 
  mutate(list_values= paste(list_values, collapse = ", ")) %>% 
  slice(1) %>% 
  group_split() %>% 
  setNames(sort(unique(my_data$list_names))) %>% 
  map(., dplyr::pull, -list_names) %>%
  map(., ~str_split(.x, ", ")[[1]] )

$Chevy
[1] "Equinox"

$Dodge
[1] "Caravan" "Ram"    

$Ford
[1] "Ranger"   "F150"     "Explorer"
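A note of caution on the paste-then-split round trip: it relies on no value containing the separator itself. The base R sketch below uses a made-up value ("Grand Caravan, LWB" is hypothetical) to show how the pieces can drift apart.

```r
# Hypothetical values, one of which contains the ", " separator.
values <- c("Ram", "Grand Caravan, LWB")

# Collapse to one string, as the mutate() step does.
collapsed <- paste(values, collapse = ", ")

# Split it back, as the str_split() step does (strsplit() is the base equivalent).
round_trip <- strsplit(collapsed, ", ", fixed = TRUE)[[1]]

length(values)
length(round_trip)
```

Approaches that keep the values in a list-column, such as summarise(list(list_values)), avoid this issue because the values are never joined into a single string.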

Supportive answered 13/1, 2022 at 17:26 Comment(4)

The list elements should be vectors, not dataframes. Also you are mixing Chevy with Dodge, and Dodge with Ford. – Emotionalize 13/1, 2022 at 17:54

@JeffParker. Thank you very much for your note. I will clear that. – Supportive 13/1, 2022 at 17:55

@JeffParker. Please see my update! – Supportive 13/1, 2022 at 18:51

The list elements are still tibbles, but see my latest answer for a fix using dplyr::pull and map(., str_split) – Emotionalize 13/1, 2022 at 21:19

Just for fun, I added benchmarks with my for-loop and TarJae's answer.

bm <- microbenchmark::microbenchmark(
  akrun_split = with(my_data, split(list_values,
                                    factor(list_names, levels = unique(list_names)))),
  akrun_unstack = unstack(my_data, list_values ~ list_names),
  andrew_deframe1 = my_data |>
    group_by(list_names)  |>
    group_modify(\(x, ...) tibble(res = list(deframe(x)))) |>
    deframe(),
  andrew_deframe2 = my_data %>%
    group_by(list_names) %>%
    summarise(named_vec = list(list_values)) %>%
    deframe(),
  paulsmith = my_data %>% 
    group_by(list_names) %>% 
    summarise(list_values = list(list_values)) %>% 
    {set_names(.$list_values, .$list_names)},
  jeffs = {
    desired_output <- list()
    for(i in seq_along(my_data$list_names)) {
      entry <- my_data %>%
        filter(list_names == my_data$list_names[i]) %>%
        pull(list_values)
      desired_output[[my_data$list_names[i]]] <- entry
    }
    desired_output},
  TarJae = my_data %>%
    group_by(list_names) %>%
    mutate(list_values = paste(list_values, collapse = ", ")) %>%
    slice(1) %>%
    group_split() %>%
    setNames(sort(unique(my_data$list_names))) %>%
    map(., dplyr::pull, -list_names) %>%
    map(., ~str_split(.x, ", ")[[1]]),
  times=100L
)

I also ran benchmarks with a larger data set on the two fastest options (from Akrun).

library(nycflights13)
my_data <- nycflights13::flights %>%
  select(list_names = carrier, list_values = flight)
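The benchmark call for the larger data set is not shown here. A sketch of what it might look like, reusing the two expressions from the earlier benchmark, follows; the name bm_large and the choice of times = 20L are assumptions, and no timings are reproduced.

```r
library(dplyr)
library(nycflights13)

# Larger data set: carrier as the grouping column, flight number as the value.
my_data <- nycflights13::flights %>%
  select(list_names = carrier, list_values = flight)

# Time only the two base R approaches (assumed settings: 20 repetitions each).
bm_large <- microbenchmark::microbenchmark(
  akrun_split = with(my_data, split(list_values,
                                    factor(list_names, levels = unique(list_names)))),
  akrun_unstack = unstack(my_data, list_values ~ list_names),
  times = 20L
)

print(bm_large)
```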

Emotionalize answered 13/1, 2022 at 21:18 Comment(0)

Another way of doing it:

library(tidyverse)

my_data %>%
  group_split(list_names) %>%
  map(~ lst(!!unique(pull(.x, list_names)) := unique(pull(.x, list_values)))) %>%
  flatten()
#> $Chevy
#> [1] "Equinox"
#> 
#> $Dodge
#> [1] "Caravan" "Ram"    
#> 
#> $Ford
#> [1] "Ranger"   "F150"     "Explorer"

Created on 2022-01-14 by the reprex package (v2.0.1)

Data:

my_data <- tibble(
  list_names = c("Ford", "Chevy", "Ford", "Dodge", "Dodge", "Ford"),
  list_values = c("Ranger", "Equinox", "F150", "Caravan", "Ram", "Explorer")
)

Logography answered 14/1, 2022 at 4:47 Comment(0)
