How to group a vector into a list of vectors?

I have some data which looks like this (fake data for example's sake):

dressId        color 
6              yellow 
9              red
10             green 
10             purple 
10             yellow 
12             purple 
12             red 

where color is a factor vector. It is not guaranteed that all possible levels of the factor actually appear in the data (e.g. the color "blue" could also be one of the levels).

I need a list of vectors which groups the available colors of each dress:

[[1]]
yellow  

[[2]] 
red    

[[3]] 
green purple yellow 

[[4]] 
purple red 

Preserving the IDs of the dresses would be nice (e.g. a dataframe where this list is the second column and the IDs are the first), but not necessary.

I wrote a loop which goes through the dataframe row by row, and while the next ID is the same, it adds the color to a vector. (I am sure that the data is sorted by ID.) When the ID in the first column changes, it adds the vector to a list:

result <- NULL
dressCounter <- 1
while (...)  # loop condition omitted
{
    # ... some code which creates the vector called "colors"
    result[[dressCounter]] <- colors
    dressCounter <- dressCounter + 1
}

After wrestling with getting all the necessary counting variables correct, I found out to my dismay that it doesn't work. The first time, colors is

[1] yellow
Levels: green yellow purple red blue

and it gets coerced to its integer code, so result becomes 2.

In the second iteration, colors contains only red, and result becomes a plain integer vector, [1] 2 4.

In the third iteration, colors has three elements,

[1] green  purple yellow
Levels: green yellow purple red blue 

and I get

result[[3]] <- colors

Error in result[[3]] <- colors :
more elements supplied than there are to replace

What am I doing wrong? Is there a way to initialize result so it doesn't get converted into a numeric vector, but becomes a list of vectors?
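The usual fix is to initialise result as an empty list() instead of NULL: on a list, [[<- stores the whole right-hand side as one element, whereas starting from NULL builds an atomic vector, as the output above shows. Below is a minimal runnable sketch of the loop described above, assuming the data sit in a data frame d sorted by dressId; d and the start bookkeeping are illustrative names, not the original code.

```r
# Question's sample data: color is a factor, and "blue" is a level
# that never occurs in the data.
d <- data.frame(
  dressId = c(6, 9, 10, 10, 10, 12, 12),
  color = factor(c("yellow", "red", "green", "purple", "yellow", "purple", "red"),
                 levels = c("green", "yellow", "purple", "red", "blue"))
)

result <- list()   # an empty list, not NULL, so [[<- stores whole vectors
dressCounter <- 1
start <- 1         # first row of the dress currently being collected
n <- nrow(d)

for (i in seq_len(n)) {
  # A dress ends at row i if this is the last row or the next ID differs
  # (this relies on the rows being sorted by dressId).
  if (i == n || d$dressId[i + 1] != d$dressId[i]) {
    colors <- d$color[start:i]        # still a factor carrying all levels
    result[[dressCounter]] <- colors
    dressCounter <- dressCounter + 1
    start <- i + 1
  }
}

names(result) <- unique(d$dressId)    # optional: keep the dress IDs as names
result
```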

Also, is there another way to do the whole thing than "roll my own"?

Remmer answered 1/2, 2014 at 15:5

split.data.frame is a good way to organize this; then extract the color component.

d <- data.frame(dressId=c(6,9,10,10,10,12,12),
               color=factor(c("yellow","red","green",
                              "purple","yellow",
                              "purple","red"),
                 levels=c("red","orange","yellow",
                          "green","blue","purple")))

I think the version you want is actually this:

ss <- split.data.frame(d,d$dressId)

You can get something more like the list you requested by extracting the color component:

lapply(ss,"[[","color")
Rinaldo answered 1/2, 2014 at 15:13

+1 If it is just the list they want (unclear from the description), perhaps it is better to do that with split directly and skip the lapply step. – Staphyloplasty
From the description, "I need a list of vectors which groups the available colors," maybe split(d$color, d$dressId) or split(as.character(d$color), d$dressId) would suffice. – Stanfill
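A small sketch comparing the two routes, reusing the sample data d from the answer above: splitting the rows and extracting color keeps factors, while splitting as.character(d$color) directly gives plain character vectors with the same grouping. The names via_rows and direct are illustrative.

```r
# Sample data from the answer above.
d <- data.frame(dressId = c(6, 9, 10, 10, 10, 12, 12),
                color = factor(c("yellow", "red", "green",
                                 "purple", "yellow",
                                 "purple", "red"),
                               levels = c("red", "orange", "yellow",
                                          "green", "blue", "purple")))

# Route from the answer: split the rows, then extract the color column.
via_rows <- lapply(split.data.frame(d, d$dressId), "[[", "color")

# Route from the comments: split the color vector directly.
direct <- split(as.character(d$color), d$dressId)
direct
```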

In addition to split, you should consider aggregate. Use c or I as the aggregation function to get your list column (here mydf is the question's data with color stored as a character vector):

out <- aggregate(color ~ dressId, mydf, c)
out
#   dressId                 color
# 1       6                yellow
# 2       9                   red
# 3      10 green, purple, yellow
# 4      12           purple, red
str(out)
# 'data.frame': 4 obs. of  2 variables:
#  $ dressId: int  6 9 10 12
#  $ color  :List of 4
#   ..$ 0: chr "yellow"
#   ..$ 1: chr "red"
#   ..$ 2: chr  "green" "purple" "yellow"
#   ..$ 3: chr  "purple" "red"
out$color
# $`0`
# [1] "yellow"
# 
# $`1`
# [1] "red"
# 
# $`2`
# [1] "green"  "purple" "yellow"
# 
# $`3`
# [1] "purple" "red" 

Note: This works even if the "color" variable is a factor, as in Ben's sample data (I missed that point when I posted the answer above), but you need to use I as the aggregation function instead of c (in R versions before 4.1.0, c() drops a factor to its integer codes):

out <- aggregate(color ~ dressId, d, I)
str(out)
# 'data.frame': 4 obs. of  2 variables:
#  $ dressId: num  6 9 10 12
#  $ color  :List of 4
#   ..$ 0: Factor w/ 6 levels "red","orange",..: 3
#   ..$ 1: Factor w/ 6 levels "red","orange",..: 1
#   ..$ 2: Factor w/ 6 levels "red","orange",..: 4 6 3
#   ..$ 3: Factor w/ 6 levels "red","orange",..: 6 1
out$color
# $`0`
# [1] yellow
# Levels: red orange yellow green blue purple
# 
# $`1`
# [1] red
# Levels: red orange yellow green blue purple
# 
# $`2`
# [1] green  purple yellow
# Levels: red orange yellow green blue purple
# 
# $`3`
# [1] purple red   
# Levels: red orange yellow green blue purple

Strangely, however, the default display shows the integer values:

out
#   dressId   color
# 1       6       3
# 2       9       1
# 3      10 4, 6, 3
# 4      12    6, 1
Staphyloplasty answered 1/2, 2014 at 16:13

How to get strings instead of integer values? – Exhume
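One way to get strings, sketched here assuming the factor data d from the answers above: convert color to character before aggregating, so that c receives plain character vectors and the list column holds strings rather than factor codes.

```r
d <- data.frame(dressId = c(6, 9, 10, 10, 10, 12, 12),
                color = factor(c("yellow", "red", "green", "purple",
                                 "yellow", "purple", "red"),
                               levels = c("red", "orange", "yellow",
                                          "green", "blue", "purple")))

# Convert the factor to character first, then aggregate with c as before.
out <- aggregate(color ~ dressId, transform(d, color = as.character(color)), c)
out$color
```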

Assuming your data frame is saved in a variable called df, you can simply use group_by and summarize from the dplyr package, with list() as the summary function:

library('dplyr')

df %>%
  group_by(dressId) %>%
  summarize(colors = list(color))

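Applied to your example (the output below lists the contents of each group; when printed, a tibble shows each list-column cell as a summary such as <chr [3]>):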

df <- tribble(
  ~dressId, ~color,
         6, 'yellow',
         9, 'red',
        10, 'green',
        10, 'purple',
        10, 'yellow',
        12, 'purple',
        12, 'red'
)

df %>%
  group_by(dressId) %>%
  summarize(colors = list(color))

# dressId                colors
#       6                yellow
#       9                   red
#      10 green, purple, yellow
#      12           purple, red
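For comparison, a base R sketch of the same two-column shape (dress IDs plus a list column) without dplyr, assuming plain vectors dressId and color holding the question's data; groups and res are illustrative names.

```r
dressId <- c(6, 9, 10, 10, 10, 12, 12)
color <- c("yellow", "red", "green", "purple", "yellow", "purple", "red")

groups <- split(color, dressId)   # named list of character vectors
# I() keeps data.frame() from trying to spread the list over several columns.
res <- data.frame(dressId = as.numeric(names(groups)), colors = I(groups))
res
```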
Piotr answered 9/8, 2018 at 20:51

I am afraid that the answer should be a little different; you should use the following code to accomplish the requested behaviour:

df %>%
  group_by(dressId) %>%
  summarize(colors = toString(unique(color)))
Grith answered 9/6, 2020 at 13:4

That will create a string; the question is asking for a list. – Heavyset
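To make the distinction in the comment concrete, a base R sketch using the question's data as plain vectors (groups and labels are illustrative names): split gives a list with one vector per dress, while toString collapses each group into a single string.

```r
dressId <- c(6, 9, 10, 10, 10, 12, 12)
color <- c("yellow", "red", "green", "purple", "yellow", "purple", "red")

groups <- split(color, dressId)   # a list: one character vector per dress
# One comma-separated string per dress, like toString(unique(color)) above.
labels <- vapply(groups, function(x) toString(unique(x)), character(1))
groups[["10"]]
labels[["10"]]
```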

All the other answers do the job and I'm slightly late to the party, but some of them use dplyr, and I try to stay away from the tidyverse when possible; this problem can be solved in base R without tidyverse bloat. Others solve it by building a dataframe, which is not what the title asks for :)

Let's create the vectors, as the OP didn't provide code (note that the OP wants vectors rather than a dataframe, although this also works on a dataframe with a minor modification):

dressId <- c(6, 9, 10, 10, 10, 12, 12)
color <- c("yellow", "red", "green", "purple", "yellow", "purple", "red")

Now let's get to the business and calculate what OP asked for:

I need a list of vectors which groups the available colors of each dress:

result <- split(x = color, f = dressId)

result

which will output:

$`6`
[1] "yellow"

$`9`
[1] "red"

$`10`
[1] "green"  "purple" "yellow"

$`12`
[1] "purple" "red" 

This is very simple and straightforward. If there are duplicate pairs, for instance another "red" for dressId 12, you can apply unique() to each element of the split() result:

result <- lapply(result, unique)
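A quick check of the duplicate case, assuming one extra "red" row for dress 12 (added here for illustration only):

```r
dressId <- c(6, 9, 10, 10, 10, 12, 12, 12)
color <- c("yellow", "red", "green", "purple", "yellow", "purple", "red", "red")

result <- split(x = color, f = dressId)   # "red" appears twice for dress 12
result <- lapply(result, unique)          # drop repeats within each dress
result[["12"]]
```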

If color is a factor, this also works, but every element of the result will be a factor carrying all the levels. To avoid that, convert the factor to a plain vector first, e.g. with unfactor() from the varhandle package or with base R's as.character().
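A base R sketch of the factor case, assuming color is a factor with the six levels used in the earlier answers: splitting the factor keeps every level in each element, while splitting as.character(color) gives plain character vectors.

```r
dressId <- c(6, 9, 10, 10, 10, 12, 12)
color <- factor(c("yellow", "red", "green", "purple", "yellow", "purple", "red"),
                levels = c("red", "orange", "yellow", "green", "blue", "purple"))

as_factors <- split(color, dressId)                # each element: factor with all 6 levels
as_strings <- split(as.character(color), dressId)  # each element: character vector
```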

Marauding answered 10/1, 2022 at 20:14
