I have some data which looks like this (fake data for example's sake):
dressId color
6 yellow
9 red
10 green
10 purple
10 yellow
12 purple
12 red
where color is a factor vector. It is not guaranteed that all possible levels of the factor actually appear in the data (e.g. the color "blue" could also be one of the levels).
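For reproducibility, here is a minimal sketch of how the example data could be built. The data frame name `dresses` and the helper `color_levels` are made up for illustration; the level order is copied from the printed factor output later in this question, and "blue" is the level that never occurs in the data.

```r
# Level order taken from the printed output; "blue" is declared but unused.
color_levels <- c("green", "yellow", "purple", "red", "blue")

# Hypothetical name for the example data frame.
dresses <- data.frame(
  dressId = c(6, 9, 10, 10, 10, 12, 12),
  color   = factor(
    c("yellow", "red", "green", "purple", "yellow", "purple", "red"),
    levels = color_levels
  )
)

# Inspect the structure: color is a factor with 5 levels, 4 of them used.
str(dresses)
```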
I need a list of vectors which groups the available colors of each dress:
[[1]]
yellow
[[2]]
red
[[3]]
green purple yellow
[[4]]
purple red
Preserving the IDs of the dresses would be nice (e.g. a dataframe where this list is the second column and the IDs are the first), but not necessary.
I wrote a loop which goes through the dataframe row by row, and while the next ID is the same, it adds the color to a vector (I am sure that the data is sorted by ID). When the ID in the first column changes, it adds the vector to a list:
result <- NULL
while(blah blah)
{
some code which creates the vector called "colors"
result[[dressCounter]] <- colors
dressCounter <- dressCounter + 1
}
After wrestling with getting all the necessary counting variables correct, I found out to my dismay that it doesn't work. The first time, colors is
[1] yellow
Levels: green yellow purple red blue
and it gets coerced into an integer, so result becomes 2.
In the second loop repetition, colors only contains red, and result becomes a simple integer vector, [1] 2 4.
In the third repetition, colors is a vector now,
[1] green purple yellow
Levels: green yellow purple red blue
and I get
result[[3]] <- colors
Error in result[[3]] <- colors :
more elements supplied than there are to replace
What am I doing wrong? Is there a way to initialize result so it doesn't get converted into a numeric vector, but becomes a list of vectors?
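One possible way around the coercion, sketched under the assumption that the problem comes from result starting out as NULL rather than as a list: initialize it with `list()` (or `vector("list", n)` if the number of dresses is known in advance). The names `colors_first` and `colors_third` are invented here to stand in for the value of colors in the first and third repetition.

```r
color_levels <- c("green", "yellow", "purple", "red", "blue")

# Stand-ins for the loop variable "colors" in two different repetitions.
colors_first <- factor("yellow", levels = color_levels)
colors_third <- factor(c("green", "purple", "yellow"), levels = color_levels)

# Start from an empty list so that [[<- stores each vector as one element
# instead of merging the values into an atomic vector.
result <- list()
result[[1]] <- colors_first
result[[2]] <- colors_third

# Each element is still a factor, whatever its length.
str(result)
```

With a list as the container, a factor of length one and a factor of length three are each stored as a single element, so the factor class and its levels should be kept.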
Also, is there another way to do the whole thing than "roll my own"?
split directly and skip the lapply step. – Staphyloplasty
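A rough sketch of what using split directly might look like, assuming the data sits in a data frame called `dresses` (a made-up name) with the columns shown in the question. The last part, which keeps the IDs next to the groups as a list column, is just one option and not something the comment itself proposes.

```r
# Hypothetical reconstruction of the example data from the question.
dresses <- data.frame(
  dressId = c(6, 9, 10, 10, 10, 12, 12),
  color   = factor(
    c("yellow", "red", "green", "purple", "yellow", "purple", "red"),
    levels = c("green", "yellow", "purple", "red", "blue")
  )
)

# split() groups the colors by dress ID; the list names are the IDs.
# Each element is still a factor carrying all five levels.
colors_by_dress <- split(dresses$color, dresses$dressId)

# Optional: keep the IDs in a data frame with the groups as a list column.
# I() stops data.frame machinery from trying to unpack the list.
by_dress <- data.frame(dressId = as.numeric(names(colors_by_dress)))
by_dress$colors <- I(unname(colors_by_dress))

print(colors_by_dress)
```

If plain character vectors are preferred over factors, `split(as.character(dresses$color), dresses$dressId)` should give the same grouping without the unused levels attached.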