I have a data frame with some factor variables. I want to add a new column to this data frame that holds a running index within each combination of those factor variables.
data <- data.frame(fac1 = factor(rep(1:2, 5)), fac2 = sample(letters[1:3], 10, replace = TRUE))
This gives me something like:
   fac1 fac2
1     1    a
2     2    c
3     1    b
4     2    a
5     1    c
6     2    b
7     1    a
8     2    a
9     1    b
10    2    c
What I want is a counter that numbers the occurrences of each factor combination, like this:
   fac1 fac2 counter
1     1    a       1
2     2    c       1
3     1    b       1
4     2    a       1
5     1    c       1
6     2    b       1
7     1    a       2
8     2    a       2
9     1    b       2
10    2    c       2
So far I have thought about using tapply to get the counter for every factor combination, which works fine:
counter <- tapply(data$fac1, list(data$fac1, data$fac2), function(x) seq_along(x))
But I do not know how to assign the resulting list of counters (e.g. after unlisting it) back to the matching rows of the data frame without inefficient looping :)
Consider switching from tapply to either ddply from plyr or, if your data is huge and performance is an issue, data.table. – Skipjack
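For reference, the per-combination counter described in the question can also be obtained in base R with ave(), which applies a function within each group and returns the results in the original row order. This is only a minimal sketch, not necessarily the fastest option. It assumes the example rows shown in the question, written out by hand because sample() is random, and the column name counter is taken from the desired output.

```r
# Fixed example data matching the rows printed in the question
# (an assumption: sample() would give different values on each run).
data <- data.frame(
  fac1 = factor(rep(1:2, 5)),
  fac2 = c("a", "c", "b", "a", "c", "b", "a", "a", "b", "c")
)

# ave() splits the row indices by the fac1/fac2 combination, applies
# seq_along() to each group, and returns the values in the original row order.
data$counter <- ave(seq_len(nrow(data)), data$fac1, data$fac2, FUN = seq_along)

print(data)
```

Because ave() keeps the original row order, the result can be assigned directly as a new column, so no loop or unlist() step is needed.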