Split dataframe using two columns of data and apply common transformation on list of resulting dataframes

I want to split a large dataframe into a list of dataframes according to the values in two columns. I then want to apply a common data transformation on all dataframes (lag transformation) in the resulting list. I'm aware of the split command but can only get it to work on one column of data at a time.

Driscoll answered 20/1, 2012 at 14:12 Comment(0)

You need to put all the factors you want to split by in a list, e.g.:

split(mtcars,list(mtcars$cyl,mtcars$gear))

You can then use lapply on the resulting list to apply whatever transformation you need to each data frame.

If you want to avoid zero-row data frames in the result, set the drop argument to TRUE. Note that its default in split (FALSE) is the opposite of the default for the drop argument of the "[" function (TRUE).

split(mtcars,list(mtcars$cyl,mtcars$gear), drop=TRUE)
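As a rough sketch of how the split and lapply steps might fit together for a lag transformation: the helper lag1 and the column name mpg_lag below are made up for illustration, and lagging mpg is only an assumed example of the "common transformation" from the question.

```r
# Split mtcars by every observed cyl/gear combination (empty groups dropped).
pieces <- split(mtcars, list(mtcars$cyl, mtcars$gear), drop = TRUE)

# Hypothetical helper: shift a vector down by one position, padding with NA.
lag1 <- function(x) c(NA, head(x, -1))

# Apply the same transformation to every data frame in the list.
lagged <- lapply(pieces, function(d) {
  d$mpg_lag <- lag1(d$mpg)  # mpg_lag is an illustrative column name
  d                         # return the modified data frame
})

# Optionally stack the pieces back into a single data frame.
result <- do.call(rbind, lagged)
```

Each element of lagged is one of the original pieces with the extra column added; do.call(rbind, ...) is only needed if you want everything back in one data frame.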
Natalya answered 20/1, 2012 at 14:46 Comment(0)

How about this one:

 library(plyr)
 # stats::lag() shifts only the time index of a series, not the values, so lag each column by hand
 ddply(df, .(category1, category2), summarize,
       value1 = c(NA, head(value1, -1)), value2 = c(NA, head(value2, -1)))

This seems like an excellent job for the plyr package and its ddply() function. If there are still open questions, please provide some sample data. Splitting should work on several columns as well:

df <- data.frame(value = rnorm(100), class1 = factor(rep(c('a','b'), each = 50)), class2 = factor(rep(c('1','2'), 50)))
# the grouping factors must be passed as a list, not concatenated with c()
g <- list(df$class1, df$class2)
split(df$value, g)
Grassofparnassus answered 20/1, 2012 at 14:44 Comment(1)
Thanks for the answers! I figured out that I needed to put the split variables in a list, and that took care of the "splitting" problem using two variables. I read up on the plyr package and it is indeed powerful, but I cannot make it do what I want. I tried this command: llply(1:length(List),function(i){temp<-List[[i]]$a;List[[i]]$b<-append(head(temp,-1),na,after=0)}) and expected to find a new variable 'b' in each dataframe contained in 'List'. Instead, the command prints the result List[[i]]$b on the screen. What have I misunderstood? - Driscoll
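A likely explanation for the behaviour described in this comment: the assignment inside the function changes only a local copy of List, and the function's return value is the value of that assignment, which is what gets collected and printed. A minimal sketch of one way around it, assuming each data frame has a column a (the two small data frames below are invented stand-ins for the commenter's List):

```r
# Illustrative stand-in for the commenter's List: two data frames with a column a.
List <- list(data.frame(a = 1:3), data.frame(a = 4:6))

# Return the modified data frame from the function and store the result;
# changes made inside the function are not visible outside it otherwise.
List <- lapply(List, function(d) {
  d$b <- c(NA, head(d$a, -1))  # lagged copy of a, padded with NA
  d
})
```

Iterating over the data frames themselves, rather than over 1:length(List), also removes the need to index into List inside the function.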

You can also do the following:

split(x = df, f = ~ var1 + var2...)

This formula interface (available in split for data frames since R 4.1.0, as far as I know) splits the data frame by several variables without wrapping them in a list for the f argument.
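For a concrete comparison, here is a small sketch using mtcars with cyl and gear as the splitting variables (my choice of example data, and it assumes an R version recent enough to accept a formula for f):

```r
# Formula interface: variables are looked up in the data frame itself.
by_formula <- split(mtcars, ~ cyl + gear)

# Equivalent call with the grouping variables wrapped in a list.
by_list <- split(mtcars, list(mtcars$cyl, mtcars$gear))

# Both approaches should produce the same group names.
same_groups <- identical(names(by_formula), names(by_list))
```

The formula version avoids repeating the data frame name for each variable, which helps when splitting by more than two columns.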

Swinger answered 27/7, 2022 at 7:12 Comment(0)
