I want to split a large data frame into a list of data frames according to the values in two columns. I then want to apply a common data transformation (a lag transformation) to every data frame in the resulting list. I'm aware of the split() function but can only get it to work on one column of data at a time.
Split dataframe using two columns of data and apply common transformation on list of resulting dataframes
You need to put all the factors you want to split by in a list, e.g.:
split(mtcars,list(mtcars$cyl,mtcars$gear))
Then you can use lapply() on the resulting list to do whatever else you want to do.
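A minimal sketch of the split-then-lapply workflow, using mtcars as sample data and treating `mpg` as the column to lag. The helper `lag_by_one()` and the new column name `mpg_lag` are hypothetical, introduced only for illustration:

```r
# Hypothetical helper: shift a vector down by one position,
# filling the first slot with NA (a one-period lag)
lag_by_one <- function(x) c(NA, head(x, -1))

# Split by both grouping columns; drop = TRUE skips empty combinations
groups <- split(mtcars, list(mtcars$cyl, mtcars$gear), drop = TRUE)

# Apply the same transformation to every data frame in the list.
# Each call returns a modified copy, so the result must be captured.
groups_lagged <- lapply(groups, function(d) {
  d$mpg_lag <- lag_by_one(d$mpg)
  d
})

# Element names are "cyl.gear", e.g. "6.4" for 6 cylinders / 4 gears
head(groups_lagged[["6.4"]])
```

The lag is computed within each group, so the first row of every data frame gets NA rather than a value carried over from another group.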
If you want to avoid zero-row data frames in the result, there is a drop parameter. Its default (FALSE) is the opposite of the drop default in the "[" function, so you have to ask for it explicitly:
split(mtcars,list(mtcars$cyl,mtcars$gear), drop=TRUE)
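To see the difference on mtcars: there are 3 cylinder counts and 3 gear counts, so 9 possible combinations, but no car has 8 cylinders and 4 gears:

```r
# Without drop: every cyl x gear combination is kept, even empty ones
all_combos <- split(mtcars, list(mtcars$cyl, mtcars$gear))
# With drop = TRUE: combinations with no rows are discarded
non_empty <- split(mtcars, list(mtcars$cyl, mtcars$gear), drop = TRUE)

sapply(all_combos, nrow)   # includes a 0 for "8.4"
length(all_combos)         # 9
length(non_empty)          # 8
```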
How about this one:
library(plyr)
ddply(df, .(category1, category2), summarize,
      value1 = c(NA, head(value1, -1)),  # shift down one row within each group
      value2 = c(NA, head(value2, -1)))
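One caveat worth checking before relying on a bare `lag()` here: with only plyr attached, `lag` resolves to `stats::lag()`, which is meant for time-series objects and shifts the time base rather than the values (dplyr masks it with its own `lag()` when loaded). A small base-R sketch of the difference; `shift_down()` is a hypothetical helper:

```r
x <- c(10, 20, 30, 40)

# stats::lag() only changes the time-series attributes; the values are untouched
as.vector(stats::lag(x))   # 10 20 30 40

# Hypothetical helper that actually shifts the values down by one place
shift_down <- function(v) c(NA, head(v, -1))
shift_down(x)              # NA 10 20 30
```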
Seems like an excellent job for the plyr package and its ddply() function. If there are still open questions, please provide some sample data. Splitting should work on several columns as well:
df <- data.frame(value = rnorm(100), class1 = factor(rep(c('a','b'), each=50)), class2 = factor(rep(c('1','2'), 50)))
g <- list(df$class1, df$class2)  # one group per class1/class2 combination
split(df$value, g)
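A quick sketch to confirm that splitting on both columns yields one group per combination. It rebuilds the sample data from this answer (the `set.seed()` call is an addition, only for reproducibility) and shows that passing the two factors as a list is equivalent to combining them with `interaction()`:

```r
set.seed(1)  # only so rnorm() is reproducible
df <- data.frame(value  = rnorm(100),
                 class1 = factor(rep(c('a', 'b'), each = 50)),
                 class2 = factor(rep(c('1', '2'), 50)))

by_list        <- split(df$value, list(df$class1, df$class2))
by_interaction <- split(df$value, interaction(df$class1, df$class2))

lengths(by_list)                     # 25 values in each of the 4 groups
identical(by_list, by_interaction)   # TRUE
```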
Thanks for the answers! I figured out that I needed to put the split variables in a list, and that took care of splitting on two variables. I read up on the plyr package and it is indeed powerful, but I cannot make it do what I want. I tried this command: llply(1:length(List), function(i){temp <- List[[i]]$a; List[[i]]$b <- append(head(temp,-1), NA, after=0)}) and expected to find a new variable 'b' in each data frame contained in 'List'. Instead, the command prints the values of List[[i]]$b on the screen. What have I misunderstood? – Driscoll
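On the follow-up question: inside the anonymous function, `List[[i]]$b <- ...` modifies a local copy of `List`, not the one in the calling environment, and the value of that last assignment is what the function returns; llply() collects those return values into a new list, which is what gets printed. A minimal sketch of the usual fix, shown with base `lapply()` (the same pattern applies to `llply()`): return the modified data frame from the function and assign the result back. The toy `List` below is hypothetical sample data:

```r
# Hypothetical sample data: a list of two small data frames with a column 'a'
List <- list(g1 = data.frame(a = 1:3), g2 = data.frame(a = 4:6))

# Assignments inside the function only change a local copy,
# so return the modified data frame and reassign the whole list
List <- lapply(List, function(d) {
  d$b <- c(NA, head(d$a, -1))  # one-period lag of 'a'
  d
})

List$g1  # column b now holds NA, 1, 2
```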
You can also do the following:
split(x = df, f = ~ var1 + var2)
This splits the data frame by several variables at once (add more with +) without having to wrap them in a list for the f parameter.
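A small check, assuming a recent R version whose `split()` method for data frames accepts a formula (older versions only take a factor or a list of factors): the formula form and the list form should produce the same groups.

```r
# Formula interface: variables are looked up in the data frame itself
by_formula <- split(mtcars, ~ cyl + gear)
# List interface from the first answer
by_list    <- split(mtcars, list(mtcars$cyl, mtcars$gear))

identical(names(by_formula), names(by_list))   # TRUE
```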