When I need to apply several functions to several columns, grouped by multiple columns, and have the results bound into a single data frame, I usually use aggregate()
in the following manner:
# bogus functions
foo1 <- function(x) mean(x) * var(x)
foo2 <- function(x) mean(x) / var(x)
# block is a factor in npk; convert it to numeric for illustration purposes only
npk$block <- as.numeric(npk$block)
# apply both functions to yield and block, grouped by N and P
subdf <- aggregate(npk[, c("yield", "block")],
                   by = list(N = npk$N, P = npk$P),
                   FUN = function(x) c(col1 = foo1(x), col2 = foo2(x)))
To get the results into a nicely flattened data frame, I then use:
df <- do.call(data.frame, subdf)
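To illustrate the difference the do.call() step makes (a sketch; the comments describe what str() reports rather than exact output):

str(subdf)
# yield and block each come back as a two-column matrix
# (columns "col1" and "col2") embedded in the data frame
str(df)
# after do.call(data.frame, subdf) those matrices are split into
# ordinary columns: yield.col1, yield.col2, block.col1, block.col2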
Can I avoid the call to do.call()
by using aggregate()
more cleverly in this scenario, or shorten the whole process with another base R
solution from the start?
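For reference, the formula interface to aggregate() shortens the call a little, but as far as I can tell it still returns matrix columns and so still needs the do.call() step (subdf2 and df2 are just illustrative names):

subdf2 <- aggregate(cbind(yield, block) ~ N + P, data = npk,
                    FUN = function(x) c(col1 = foo1(x), col2 = foo2(x)))
df2 <- do.call(data.frame, subdf2)  # still needed to flatten the matrix columns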
Comments:

With subdf
alone I will also have a data frame. But it will be a data frame that contains matrices in some columns, which I desperately want to avoid! – Rosado

This is the sort of task data.table
handles well (and it's one of the several reasons why that package is so popular). I don't think you can achieve your desired result in base R much more easily than the way you showed. – Benz

Rather than cbind
, do.call(data.frame, subdf)
would be sufficient. Another option would be to use summarise_each
from dplyr
. – Antediluvian
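A minimal sketch of the dplyr route the last comment points to; summarise_each() has since been superseded, so this uses across(), and the *_col1 / *_col2 names are what current dplyr produces:

library(dplyr)

npk %>%
  group_by(N, P) %>%
  summarise(across(c(yield, block),
                   list(col1 = foo1, col2 = foo2)),
            .groups = "drop")
# gives ordinary columns yield_col1, yield_col2, block_col1, block_col2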