When I need to apply several functions to several columns, grouped by multiple columns, and have the results bound into a single data frame, I usually use aggregate()
in the following manner:
# bogus functions
foo1 <- function(x) mean(x) * var(x)
foo2 <- function(x) mean(x) / var(x)
# block is a factor in npk; convert it to numeric for illustration purposes only
npk$block <- as.numeric(npk$block)
# apply both functions to yield and block, grouped by N and P
subdf <- aggregate(npk[, c("yield", "block")],
                   by = list(N = npk$N, P = npk$P),
                   FUN = function(x) c(col1 = foo1(x), col2 = foo2(x)))
To get the results into a nicely flattened data frame, I then use:
df <- do.call(data.frame, subdf)
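To illustrate the difference the do.call() step makes (a sketch; the comments describe what str() reports rather than exact output):

str(subdf)
# yield and block each come back as a two-column matrix
# (columns "col1" and "col2") embedded in the data frame
str(df)
# after do.call(data.frame, subdf) those matrices are split into
# ordinary columns: yield.col1, yield.col2, block.col1, block.col2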
Can I avoid the call to do.call()
by using aggregate()
more cleverly in this scenario, or shorten the whole process with another base R
solution from the start?
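For reference, the formula interface to aggregate() shortens the call a little, but as far as I can tell it still returns matrix columns and so still needs the do.call() step (subdf2 and df2 are just illustrative names):

subdf2 <- aggregate(cbind(yield, block) ~ N + P, data = npk,
                    FUN = function(x) c(col1 = foo1(x), col2 = foo2(x)))
df2 <- do.call(data.frame, subdf2)  # still needed to flatten the matrix columns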
Comments:

With subdf
alone I will also have a data frame. But it will be a data frame that contains matrices in some columns, which I desperately want to avoid! – Rosado

This is the sort of task data.table
handles well (and it's one of the several reasons why that package is so popular). I don't think you can achieve your desired result in base R much more easily than the way you showed. – Benz

Rather than cbind
, do.call(data.frame, subdf)
would be sufficient. Another option would be to use summarise_each
from dplyr
. – Antediluvian
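A minimal sketch of the dplyr route the last comment points to; summarise_each() has since been superseded, so this uses across(), and the *_col1 / *_col2 names are what current dplyr produces:

library(dplyr)

npk %>%
  group_by(N, P) %>%
  summarise(across(c(yield, block),
                   list(col1 = foo1, col2 = foo2)),
            .groups = "drop")
# gives ordinary columns yield_col1, yield_col2, block_col1, block_col2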