Multiple functions in a single tapply or aggregate statement

Is it possible to include two functions within a single tapply or aggregate statement?

Below I use two tapply statements and two aggregate statements: one for mean and one for SD.
I would prefer to combine the statements.

my.Data = read.table(text = "
  animal    age     sex  weight
       1  adult  female     100
       2  young    male      75
       3  adult    male      90
       4  adult  female      95
       5  young  female      80
", sep = "", header = TRUE)

with(my.Data, tapply(weight, list(age, sex), function(x) {mean(x)}))
with(my.Data, tapply(weight, list(age, sex), function(x) {sd(x)  }))

with(my.Data, aggregate(weight ~ age + sex, FUN = mean))
with(my.Data, aggregate(weight ~ age + sex, FUN =   sd))

# this does not work:

with(my.Data, tapply(weight, list(age, sex), function(x) {mean(x) ; sd(x)}))

# I would also prefer the output to be formatted similarly to what is
# shown below.  `aggregate` formats the output perfectly.  I just cannot figure
# out how to apply two functions in one statement.

  age    sex   mean        sd
adult female   97.5  3.535534
adult   male     90        NA
young female   80.0        NA
young   male     75        NA

I can always run two separate statements and merge the output. I was just hoping there might be a slightly more convenient solution.

I found the answer below posted here: Apply multiple functions to column using tapply

f <- function(x) c(mean(x), sd(x))
do.call( rbind, with(my.Data, tapply(weight, list(age, sex), f)) )

However, neither the rows nor the columns are labeled.

     [,1]     [,2]
[1,] 97.5 3.535534
[2,] 80.0       NA
[3,] 90.0       NA
[4,] 75.0       NA

I would prefer a solution in base R. A solution from the plyr package was posted at the link above. If I can add the correct row and column headings to the above output, it would be perfect.
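For reference, one possible way to get labeled rows and columns in base R is sketched below. It is not taken from the linked post: it swaps `tapply` for `split` plus `sapply`, and the names `groups` and `labelled` are made up for illustration. It assumes `f` returns a named vector, so the names become column headings.

```r
my.Data <- read.table(text = "
  animal    age     sex  weight
       1  adult  female     100
       2  young    male      75
       3  adult    male      90
       4  adult  female      95
       5  young  female      80
", header = TRUE)

# Naming the elements gives the result its column headings.
f <- function(x) c(mean = mean(x), sd = sd(x))

# split() names each group "age.sex"; those names become the row labels.
groups <- split(my.Data$weight, list(my.Data$age, my.Data$sex))

# sapply() returns one column per group, so transpose to one row per group.
labelled <- t(sapply(groups, f))
labelled
```

The group label ends up in the row names rather than in separate `age` and `sex` columns, so this is closer to the `do.call(rbind, ...)` output than to the `aggregate` layout.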

Cribbage answered 5/3, 2013 at 3:2 Comment(0)

These, however, should work:

with(my.Data, aggregate(weight, list(age, sex), function(x) { c(MEAN=mean(x), SD=sd(x) )}))

with(my.Data, tapply(weight, list(age, sex), function(x) { c(mean(x) , sd(x) )} ))
# Not a nice structure but the results are in there

with(my.Data, aggregate(weight ~ age + sex, FUN =  function(x) c( SD = sd(x), MN= mean(x) ) ) )
    age    sex weight.SD weight.MN
1 adult female  3.535534 97.500000
2 young female        NA 80.000000
3 adult   male        NA 90.000000
4 young   male        NA 75.000000

The principle is that your function must return a single object, which can be either a vector or a list. It cannot be two function calls evaluated one after the other, because only the value of the last call is returned.
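A minimal sketch of that principle in base R follows. The helper names `last_only` and `both` are made up for illustration, and `x` holds the weights of the two adult females from the question.

```r
x <- c(100, 95)  # weights of the two adult females

# A braced block returns only its last expression, so mean(x) is discarded.
last_only <- function(x) { mean(x); sd(x) }

# c() combines both statistics into one named vector: a single return value.
both <- function(x) c(mean = mean(x), sd = sd(x))

last_only(x)  # a single number: the standard deviation
both(x)       # a named vector of length 2
```

This is why the `{mean(x) ; sd(x)}` version in the question runs without error but only ever reports the standard deviation.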

Gamo answered 5/3, 2013 at 3:9 Comment(5)
Thank you! The two aggregate statements work. The tapply statement does not appear to work, but I can use the aggregate approach. - Cribbage
Well, I think it "works", it just does not give you something that prints nicely. Try with(my.Data, tapply(weight, list(age, sex), function(x) { c(mean(x) , sd(x) )} ))[1,1] and play around with the indices to see inside that matrix of lists. - Gamo
I see. Thank you. And if I put the entire statement inside colnames() or rownames() then I get the labels. - Cribbage
The third column of the resulting data frame is itself a matrix. That is easily solved by wrapping the whole thing in do.call(data.frame, ...). +1 - Zither
Or even do.call(rbind, ...), as is often effective with the results of by(...) operations. - Gamo
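The matrix-column point from the comments can be sketched as follows, assuming the question's `my.Data`. The names `res` and `flat` are illustrative only.

```r
my.Data <- read.table(text = "
  animal    age     sex  weight
       1  adult  female     100
       2  young    male      75
       3  adult    male      90
       4  adult  female      95
       5  young  female      80
", header = TRUE)

res <- aggregate(weight ~ age + sex, data = my.Data,
                 FUN = function(x) c(mean = mean(x), sd = sd(x)))

ncol(res)              # 3: the third column is itself a two-column matrix
is.matrix(res$weight)  # TRUE

# Pass each column of res as an argument to data.frame(), which splits
# the matrix column into ordinary columns.
flat <- do.call(data.frame, res)
names(flat)            # "age" "sex" "weight.mean" "weight.sd"
```

The printed output of `res` and `flat` looks the same, so the difference only shows up when you try to select `weight.mean` or `weight.sd` as a column.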

If you'd like to use data.table, it has with and by built right into it:

library(data.table)
myDT <- data.table(my.Data, key="animal")

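# Add the group-wise mean and sd as new columns, by reference:
myDT[, c("mean", "sd") := list(mean(weight), sd(weight)), by=list(age, sex)]

# Or return one summary row per group (the sum() wrappers in the original
# were redundant, since mean() and sd() already return a single value):
myDT[, list(mean_Aggr=mean(weight), sd_Aggr=sd(weight)), by=list(age, sex)]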
     age    sex mean_Aggr   sd_Aggr
1: adult female     96.0  3.6055513
2: young   male     76.5  2.1213203
3: adult   male     91.0  1.4142136
4: young female     84.5  0.7071068

I used a slightly different data set so as not to have NA values for sd, which is why the numbers above differ from those in the question.

Demetrademetre answered 5/3, 2013 at 3:26 Comment(0)

In the spirit of sharing, if you are familiar with SQL, you might also consider the "sqldf" package. (Emphasis added because you do need to know, for instance, that mean is avg in order to get the results you want.)

library(sqldf)
sqldf("select age, sex, 
      avg(weight) `Wt.Mean`, 
      stdev(weight) `Wt.SD` 
      from `my.Data` 
      group by age, sex")
    age    sex Wt.Mean    Wt.SD
1 adult female    97.5 3.535534
2 adult   male    90.0 0.000000
3 young female    80.0 0.000000
4 young   male    75.0 0.000000
Zither answered 5/3, 2013 at 18:26 Comment(0)

The reshape package lets you pass two functions; reshape2 does not.

library(reshape)
my.Data = read.table(text = "
  animal    age     sex  weight
       1  adult  female     100
       2  young    male      75
       3  adult    male      90
       4  adult  female      95
       5  young  female      80
", sep = "", header = TRUE)
my.Data[,1]<- NULL
(a1 <- melt(my.Data, id.vars=c("age", "sex"), measure.vars=c("weight")))
(cast(a1, age + sex ~ variable, c(mean, sd), fill=NA))

#     age    sex weight_mean weight_sd
# 1 adult female        97.5  3.535534
# 2 adult   male        90.0        NA
# 3 young female        80.0        NA
# 4 young   male        75.0        NA

I owe this to @Ramnath, who noted this just yesterday.

Houghton answered 5/3, 2013 at 4:10 Comment(0)

The function aggregate_multiple_fun in the SSBtools package is a wrapper to aggregate that allows multiple functions and functions of several variables. In this case two possibilities are:

library(SSBtools)
my.Data = read.table(text = "
  animal    age     sex  weight
       1  adult  female     100
       2  young    male      75
       3  adult    male      90
       4  adult  female      95
       5  young  female      80
", sep = "", header = TRUE)


aggregate_multiple_fun(my.Data, my.Data[c("age", "sex")], 
                       vars = c(mean = "weight", sd = "weight"))

#     age    sex weight_mean weight_sd
# 1 adult female        97.5  3.535534
# 2 young female        80.0        NA
# 3 adult   male        90.0        NA
# 4 young   male        75.0        NA

aggregate_multiple_fun(my.Data, my.Data[c("age", "sex")], 
                       vars = "weight", fun = c("mean", "sd"))

#     age    sex mean       sd
# 1 adult female 97.5 3.535534
# 2 young female 80.0       NA
# 3 adult   male 90.0       NA
# 4 young   male 75.0       NA
Eldaelden answered 19/4, 2023 at 9:20 Comment(0)
