Mean by factor by level
F

5

25

Maybe this is simple, but I can't find an answer on the web. I have a problem calculating the mean for each level of a factor. My data typically looks like this:

factor, value
a,1
a,2
b,1
b,1
b,1
c,1

I want to get a vector A that contains the mean only for level "a". If I type A in the console, I want to get 1.5. The method for calculating the mean must use factors.

Thank you in advance for help.

Familiarity answered 30/4, 2014 at 18:23 Comment(3)
Try aggregate(value~factor, FUN=mean) - Chau
Or A <- mean(data$value[data$factor == "a"]) - Endsley
@Bartek. If you're going to go through the work of traversing the data frame to find which elements have factor=="a", you might as well perform the operation on the whole data frame and take advantage of the other means later if needed. - Nonchalant
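The aggregate suggestion from the comments can be sketched against the question's data as below. This is only an illustration: the data frame name `dat` is an assumption, and `data = dat` is passed explicitly so the formula does not depend on `value` and `factor` existing as free-standing variables.

```r
# Data from the question; the name `dat` is an assumption for this sketch.
dat <- data.frame(
  factor = factor(c("a", "a", "b", "b", "b", "c")),
  value  = c(1, 2, 1, 1, 1, 1)
)

# One row per factor level, with the mean of `value` in each group.
means <- aggregate(value ~ factor, data = dat, FUN = mean)

# Pick out the mean for level "a".
A <- means$value[means$factor == "a"]
A  # 1.5
```

Because `means` holds every level, the values for "b" and "c" are available later without another pass over the data.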
G
6

Just for fun, here is the data.table solution, although you should probably do what @lukeA suggested:

library(data.table) 
A <- setDT(df)[factor == "a", mean(value)]
## [1] 1.5
Garamond answered 30/4, 2014 at 18:57 Comment(4)
What a truly bizarre programming language R is. - Lani
@Lani This is a very silly way to do something very simple. I posted this back when I had just joined and was very rep hungry. If I could, I would delete this answer altogether. BTW, do the solutions in the comments also look bizarre to you? Can you find something less bizarre than aggregate(value~factor, FUN=mean) in Python (not to mention that Pandas copied everything from R)? - Garamond
Amen. Python doesn't have anything quite so cute as the aggregate function (which is pretty legible), but on the whole I find Python to be more expressive and easier to read. I find R is generally full of extremely terse statements which, while more compact than Python's syntax, are less easy to read off the page (at least for non-diehards). Reading a function in Python, one immediately sees how to translate it into any number of languages, but not so for R. That said, maybe I just need to drink the koolaid... - Lani
@Lani Have you heard of the dplyr (or tidyverse) package in R? I believe there is nothing more expressive than that in any language. Regarding Python, there is a lot of confusing stuff there too, like all these list comprehension shortcuts, or numpy's np.reshape(-1,... trick. You can exhaust the groupby in an iterator, and so on. But I guess this debate won't lead anywhere :) - Garamond
N
34

Take a look at tapply, which lets you break up a vector according to one or more factors and apply a function to each subset.

> dat<-data.frame(factor=sample(c("a","b","c"), 10, T), value=rnorm(10))
> r1<-with(dat, tapply(value, factor, mean))
> r1
         a          b          c
 0.3877001 -0.4079463 -1.0837449
> r1[["a"]]
[1] 0.3877001

You can access your results using r1[["a"]] etc.
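For the exact data in the question (rather than the random sample above), the same tapply call could look like the sketch below. The data frame name `dat` is an assumption made for illustration.

```r
# Data from the question; the name `dat` is an assumption for this sketch.
dat <- data.frame(
  factor = factor(c("a", "a", "b", "b", "b", "c")),
  value  = c(1, 2, 1, 1, 1, 1)
)

# tapply splits `value` by the levels of `factor` and applies mean()
# to each piece; the result is indexed by level name.
r1 <- with(dat, tapply(value, factor, mean))

# Extract the mean for level "a" by name.
A <- r1[["a"]]
A  # 1.5
```

Indexing by name with `[["a"]]` returns the bare number, which matches what the question asks to see when typing A in the console.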

Alternatively, one of the popular R packages (plyr) has very nice ways of doing this.

> library(plyr)
> r2<-ddply(dat, .(factor), summarize, mean=mean(value))
> r2
  factor       mean
1      a  0.3877001
2      b -0.4079463
3      c -1.0837449
> subset(r2,factor=="a",select="mean")
       mean
1 0.3877001

You can also use dlply instead (which takes a data frame and returns a list):

> dlply(dat, .(factor), summarize, mean=mean(value))$a
       mean
1 0.3877001
Nonchalant answered 30/4, 2014 at 18:49 Comment(2)
Is it possible to use ddply with two factors? - Sik
@Sik Indeed, you can just modify the ddply call to ddply(dat, .(factor, factor2), summarize, mean=mean(value)), and this generalizes to as many columns as you want to "group" by. Hope that helps. - Nonchalant
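Grouping by two factors can also be sketched in base R, without plyr. Note that this swaps in aggregate and tapply rather than ddply, and that the data frame `dat2`, the second column `factor2`, and all of its values are hypothetical, made up only for this illustration.

```r
# Hypothetical data: `factor2` and these values are invented for the sketch.
dat2 <- data.frame(
  factor  = c("a", "a", "b", "b", "b", "c"),
  factor2 = c("x", "y", "x", "x", "y", "x"),
  value   = c(1, 2, 1, 3, 1, 1)
)

# aggregate: one row per combination of factor and factor2 that occurs.
means2 <- aggregate(value ~ factor + factor2, data = dat2, FUN = mean)
b_x <- means2$value[means2$factor == "b" & means2$factor2 == "x"]
b_x  # 2

# tapply: a matrix with one row per `factor` level and one column per
# `factor2` level; combinations with no data are NA.
m <- with(dat2, tapply(value, list(factor, factor2), mean))
m["b", "x"]  # 2
```

The aggregate result keeps only the combinations that appear in the data, while the tapply matrix has a cell for every pairing, so `m["c", "y"]` is NA here.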
H
7

The following code computes the mean of value for the rows where factor is "a":

mean(data$value[data$factor == "a"])
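As a minimal sketch of the same subsetting idea on the question's data, and of how it could be repeated over every level of the factor: the data frame name `dat` and the helper vector `all_means` are assumptions introduced here, not part of the original answer.

```r
# Data from the question; the name `dat` is an assumption for this sketch.
dat <- data.frame(
  factor = factor(c("a", "a", "b", "b", "b", "c")),
  value  = c(1, 2, 1, 1, 1, 1)
)

# Logical subsetting: keep only the values whose factor level is "a".
A <- mean(dat$value[dat$factor == "a"])
A  # 1.5

# The same subsetting repeated for each level of the factor;
# sapply names the result after the levels.
all_means <- sapply(levels(dat$factor),
                    function(lv) mean(dat$value[dat$factor == lv]))
all_means[["b"]]  # 1
```

For a single level the one-liner is enough; the loop over `levels()` is mainly useful when the means for the other levels are needed as well.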
Hugohugon answered 30/4, 2014 at 20:33 Comment(1)
Perfect! This is exactly what I was looking for: how to select a specific factor level. - Booboo
T
7

Another simple possibility would be the "by" function:

by(value, factor, mean)

You can get the mean of factor level "a" by:

factor_means <- by(value, factor, mean)
factor_means[attr(factor_means, "dimnames")$factor=="a"]
Touzle answered 13/3, 2017 at 14:10 Comment(1)
How do I use the levels of a factor instead of the factor itself? - Sik
M
0

You can use ddply and pass summary as the function.

library(plyr) # load the plyr package
ddply(nameOfTheDataframe, ~ factor, function(data) summary(data$value))
Merle answered 28/2, 2022 at 12:5 Comment(0)
