Mean by factor by level

Maybe this is simple, but I can't find an answer on the web. I have a problem calculating the mean by factor level. My data typically looks like this:

factor, value
a,1
a,2
b,1
b,1
b,1
c,1

I want to get a vector A containing the mean for level "a" only, so that typing A in the console gives 1.5. The method for calculating the mean must use factors.

Thank you in advance for help.
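To try the answers below, the sample data can be loaded into a data frame. This is a minimal sketch: it assumes the data is pasted inline, uses the name `data` (as in the comments below) and writes the header without the space after the comma. Because `read.csv()` keeps text columns as character by default since R 4.0.0, the `factor` column is converted to an R factor explicitly, as the question requires.

```r
# Read the sample data from the question into a data frame
data <- read.csv(text = "factor,value
a,1
a,2
b,1
b,1
b,1
c,1")

# Make sure the grouping column is a real factor
# (read.csv() leaves it as character by default since R 4.0.0)
data$factor <- factor(data$factor)

str(data)
```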

Familiarity asked 30/4, 2014 at 18:23 Comment(3)

Try aggregate(value ~ factor, data = data, FUN = mean) – Chau 30/4, 2014 at 18:31

Or A <- mean(data$value[data$factor == "a"]) – Endsley 30/4, 2014 at 18:32

@Bartek. If you're going to go through the work of traversing the data frame to find which elements are factor=="a" you might as well perform the operation on the whole dataframe and take advantage of the other means later if needed. – Nonchalant 30/4, 2014 at 18:53
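Both comments can be checked against the sample data. The sketch below rebuilds it with `data.frame()` (the name `data` is an assumption carried over from the comments), computes every level's mean with `aggregate()` and then pulls out the value for level "a".

```r
# Sample data from the question, with `factor` stored as an R factor
data <- data.frame(
  factor = factor(c("a", "a", "b", "b", "b", "c")),
  value  = c(1, 2, 1, 1, 1, 1)
)

# One row per level: the mean of `value` within that level
means <- aggregate(value ~ factor, data = data, FUN = mean)
print(means)

# Keep only the mean for level "a"
A <- means$value[means$factor == "a"]
A
```

Computing all the means first, as suggested in the last comment, keeps the other levels available if they are needed later.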

Just for fun, here is the data.table solution, although you should probably do what @lukeA suggested:

library(data.table) 
A <- setDT(df)[factor == "a", mean(value)]
## [1] 1.5

Garamond answered 30/4, 2014 at 18:57 Comment(4)

What a truly bizarre programming language R is. – Lani 6/11, 2018 at 1:34

@Lani This is a very silly way to do something very simple. I posted this back when I had just joined and was very rep hungry. If I could, I would delete this answer altogether. BTW, do the solutions in the comments also look bizarre to you? Can you find something less bizarre than aggregate(value~factor, FUN=mean) in Python (not to mention that Pandas copied everything from R)? – Garamond 6/11, 2018 at 6:15

amen. Python doesn't have anything quite so cute as the aggregate function (which is pretty legible), but on the whole I find Python to be more expressive and easier to read. I find R is generally full of extremely terse statements, which while more compact than Python's syntax, are less easy to read off the page (at least for non-diehards). Reading a function in Python, one immediately sees how to translate it into any number of languages, but not so for R. That said, maybe I just need to drink the koolaid... – Lani 6/11, 2018 at 13:5

@Lani have you heard of the dplyr (or tidyverse) package in R? I believe there is nothing more expressive than that in any language. Regarding Python, there is so much confusing stuff there, like all these list comprehension shortcuts, the np.reshape(-1,... trick in numpy, exhausting the groupby in an iterator, and so on. But I guess this debate won't lead anywhere :) – Garamond 6/11, 2018 at 14:11

Take a look at tapply, which lets you break up a vector according to one or more factors and apply a function to each subset:

> dat<-data.frame(factor=sample(c("a","b","c"), 10, replace=TRUE), value=rnorm(10))
> r1<-with(dat, tapply(value, factor, mean))
> r1
         a          b          c
 0.3877001 -0.4079463 -1.0837449
> r1[["a"]]
[1] 0.3877001

You can access your results using r1[["a"]] etc.
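Applied to the data from the question rather than random numbers, the same `tapply()` call gives 1.5 for level "a". The data frame below is a hand-built copy of the question's sample data (the name `data` is an assumption).

```r
data <- data.frame(
  factor = factor(c("a", "a", "b", "b", "b", "c")),
  value  = c(1, 2, 1, 1, 1, 1)
)

# tapply() splits `value` by the levels of `factor` and applies mean() to each piece
r1 <- with(data, tapply(value, factor, mean))
r1

# Index by level name to get a single number
A <- r1[["a"]]
A
```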

Alternatively, one of the popular R packages, plyr, has very nice ways of doing this.

> library(plyr)
> r2<-ddply(dat, .(factor), summarize, mean=mean(value))
> r2
  factor       mean
1      a  0.3877001
2      b -0.4079463
3      c -1.0837449
> subset(r2,factor=="a",select="mean")
       mean
1 0.3877001

You can also use dlply instead, which takes a data frame and returns a list:

> dlply(dat, .(factor), summarize, mean=mean(value))$a
       mean
1 0.3877001
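If adding plyr is not an option, a similar split-apply-combine can be sketched in base R with `split()` and `sapply()`. This is a swapped-in base-R technique, not part of plyr, and the data frame is a hand-built copy of the question's sample data.

```r
data <- data.frame(
  factor = factor(c("a", "a", "b", "b", "b", "c")),
  value  = c(1, 2, 1, 1, 1, 1)
)

# split() returns a named list with one numeric vector per level
pieces <- split(data$value, data$factor)

# sapply() applies mean() to each piece and simplifies to a named vector
r2 <- sapply(pieces, mean)
r2

# Same lookup by level name as with the plyr results
r2[["a"]]

# Or keep a data frame with one row per level, similar in shape to ddply() output
r2_df <- data.frame(factor = names(r2), mean = unname(r2))
subset(r2_df, factor == "a", select = "mean")
```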

Nonchalant answered 30/4, 2014 at 18:49 Comment(2)

Is it possible to use ddply with two factors? – Sik 15/1, 2020 at 9:30

@Sik indeed, you can just modify the ddply call to ddply(dat, .(factor, factor2), summarize, mean=mean(value)), and this generalizes to more columns you want to "group" by. Hope that helps – Nonchalant 15/1, 2020 at 17:44
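Base R's `aggregate()` accepts several grouping variables the same way, with `+` in the formula. In this sketch, `factor2` is a hypothetical second grouping column invented for illustration; only combinations that occur in the data appear in the result.

```r
# Hypothetical second grouping column `factor2`, for illustration only
data <- data.frame(
  factor  = factor(c("a", "a", "b", "b", "b", "c")),
  factor2 = factor(c("x", "y", "x", "x", "y", "y")),
  value   = c(1, 2, 1, 1, 1, 1)
)

# Group by both columns; one row per observed (factor, factor2) combination
means2 <- aggregate(value ~ factor + factor2, data = data, FUN = mean)
means2
```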

The following code computes the mean of value for the rows where factor is "a":

mean(data$value[data$factor == "a"])
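The comparison inside the brackets builds a logical vector that keeps only the rows for level "a". A sketch on a hand-built copy of the question's data, which also repeats the same subsetting for every level via `levels()` and `sapply()` (an extension for illustration, not part of the original answer):

```r
data <- data.frame(
  factor = factor(c("a", "a", "b", "b", "b", "c")),
  value  = c(1, 2, 1, 1, 1, 1)
)

# The comparison yields a logical vector selecting the rows for level "a"
data$factor == "a"   # TRUE TRUE FALSE FALSE FALSE FALSE

A <- mean(data$value[data$factor == "a"])

# The same subsetting repeated for every level of the factor
all_means <- sapply(levels(data$factor),
                    function(lvl) mean(data$value[data$factor == lvl]))
all_means
```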

Hugohugon answered 30/4, 2014 at 20:33 Comment(1)

Perfect! This is exactly what I was looking for: how to select a particular factor level. – Booboo 25/6, 2019 at 22:44

Another simple possibility is the by function:

by(value, factor, mean)

You can then get the mean for factor level "a" with:

factor_means <- by(value, factor, mean)
factor_means[attr(factor_means, "dimnames")$factor=="a"]
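A sketch with the question's sample data (hand-built here, name `data` assumed). This variant passes the whole data frame to `by()`, so the function receives one data frame per level and takes the mean of its `value` column; indexing the result by level name is used in place of the `dimnames` lookup above.

```r
data <- data.frame(
  factor = factor(c("a", "a", "b", "b", "b", "c")),
  value  = c(1, 2, 1, 1, 1, 1)
)

# by() hands each level's subset of rows (a data frame) to the function
factor_means <- by(data, data$factor, function(d) mean(d$value))
factor_means

# The result is indexed by level, so a level name selects one mean
A <- factor_means[["a"]]
A
```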

Touzle answered 13/3, 2017 at 14:10 Comment(1)

How do I use the levels of a factor instead of the factor itself? – Sik 10/1, 2020 at 10:46

You can also use ddply and pass summary as the function, which returns the minimum, quartiles, median, mean and maximum of value for each level:

library(plyr) # load the plyr package
ddply(nameOfTheDataframe, ~ factor, function(data) summary(data$value))
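A base-R sketch of the same per-level summary, using `split()` and `sapply()` instead of plyr (a swapped-in technique; the data frame is a hand-built copy of the question's data):

```r
data <- data.frame(
  factor = factor(c("a", "a", "b", "b", "b", "c")),
  value  = c(1, 2, 1, 1, 1, 1)
)

# summary() of each level's values; sapply() binds them into a matrix
# with one column per level and one row per statistic
stats <- sapply(split(data$value, data$factor), summary)
stats

# The mean for level "a" sits in the "Mean" row
stats["Mean", "a"]
```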

Merle answered 28/2, 2022 at 12:5 Comment(0)
