How to pass na.rm as argument to tapply?

Asked 5/1, 2013 at 14:10 Answered 30/10, 2014 at 3:30

I'd like to calculate the mean and sd from a data frame with one column for the parameter and one column for a group identifier. How can I calculate them when using tapply? I could use sd(v1, na.rm=TRUE) on a single vector, but I can't fit na.rm=TRUE into the statement when using tapply. na.omit is not an option: I have a whole bunch of parameters and have to go through them step by step without losing half of the data frame by excluding every row that has a missing value.

data("weightgain", package = "HSAUR")
tapply(weightgain$weightgain, list(weightgain$source, weightgain$type), mean)

The same holds true for the by statement.

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, NA)
y <- c(2, 3, NA, 3, 4, NA, 2, 3, NA, 2)
group <- rep(factor(LETTERS[1:2]), 5)
df <- data.frame(x, y, group)
df

by(df$x,df$group,summary)
by(df$x,df$group,mean)

sd(df$x) #result: NA
sd(df$x, na.rm=TRUE) #result: 2.738613

Any ideas how to get this done?

Blus answered 5/1, 2013 at 14:10 Comment(2)

Pretty much! Can I apply that to several columns of the table, or will I have to loop through a parameter list? tapply(df[c("x","y")], df$group, sd, na.rm=TRUE) or so? – Blus 5/1, 2013 at 14:35

The question does not make sense. With help(tapply) you should see that there is a ... argument, described as a promise that named items will be passed on to the FUN function. What error did you get when you used the code tapply(df$V1, df$group, sd, na.rm=TRUE)? – Boone 5/1, 2013 at 20:26

I think this should do what you want.

Select the columns you want:

v = c("x", "y")  # or
v = colnames(df)[1:2]

Use sapply to iterate over v and pass the values to tapply:

sapply(v, function(i) tapply(df[[i]], df$group, sd, na.rm=TRUE))
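
For reference, here is a self-contained sketch of the same call on the example data from the question. The name `sds` is only there for illustration; everything else comes from the question and the lines above.

```r
# Example data from the question
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, NA)
y <- c(2, 3, NA, 3, 4, NA, 2, 3, NA, 2)
group <- rep(factor(LETTERS[1:2]), 5)
df <- data.frame(x, y, group)

# Columns to summarise
v <- c("x", "y")

# One tapply() call per column; na.rm = TRUE is forwarded to sd() through '...'
sds <- sapply(v, function(i) tapply(df[[i]], df$group, sd, na.rm = TRUE))

# Matrix with one row per group (A, B) and one column per variable (x, y)
sds
```

Because NAs are dropped per column and per group, no rows are discarded from the data frame as a whole.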

Bloodworth answered 5/1, 2013 at 14:43 Comment(0)

Simply set na.rm=TRUE in the tapply function:

tapply(weightgain$weightgain, list(weightgain$source, weightgain$type), mean, na.rm=TRUE)
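
If you want to see the effect without installing HSAUR, here is a small sketch using the x and group vectors from the question. The names `with_na` and `without_na` are just illustrative.

```r
# x and group as defined in the question
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, NA)
group <- rep(factor(LETTERS[1:2]), 5)

# Without na.rm, the NA in group B propagates into that group's mean
with_na <- tapply(x, group, mean)

# Extra named arguments to tapply() are passed on to FUN, so mean() gets na.rm = TRUE
without_na <- tapply(x, group, mean, na.rm = TRUE)

with_na
without_na
```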

Grosswardein answered 30/10, 2014 at 3:30 Comment(1)

I agree. The accepted answer seems more convoluted, and this one worked like a charm. – Conductor 3/12, 2015 at 21:8

