Why does mean(NA, na.rm = TRUE) return NaN?
R

2

8

When computing the mean of a vector that consists entirely of NAs, we get NaN if na.rm = TRUE. Why is this? Is this flawed logic, or is there something I'm missing? Surely it would make more sense to return NA rather than NaN?

Quick example below

mean(NA, na.rm = TRUE)
#[1] NaN

mean(rep(NA, 10), na.rm = TRUE)
#[1] NaN
Rime answered 24/7, 2018 at 16:50 Comment(3)
Because you then have a vector of length zero and a division by zero is NaN. As for whether it makes more sense, I believe it doesn't, since you have removed the missing values. – Teacart
Because you have nothing left. mean(numeric(0)) – Jaborandi
Note that mean(as.Date(NA), na.rm = TRUE) is NA and not NaN, though. – Tiber
J
9

It is a bit of a pity that ?mean does not say anything about this. My comment only told you that applying mean to an empty "numeric" vector results in NaN, without further reasoning. Rui Barradas's comment tried to explain this but was not accurate: division by 0 is not always NaN; it can also be Inf or -Inf. I once discussed this in "R: element-wise matrix division". However, we are getting close. Although mean(x) is not implemented as sum(x) / length(x), this mathematical identity does explain the NaN.

From ?sum:

 *NB:* the sum of an empty set is zero, by definition.

So sum(numeric(0)) is 0. As length(numeric(0)) is 0, mean(numeric(0)) is 0 / 0 which is NaN.
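The arithmetic above can be checked directly at the console. This is only a sketch of the reasoning, not of how mean is implemented internally; the variable names (x, empty_sum, empty_len, ratio) are made up for illustration.

```r
# Empty numeric vector, i.e. what is left once every NA has been removed
x <- numeric(0)

empty_sum <- sum(x)               # 0: the sum of an empty set is zero
empty_len <- length(x)            # 0
ratio     <- empty_sum / empty_len  # 0 / 0

print(ratio)     # NaN
print(mean(x))   # NaN, consistent with 0 / 0

# Division by zero is not always NaN: a nonzero numerator gives Inf or -Inf
print(1 / 0)     # Inf
print(-1 / 0)    # -Inf
```

Only the 0 / 0 case yields NaN, which is why the empty sum being exactly zero matters here.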

Jaborandi answered 29/7, 2018 at 19:42 Comment(1)
Further reading on NaN: "In R, why does is.numeric(NaN) print “TRUE”?". Also, readers should be aware that median(numeric(0)) gives NA, min(numeric(0)) gives Inf, and max(numeric(0)) gives -Inf. The reasons for this behavior are explained in the documentation pages ?median and ?min. – Jaborandi
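For comparison, here is a small sketch of how the other summary functions mentioned in the comment behave on an empty numeric vector. The names med, lo, and hi are illustrative only; min and max also emit a warning on empty input, which is suppressed here to keep the output clean.

```r
x <- numeric(0)

med <- median(x)                  # NA (a true NA, not NaN)
lo  <- suppressWarnings(min(x))   # Inf
hi  <- suppressWarnings(max(x))   # -Inf

print(med)
print(lo)
print(hi)

# is.na() is TRUE for both NA and NaN; is.nan() tells them apart
print(is.na(med))    # TRUE
print(is.nan(med))   # FALSE
```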
R
2

From the mean documentation:

na.rm a logical value indicating whether NA values should be stripped before the computation proceeds.

By this logic, all NAs are removed before the mean is computed. In your case, you are applying mean to nothing (all NAs have been removed), so NaN is returned.
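A minimal sketch of that stripping step, done by hand. The subsetting with !is.na(x) is an assumption about what "stripped" amounts to, and kept, res, and default_res are illustrative names.

```r
x <- rep(NA, 10)

# Remove the NAs manually, as na.rm = TRUE is documented to do
kept <- x[!is.na(x)]
print(length(kept))   # 0: nothing is left

res <- mean(x, na.rm = TRUE)
print(res)            # NaN, same as taking the mean of the empty vector
print(mean(kept))     # NaN

# Without na.rm, the missing values propagate instead
default_res <- mean(x)
print(default_res)    # NA
```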

Reverberatory answered 24/7, 2018 at 16:54 Comment(1)
I believe it won't return NULL because R still recognizes the vector as numeric even though it contains only missing values. For example, this will throw a warning even though you remove the NA value: mean(c(NA_character_), na.rm = TRUE). Interesting point on the numeric(0) though. – Reverberatory
