Why does min/max/sum(c(NA, 4, 5), na.rm = "xyz") work while mean() with same inputs doesn't?

I would like to understand why sum/min/max functions in R interpret a character string as TRUE when supplied to na.rm, while mean() does not.

My uneducated guess is that as.logical("xyz") returns NA, that this NA is what gets supplied as the na.rm argument, and that for some strange reason sum/min/max accept it as TRUE while mean() does not.

I expected sum(c(NA, 4, 5), na.rm = "xyz") to fail with the same "argument is not interpretable as logical" error that mean() gives. I don't understand why that isn't the case.
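The guess above depends on how a string is turned into a logical. A minimal C sketch of that coercion, assuming it follows the rules documented for as.logical() (only "T", "True", "TRUE", "true" and their FALSE counterparts are recognised; every other string becomes NA) and that NA is stored as INT_MIN, which is how R represents a logical NA internally. The function name string_to_logical is hypothetical:

```c
#include <limits.h>
#include <string.h>

/* R stores a logical NA as INT_MIN at the C level. */
#define R_NA_LOGICAL INT_MIN

/* Hypothetical helper modelled on as.logical() applied to one string:
   only these exact spellings are recognised, anything else is NA. */
int string_to_logical(const char *s)
{
    static const char *true_names[]  = {"T", "True", "TRUE", "true"};
    static const char *false_names[] = {"F", "False", "FALSE", "false"};
    for (size_t i = 0; i < sizeof true_names / sizeof true_names[0]; i++) {
        if (strcmp(s, true_names[i]) == 0)
            return 1;                 /* TRUE */
        if (strcmp(s, false_names[i]) == 0)
            return 0;                 /* FALSE */
    }
    return R_NA_LOGICAL;              /* e.g. "xyz" */
}
```

So "xyz" does not become TRUE directly; it becomes NA, and what happens next depends on who inspects that NA.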

Argus asked 21/5, 2019 at 23:52 (5 comments)
It is not a coincidence that min/max/sum are primitives while mean is not. The processing of if (na.rm) produces an error in mean.default, and I assume it does not in min/max/sum due to their being primitives. (Respective)
This QA is very similar, and points in the right direction of examining the C source code: stackoverflow.com/a/14035586 (Backcross)
e.g. github.com/wch/r-source/blob/… (Backcross)
I agree that it would be useful if na.rm were evaluated and coerced consistently across the board. Note that na.rm="FALSE" is indeed parsed as a logical, so it's not that any string becomes TRUE, cf. sum(c(1:3,NA), na.rm="xyz") == 6, sum(c(1:3,NA), na.rm="TRUE") == 6, and sum(c(1:3,NA), na.rm="FALSE") == NA. (Fluidics)
Agreed! I don't understand the need for inconsistency here. I am not familiar with C, but I would assume some sort of strict type check should be simple to implement and would enforce consistent behavior across the board. Was definitely a WAT!? moment for me. (Argus)

As far as mean is concerned, it is quite straightforward. As @Rich Scriven mentions, if you type mean.default in the console you see this section of code:

if (na.rm) 
   x <- x[!is.na(x)]

which gives you the error.

mean(1:10, na.rm = "abc") #gives

Error in if (na.rm) x <- x[!is.na(x)] : argument is not interpretable as logical

which is similar to doing

if ("abc") "Hello"

Error in if ("abc") "Hello" : argument is not interpretable as logical
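R's if is stricter than C's: after coercing its condition to a logical, it refuses NA instead of treating it as true. A minimal C sketch of that check, assuming the two error messages R uses for an NA condition (a coerced non-logical such as "abc" gives "argument is not interpretable as logical", while a genuine logical NA gives "missing value where TRUE/FALSE needed"). The function r_if_error and its was_logical flag are hypothetical:

```c
#include <limits.h>
#include <stddef.h>

#define R_NA_LOGICAL INT_MIN /* R's internal logical NA */

/* Hypothetical model of the check R's `if` applies to its condition
   after coercion to logical. `was_logical` says whether the original
   R value was already logical (e.g. NA) or had to be coerced ("abc").
   Returns NULL when the condition is usable, else the error message. */
const char *r_if_error(int cond, int was_logical)
{
    if (cond != R_NA_LOGICAL)
        return NULL;  /* TRUE or FALSE: `if` can proceed */
    return was_logical ? "missing value where TRUE/FALSE needed"
                       : "argument is not interpretable as logical";
}
```

Because mean.default runs if (na.rm) at the R level, it goes through exactly this check, so na.rm = "abc" is rejected.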


Now regarding sum, min, max and the other primitive functions, which are implemented in C: the source code of these functions is here. There is a parameter, Rboolean narm, passed into the function.

C treats truth values differently from R: any non-zero number, and any non-null pointer, counts as true.

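#include <stdio.h>
#include <stdbool.h>

int main(void)
{
  /* "abc" decays to a non-null pointer; converting any non-null
     pointer (or any non-zero number) to bool yields true. */
  bool a = "abc";
  if (a)
    printf("Hello World\n");
  else
    printf("Not Hello World\n");
  return 0;
}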

If you run the above C code, it prints "Hello World" (run the demo here). A string literal is a non-null pointer, so converting it to a boolean gives true in C. In fact, the same is true of any non-zero number:

sum(1:10, na.rm = 12)

works as well.
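Putting the pieces together, the primitive's behaviour can be reproduced in plain C. A minimal sketch, assuming that the primitive first coerces na.rm with as.logical()-style rules (so "xyz" becomes NA, stored as INT_MIN) and later tests the flag with an ordinary C if. sum_ints is a hypothetical stand-in for the C summing loop, not R's actual function:

```c
#include <limits.h>
#include <stddef.h>

#define R_NA_INTEGER INT_MIN /* R's integer NA */
#define R_NA_LOGICAL INT_MIN /* R's logical NA (same value) */

/* Hypothetical integer sum in the style of a primitive. `narm` is
   assumed to be already coerced; it is only tested for truthiness,
   so NA_LOGICAL (INT_MIN, non-zero) behaves exactly like TRUE.
   Sets *is_na to 1 when the result would be NA in R. */
long sum_ints(const int *x, size_t n, int narm, int *is_na)
{
    long total = 0;
    *is_na = 0;
    for (size_t i = 0; i < n; i++) {
        if (x[i] == R_NA_INTEGER) {
            if (!narm) {      /* only 0 (FALSE) propagates the NA */
                *is_na = 1;
                return 0;
            }
            continue;         /* any non-zero narm skips the NA */
        }
        total += x[i];
    }
    return total;
}
```

This matches the results quoted in the comments on the question: with x = {1, 2, 3, NA}, a narm of TRUE, NA ("xyz") or 12 gives 6, while FALSE ("FALSE") gives NA.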

PS: I am no expert in C and know only a little R. Finding all these insights took a lot of time. Let me know if I have misinterpreted something or provided any false information.

Esquivel answered 22/5, 2019 at 4:27 (2 comments)
Thanks! I guess character strings and numbers are considered truthy in C, but it still perturbs me that the implementation is not consistent with R's rules. I wonder if there is a reason why these primitives haven't been refactored for consistency (with some sort of type check in C). (Argus)
@Puzhu I agree. It would have been much better if these functions showed consistent behavior irrespective of their underlying implementation. (Esquivel)
