Dplyr produces NaN while base R produces NA

Consider the following toy data and computations:

library(dplyr)

df <-  tibble(x = 1)

stats::sd(df$x)

dplyr::summarise(df, sd_x = sd(x))

The first calculation returns NA, whereas the second, where the same calculation runs inside dplyr's summarise(), returns NaN. I would expect both to give the same result. Why do they differ?

Wolfsbane answered 14/12, 2017 at 13:2 Comment(7)
Possible duplicate of "What is the difference between NaN and Inf, and NULL and NA in R?" - Whiggism
I can reproduce this with dplyr version 0.7.4, the latest version from CRAN. - Representation
Same here. However, what do you need that for? is.na() returns TRUE for both. - Turtle
Interesting. For me both return NA: stats::sd(df$x) gives [1] NA, and dplyr::summarise(df, sd_x = sd(x)) gives a 1 x 1 tibble whose sd_x (<dbl>) is NA. - Germanic
@Germanic What version of dplyr are you using? - Representation
@JohnPaul dplyr version 0.7.4 - Germanic
I'm the OP and I'm also using dplyr version 0.7.4. - Wolfsbane
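
To make the point about is.na() concrete, here is a minimal base R sketch (no dplyr involved) that separates the two values. The variable names na_val and nan_val are only illustrative, and 0 / 0 is used here simply as a convenient way to obtain a NaN:

```r
# NA and NaN are distinct values, even though is.na() is TRUE for both.
na_val  <- stats::sd(1)   # base R: sd of a single observation is NA
nan_val <- 0 / 0          # a plain NaN, used only for comparison

is.na(na_val)    # TRUE
is.na(nan_val)   # TRUE, because is.na() also flags NaN
is.nan(na_val)   # FALSE, NA is not NaN
is.nan(nan_val)  # TRUE
```

So code that only tests is.na() behaves the same for both results, while code that tests is.nan() does not.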

summarise() is calling a different function here. I'm not sure which one it is, but it is not stats::sd.

dplyr::summarise(df, sd_x = stats::sd(x))
# A tibble: 1 x 1
   sd_x
  <dbl>
1    NA

debugonce(sd) # debug to see when sd is called

Not called here:

dplyr::summarise(df, sd_x = sd(x))
# A tibble: 1 x 1
   sd_x
  <dbl>
1   NaN

But called here:

dplyr::summarise(df, sd_x = stats::sd(x))
debugging in: stats::sd(1)
debug: sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x), 
    na.rm = na.rm))
...

Update

It appears that sd inside summarise is not evaluated by R's stats::sd at all but computed by dplyr's own compiled C++ code, as this header file suggests: https://github.com/tidyverse/dplyr/blob/master/inst/include/dplyr/Result/Sd.h

dplyr seems to reimplement a number of functions in this way. Given that var gives the same result in both cases, I think the sd behaviour is a bug.
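
Until the behaviour is fixed, a possible workaround, sketched here under the assumption that NA is the result you want, is either to namespace the call so that base R's function is used (as in the first example above), or to map NaN back to NA afterwards. The helper nan_to_na is hypothetical and not part of dplyr:

```r
library(dplyr)

df <- tibble(x = 1)

# Option 1: call stats::sd explicitly so base R's implementation is used.
res1 <- summarise(df, sd_x = stats::sd(x))

# Option 2: a hypothetical helper that converts any NaN to NA.
nan_to_na <- function(v) {
  v[is.nan(v)] <- NA
  v
}
res2 <- summarise(df, sd_x = nan_to_na(sd(x)))

# Both results should now be NA rather than NaN.
is.nan(res1$sd_x)
is.nan(res2$sd_x)
```

Option 1 is the smaller change; option 2 may be useful when the column is produced by code you cannot easily edit.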

Transfix answered 14/12, 2017 at 14:36 Comment(2)
What are your dplyr and R versions? I am surprised that I cannot reproduce the bug... - Germanic
I also get NA when I do dplyr::mutate(df, var_x = var(x)). I've accepted this answer based on the suggestion that the behavior is a bug. - Wolfsbane
