POSIXct object is NA, but is.na() returns FALSE

I have encountered some very peculiar behaviour in R. I think it might even be a bug, but I'm asking here to check if someone is familiar with it or knows a solution.

What I'm trying to do is the following: I have a data frame with dates assigned to groups. I'm performing a for-loop over these groups, in which I calculate the maximum of the dates in this group. I want to skip the rest of the loop (next) if this maximum date is NA. However, this doesn't happen correctly.

Consider the following code:

library(dplyr)
library(lubridate)
a <- data.frame(group = c(1,1,1,1,1, 2,2,2,2, 3),
            ds = as_datetime(dmy('01-01-2018', NA, '03-01-2018', NA, '05-01-2018',
                                 '02-01-2018', '04-01-2018', '06-01-2018', '08-01-2018',
                                 NA)))

for (i in 1:3) {
  max_ds <- a %>% filter(group == i) %>% .$ds %>% max(na.rm = T)
  if (is.na(max_ds)) { next }
  print(max_ds)
}

The expected output is:

# [1] "2018-01-05 UTC"
# [1] "2018-01-08 UTC"

However, the obtained output is:

# [1] "2018-01-05 UTC"
# [1] "2018-01-08 UTC"
# [1] NA

The crux of this mystery seems to lie in the na.rm argument. If it is removed, the following happens:

for (i in 1:3) {
  max_ds <- a %>% filter(group == i) %>% .$ds %>% max()
  if (is.na(max_ds)) { next }
  print(max_ds)
}

# [1] "2018-01-08 UTC"

This is exactly the result I would expect when the NAs are not removed.

Any ideas?

Alyss answered 18/4, 2018 at 13:47 Comment(5)
Look at the output of max(NA, na.rm = TRUE). - Lang
So you're saying that max_ds is equal to -Inf, which explains why is.na returns FALSE. However, why does it print as NA? - Alyss
Because it is a datetime class: see max(as.POSIXct(NA), na.rm = TRUE) and as.POSIXct(-Inf, origin = "1900-01-01"). - Lang
That explains it then. Strange how this evaluates to NA, prints as NA, but is.na returns FALSE. However, is.na(as_date(NA)) returns TRUE. - Alyss
It does not evaluate to NA. - Lang

The issue is that you pass NA together with na.rm = TRUE. Then this happens:

max(NA, na.rm = TRUE)
#[1] -Inf
#Warning message:
#In max(NA, na.rm = TRUE) : no non-missing arguments to max; returning -Inf

The result is clearly not NA. If you pass a datetime value, the result is still not NA (it is -Inf stored in a POSIXct object), but it is printed as NA:

max(as.POSIXct(NA), na.rm = TRUE)
#[1] NA
#Warning message:
#In max.default(NA_real_, na.rm = TRUE) :
#  no non-missing arguments to max; returning -Inf
as.POSIXct(-Inf, origin = "1900-01-01")
#[1] NA
unclass(as.POSIXct(-Inf, origin = "1900-01-01"))
#[1] -Inf
#attr(,"tzone")
#[1] ""
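To make the mismatch concrete, the following is a minimal sketch (the variable name x is only for illustration) that inspects the stored value rather than the printed one. It assumes nothing beyond base R. How such a value prints may depend on the R version, so the comments only describe the checks themselves:

```r
# -Inf wrapped in a POSIXct: the result of max() on an all-NA datetime
x <- suppressWarnings(max(as.POSIXct(NA), na.rm = TRUE))

class(x)         # still a POSIXct/POSIXt object
as.numeric(x)    # the stored number is -Inf
is.na(x)         # FALSE, because -Inf is not a missing value
is.infinite(x)   # TRUE
```

In other words, is.na looks at the underlying number, while the NA shown in the output above comes from formatting a non-finite datetime.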

You might want to test with is.finite:

!is.finite(max(as.POSIXct(NA), na.rm = TRUE))
#[1] TRUE
#Warning message:
#In max.default(NA_real_, na.rm = TRUE) :
#  no non-missing arguments to max; returning -Inf
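Applied to the loop from the question, the is.finite test could look roughly like this. This is a sketch, not the asker's exact code: it rebuilds the data in base R (as.POSIXct instead of lubridate, plain subsetting instead of dplyr) so that it is self-contained, and the vector kept is a hypothetical helper added only to record which groups pass the check:

```r
# Base-R version of the question's data (no dplyr/lubridate needed)
a <- data.frame(
  group = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3),
  ds = as.POSIXct(
    c("2018-01-01", NA, "2018-01-03", NA, "2018-01-05",
      "2018-01-02", "2018-01-04", "2018-01-06", "2018-01-08",
      NA),
    tz = "UTC"
  )
)

kept <- c()  # groups that have at least one non-missing date
for (i in 1:3) {
  # max() warns and returns -Inf when nothing is left after na.rm = TRUE
  max_ds <- suppressWarnings(max(a$ds[a$group == i], na.rm = TRUE))
  if (!is.finite(max_ds)) next  # skips NA, NaN, -Inf and Inf alike
  kept <- c(kept, i)
  print(max_ds)
}
```

With this check, group 3 (whose only date is NA) is skipped, and only the maxima of groups 1 and 2 are printed.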
Lang answered 18/4, 2018 at 14:9 Comment(0)

This is arguably a bug of sorts. max does not handle an empty input (or one that becomes empty after na.rm = TRUE) in an intuitive way, and the warning it gives is confusing. IMO, max should recognize an empty vector input as such. Whether it should return NA or FALSE or something else reasonable is up for debate, but warning about "no non-missing arguments" and returning -Inf is not very helpful.

library(tidyverse)
c(1) %>% max()
#> [1] 1
c(1) %>% max(na.rm=T)
#> [1] 1
c(NA) %>% max()
#> [1] NA
c(NA) %>% max(na.rm=T)
#> Warning in max(., na.rm = T): no non-missing arguments to max; returning -Inf
#> [1] -Inf
c() %>% max()
#> Warning in max(.): no non-missing arguments to max; returning -Inf
#> [1] -Inf
c() %>% max(na.rm=T)
#> Warning in max(., na.rm = T): no non-missing arguments to max; returning -Inf
#> [1] -Inf

In the OP's example, group 3 only has a single element, and the value of that element is NA. This is the problem, not datetimes.

df = tibble(
  group = c(1,1,2,2,3,3),
  ds = c("2001-02-01", "2001-01-02", "2001-01-03", "2001-01-04", NA, "2001-01-05")
)
df %>% 
  group_by(group) %>% 
  summarize(max_date = max(ds, na.rm=T))
#> # A tibble: 3 x 2
#>   group max_date  
#>   <dbl> <chr>     
#> 1     1 2001-02-01
#> 2     2 2001-01-04
#> 3     3 2001-01-05
df %>% 
  head(-1) %>% 
  group_by(group) %>% 
  summarize(max_date = max(ds, na.rm=T))
#> Warning in max(ds, na.rm = T): no non-missing arguments, returning NA
#> # A tibble: 3 x 2
#>   group max_date  
#>   <dbl> <chr>     
#> 1     1 2001-02-01
#> 2     2 2001-01-04
#> 3     3 <NA>

However, if the max function returned NA when the vector is empty, we would see:

library(tidyverse)
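# Like max(), but returns NA when x is empty or contains only NAs
my_max = function(x, na.rm=F)
{
  # || short-circuits and is the idiomatic operator for scalar conditions
  if (length(x) == 0 || all(is.na(x)))
  {
    return(NA)
  } else
  {
    return(max(x, na.rm=na.rm))
  }
}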
c(1) %>% my_max()
#> [1] 1
c() %>% my_max()
#> [1] NA
c(1,NA) %>% my_max()
#> [1] NA
c(1,NA) %>% my_max(na.rm=T)
#> [1] 1
c(NA) %>% my_max()
#> [1] NA
c(NA) %>% my_max(na.rm=T)
#> [1] NA

df = tibble(
  group = c(1,1,2,2,3),
  value = c(rnorm(4), NA)
)
df %>% 
  group_by(group) %>%
  summarize(Max = my_max(value))
#> # A tibble: 3 x 2
#>   group   Max
#>   <dbl> <dbl>
#> 1     1  1.44
#> 2     2  1.81
#> 3     3 NA
df %>% 
  group_by(group) %>%
  summarize(Max = my_max(value)) %>% 
  filter(!is.na(Max))
#> # A tibble: 2 x 2
#>   group   Max
#>   <dbl> <dbl>
#> 1     1  1.44
#> 2     2  1.81
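One caveat with my_max is that it returns a plain logical NA, which loses the datetime class. For datetime columns, a typed variant might be preferable. The following is a minimal sketch under that assumption; max_datetime is a hypothetical name, not a function from any package, and it always drops NAs:

```r
# Hypothetical helper: max of a POSIXct vector that yields a genuine NA
# datetime (instead of -Inf) when no non-missing values remain
max_datetime <- function(x) {
  tz <- attr(x, "tzone")
  if (is.null(tz)) tz <- ""   # fall back to the local time zone
  x <- x[!is.na(x)]
  if (length(x) == 0) {
    # a missing value that still carries the POSIXct class
    return(as.POSIXct(NA_real_, origin = "1970-01-01", tz = tz))
  }
  max(x)
}

d <- as.POSIXct(c("2018-01-01", NA, "2018-01-05"), tz = "UTC")
max_datetime(d)                        # the latest non-missing datetime
is.na(max_datetime(as.POSIXct(NA)))    # TRUE, so an is.na() check works
```

Because the empty case returns a real NA of class POSIXct, a test such as is.na(max_ds) in the question's loop behaves as originally intended, and no warning is raised.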

More

Additionally, for datetime types, what max claims to return (according to its warning), what is displayed, and what is.na reports are three different things.

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
library(tidyverse)

### Vector with single NA value
### With and without lubridate::as_datetime

c(NA) %>%
  max()
#> [1] NA

c(NA) %>%
  as_datetime() %>%
  max()
#> [1] NA
### Returns NA which makes sense

c(NA) %>%
  max() %>%
  is.na()
#> [1] TRUE
### Returns expected result

c(NA) %>%
  as_datetime() %>%
  max() %>%
  is.na()
#> [1] TRUE
### The NA displayed is really NA

c(NA) %>%
  max(na.rm=T)
#> Warning in max(., na.rm = T): no non-missing arguments to max; returning -Inf
#> [1] -Inf
### Claims to return -Inf and also displays -Inf

c(NA) %>%
  as_datetime() %>%
  max(na.rm=T)
#> Warning in max.default(structure(NA_real_, class = c("POSIXct", "POSIXt": no
#> non-missing arguments to max; returning -Inf
#> [1] NA
### Claims to return -Inf BUT displays NA!!!
### ^^^^^ this is the offender ^^^^^ ###

c(NA) %>%
  max(na.rm=T) %>%
  is.na()
#> Warning in max(., na.rm = T): no non-missing arguments to max; returning -Inf
#> [1] FALSE
### Returns expected result for -Inf

c(NA) %>%
  as_datetime() %>%
  max(na.rm=T) %>%
  is.na()
#> Warning in max.default(structure(NA_real_, class = c("POSIXct", "POSIXt": no
#> non-missing arguments to max; returning -Inf
#> [1] FALSE
### Returns expected result for -Inf

Created on 2022-12-06 with reprex v2.0.2

Somersomers answered 6/12, 2022 at 8:4 Comment(0)
