ifelse() stripping POSIXct attribute from vector of timestamps?
This is weird: R's ifelse() seems to do some (unwanted) casting. Let's say I have a vector of timestamps (possibly containing NA), and NA values should be treated differently from existing dates, for example just ignored:

formatString = "%Y-%m-%d %H:%M:%OS"
timestamp = c(as.POSIXct(strptime("2000-01-01 12:00:00.000000", formatString)) + (1:3)*30, NA)

Now

timestamp
#[1] "2000-01-01 12:00:30 CET" "2000-01-01 12:01:00 CET" "2000-01-01 12:01:30 CET"
#[4] NA

is as desired, but shifting by 30 seconds through ifelse() results in

ifelse(is.na(timestamp), NA, timestamp+30)
#[1] 946724460 946724490 946724520        NA

Notice that timestamp+30 on its own still works as expected. But let's say I want to replace NA dates with a fixed date and shift all the others by 30 seconds:

fixedDate = as.POSIXct(strptime("2000-01-01 12:00:00.000000", formatString))
ifelse(is.na(timestamp), fixedDate, timestamp+30)
#[1] 946724460 946724490 946724520 946724400

Question: what's wrong with this solution, and why doesn't it work as expected?

Edit: the desired output is a vector of timestamps (not of integers) shifted by 30 seconds, with the NAs replaced by whatever...

Levania answered 30/6, 2015 at 8:35 Comment(6)
What doesn't work as expected? I get NA replaced by fixedDate. Or I don't understand the problem. – Crouse
I second @Pascal: as.numeric(fixedDate) == ifelse(is.na(timestamp), fixedDate, timestamp+30)[4] returns TRUE, so I'm not sure what the issue really is. – Battaglia
I suspect the question to ask is: what is your expected output? This may help us understand where you're stuck. – Mascarenas
You should study the Value and Warning section of ?ifelse and the third set of examples ("ifelse() strips attributes"; "This is important when working with Dates"). A relevant post (first hit when googling "r ifelse as.POSIXct"). – Unitarian
I edited the title to make the question clearer than "Is ifelse broken?". You needed to say it was stripping the POSIXct attribute. – Myrtie
I recommend checking out this solution via dplyr: #6669463 – Junia
A
8

If you look at the way ifelse is written, it has a section of code that looks like this:

ans <- test
ok <- !(nas <- is.na(test))
if (any(test[ok]))
  ans[test & ok] <- rep(yes, length.out = length(ans))[test & ok]

Note that the answer starts off as a logical vector, the same as test. The elements that have test == TRUE then get assigned to the value of yes.

The issue, then, is what happens when one or more elements of a logical vector are assigned a value of class POSIXct: the vector is coerced to numeric and the class attribute is dropped. You can see this if you do:

x <- c(TRUE, FALSE)
class(x)
# [1] "logical"
x[1] <- Sys.time()   # assigning a POSIXct into a logical vector
class(x)
# [1] "numeric"

You could get around this by writing:

timestamp <- timestamp + 30
timestamp[is.na(timestamp)] <- fixedDate

You could also do this:

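fixedDate = as.POSIXct(strptime("2000-01-01 12:00:00.000000", formatString))
do.call(c, ifelse(is.na(timestamp), as.list(fixedDate), as.list(timestamp+30)))

This takes advantage of the way the replacement operator [<- handles a list on the right-hand side: the result becomes a list whose elements keep their POSIXct class. The elements are then combined with c() rather than unlist(), because unlist() would drop the class again.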

You can also just re-add the class attribute like this:

x <- ifelse(is.na(timestamp), fixedDate, timestamp+30)
class(x) <- c("POSIXct", "POSIXt")

or if you were desperate to do it in one line like this:

`class<-`(ifelse(is.na(timestamp), fixedDate, timestamp+30), c("POSIXct", "POSIXt"))

or by copying the attributes of fixedDate:

x <- ifelse(is.na(timestamp), fixedDate, timestamp+30)
attributes(x) <- attributes(fixedDate)

This last version has the advantage of copying the tzone attribute as well.
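
If you use this pattern a lot, the "re-add the attributes" idea can be wrapped in a small helper. This is only a minimal sketch: the name ifelse_keep_class is made up for illustration (it is not part of base R), and it assumes that yes carries the class and tzone you want on the result.

```r
# Hypothetical helper (not in base R): run ifelse(), then restore the class
# and tzone attributes that ifelse() dropped, taking them from `yes`.
ifelse_keep_class <- function(test, yes, no) {
  out <- ifelse(test, yes, no)
  attr(out, "tzone") <- attr(yes, "tzone")
  class(out) <- class(yes)
  out
}

# Same data as in the question
formatString <- "%Y-%m-%d %H:%M:%OS"
fixedDate <- as.POSIXct(strptime("2000-01-01 12:00:00.000000", formatString))
timestamp <- c(fixedDate + (1:3) * 30, NA)

shifted <- ifelse_keep_class(is.na(timestamp), fixedDate, timestamp + 30)
class(shifted)
# [1] "POSIXct" "POSIXt"
```

Only class and tzone are copied here, rather than all attributes, so that things like names on yes are not forced onto a result of a different length.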

As of dplyr 0.5.0, you can also use dplyr::if_else which preserves class in the output and also enforces the same class for the true and false arguments.
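
For the data in the question, the dplyr::if_else call might look like the sketch below. It assumes dplyr (0.5.0 or later) is installed; the requireNamespace() guard is just there so the snippet does nothing when it is not.

```r
# Assumption: dplyr >= 0.5.0 is available; otherwise the block is skipped.
formatString <- "%Y-%m-%d %H:%M:%OS"
fixedDate <- as.POSIXct(strptime("2000-01-01 12:00:00.000000", formatString))
timestamp <- c(fixedDate + (1:3) * 30, NA)

if (requireNamespace("dplyr", quietly = TRUE)) {
  # Both branches are POSIXct, so the result stays POSIXct
  shifted <- dplyr::if_else(is.na(timestamp), fixedDate, timestamp + 30)
  print(class(shifted))
}
```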

Although answered 30/6, 2015 at 8:56 Comment(12)
What about just reformatting the dates as strings with strptime(ifelse(...),"%s"), as it's coerced to a number of seconds? – Mascarenas
@Mascarenas agree that would be more straightforward for dates in particular. – Although
@Mascarenas even more straightforward would be to just add back the appropriate class. – Although
Do you mean in the ifelse code or on the returned value? (strptime gives a little more control: you can pass the timezone to get values with the correct DST, etc.) I disagree with setting it in the ifelse code: too much overhead for too narrow a use case. – Mascarenas
@NickK: Ah I see... this is interesting... R is then somehow the 'counterpart' of magma [which complains about type conversions all the time]... integer is the 'common' (?) overclass of logical and date... – Levania
@Mascarenas I meant like this: x <- ifelse(is.na(timestamp), fixedDate, timestamp + 30); class(x) <- c("POSIXct", "POSIXt") – Although
@FabianWerner Nope, just that a POSIXct is stored internally as a number (seconds since January 1st, 1970), so when it has to lose its type and all its properties, what is returned is this numeric part. – Mascarenas
@NickK Yes, that's valid unless you have a source date with a specific timezone different from your machine's timezone; that's why I tend to prefer strptime. And it could be done in the same statement, which is a cosmetic preference. – Mascarenas
@Mascarenas Well, POSIXct != integer per se, because when you give R a POSIXct timestamp and print it out then you get, in fact, a timestamp.... so I'm talking about the 'whole object' (including the information that this integer actually means something else). A string is also just a number, everything is just a number and yet, it is something completely different :-) – Levania
@Mascarenas and what you call 'lose the type' I call 'convert it to a common overclass'... so yes, we mean the same. – Levania
@FabianWerner From my point of view, this is not an overclass as there's no inheritance, but yes, we're on the same line with a different point of view :) – Mascarenas
This question was answered here with an effective dplyr/tidy solution: #6669463 – Junia
M
1

As Henrik remarked, ifelse() strips attributes, unlike a simple for-loop.
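
To illustrate the for-loop remark, here is a rough sketch using the data from the question. The key assumption is that the result vector out starts life as a POSIXct vector, so element-wise assignment goes through the POSIXct replacement method and the class is kept:

```r
formatString <- "%Y-%m-%d %H:%M:%OS"
fixedDate <- as.POSIXct(strptime("2000-01-01 12:00:00.000000", formatString))
timestamp <- c(fixedDate + (1:3) * 30, NA)

out <- timestamp   # start from a POSIXct vector, not a logical one
for (i in seq_along(timestamp)) {
  # NA entries get the fixed date, all others are shifted by 30 seconds
  out[i] <- if (is.na(timestamp[i])) fixedDate else timestamp[i] + 30
}
class(out)
# [1] "POSIXct" "POSIXt"
```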

A workaround for filling NAs without grief is the simpler and clearer function zoo::na.fill.

Then you would do: na.fill(timestamp, fixedDate)

See also na.locf, na.approx, na.spline ..., other excellent convenience functions from zoo.

Myrtie answered 30/6, 2015 at 17:47 Comment(4)
As remarked: a usual for-loop over the vector would return a sequence of timestamps, not integers. – Levania
Updated to cover that. – Myrtie
Oh and by the way: I don't see why ifelse does this. I don't think that it is perfectly fine... it's just a pitfall. One could also Fourier-transform the output and get something that 'could' theoretically be reinterpreted as timestamps, but still: one does not do it like this. Why? Because it would be crap: if I'm putting in timestamps, I expect the output to be of the same type... – Levania
I agree it's not OK; it's a little-known pitfall and should be documented more prominently (if not also trigger a "Warning: Attributes dropped by ifelse ...."). I recommend you use zoo::na.fill() like I said. It's faster, simpler and clearer. – Myrtie
W
0

Thread-rezzing for an easier tidyverse (cough) solution: dplyr::case_when retains POSIXct data.

library(dplyr)
library(lubridate)

formatString = "%Y-%m-%d %H:%M:%OS"
timestamp = tibble(a = c(as.POSIXct(strptime("2000-01-01 12:00:00.000000", formatString)) + (1:3)*30, NA))

timestamp %>% mutate(a = case_when(
  is.na(a) ~ NA_POSIXct_, 
  TRUE ~ a + 30)) %>% pull
Wheatear answered 4/7 at 16:38 Comment(0)
