How R formats POSIXct with fractional seconds
Asked Answered
E

2

69

I believe that R incorrectly formats POSIXct types with fractional seconds. I submitted this via R-bugs as an enhancement request and got brushed off with "we think the current behavior is correct -- bug deleted." While I am very appreciative of the work they have done and continue to do, I wanted to get other peoples' take on this particular issue, and perhaps advice on how to make the point more effectively.

Here is an example:

 > tt <- as.POSIXct('2011-10-11 07:49:36.3')
 > strftime(tt,'%Y-%m-%d %H:%M:%OS1')
 [1] "2011-10-11 07:49:36.2"

That is, tt is created as a POSIXct time with fractional part .3 seconds. When it is printed with one decimal digit, the value shown is .2. I work a lot with timestamps of millisecond precision and it causes me a lot of headaches that times are often printed one notch lower than the actual value.

Here is what is happening: POSIXct is a floating-point number of seconds since the epoch. All integer values are handled precisely, but in base-2 floating point, the closest value to .3 is very slightly smaller than .3. The stated behavior of strftime() for format %OSn is to round down to the requested number of decimal digits, so the displayed result is .2. For other fractional parts the floating point value is slightly above the value entered and the display gives the expected result:

 > tt <- as.POSIXct('2011-10-11 07:49:36.4')
 > strftime(tt,'%Y-%m-%d %H:%M:%OS1')
 [1] "2011-10-11 07:49:36.4"

The developers' argument is that for time types we should always round down to the requested precision. For example, if the time is 11:59:59.8 then printing it with format %H:%M should give "11:59" not "12:00", and %H:%M:%S should give "11:59:59" not "12:00:00". I agree with this for integer numbers of seconds and for format flag %S, but I think the behavior should be different for format flags that are designed for fractional parts of seconds. I would like to see %OSn use round-to-nearest behavior even for n = 0 while %S uses round-down, so that printing 11:59:59.8 with format %H:%M:%OS0 would give "12:00:00". This would not affect anything for integer numbers of seconds because those are always represented precisely, but it would more naturally handle round-off errors for fractional seconds.

This is how printing of fractional parts is handled in, for example C, because integer casting rounds down:

 double x = 9.97;
 printf("%d\n",(int) x);   //  9
 printf("%.0f\n",x);       //  10
 printf("%.1f\n",x);       //  10.0
 printf("%.2f\n",x);       //  9.97

I did a quick survey of how fractional seconds are handled in other languages and environments, and there really doens't seem to be a consensus. Most constructs are designed for integer numbers of seconds and the fractional parts are an afterthought. It seems to me that in this case the R developers made a choice that is not completely unreasonable but is in fact not the best one, and is not consistent with the conventions elsewhere for displaying floating-point numbers.

What are peoples' thoughts? Is the R behavior correct? Is it the way you yourself would design it?

Ernestineernesto answered 11/10, 2011 at 12:29 Comment(5)
Please re-phrase your questions. The first and last are subjective and make me want to vote to close, but the rest of the question is wonderful. Something like, "how do I get my desired behavior?" would remove my conflicting emotions. ;-)Chaeta
Well, one of the rules I learned early on in Physics is "always carry at least one more sigfig than you think you'll need." So: Rgames: tt <- as.POSIXct('2011-10-11 07:49:36.32842') leads to Rgames: strftime(tt,'%Y-%m-%d %H:%M:%OS4') [1] "2011-10-11 07:49:36.3284" Rgames: strftime(tt,'%Y-%m-%d %H:%M:%OS1') [1] "2011-10-11 07:49:36.3" I strongly recommend you simply carry a couple extra places and subsequently round off whatever way works best for you.Caphaitien
Thanks for the comments. I do in fact know how to get the desired output: add a small offset to each "true" number. Thus if my number is actually .836 then I could store a modified version as .8360001 in order to coerce "%OS3" to produce ".836" rather than ".835". But that is a crude workaround and should not be necessary -- it would not be necessary for ordinary floating-point numbers. Even in R, sprintf('%.0f',9.97) produces "10" not "9".Ernestineernesto
Robert, that is not a crude workaround in my book but a proper reflection of the underlying binary representation. Google for the famous paper "What every computer scientist should know about floating point".Windpollinated
Excuse me, I have taught numerical analysis and scientific computing for many years, and I am very familiar with Goldberg's article and its contents. Did you read my original post? I am arguing that string conversion of fractional seconds in R ought to work consistently with string conversion of floating point numbers. R is written by hard-working human beings like you and me, who make design decisions as best as they can, and I am arguing that in this case they have overlooked an important aspect.Ernestineernesto
A
37

One underlying problem is that the POSIXct representation is less precise than the POSIXlt representation, and the POSIXct representation gets converted to the POSIXlt representation before formatting. Below we see that if our string is converted directly to POSIXlt representation, it outputs correctly.

> as.POSIXct('2011-10-11 07:49:36.3')
[1] "2011-10-11 07:49:36.2 CDT"
> as.POSIXlt('2011-10-11 07:49:36.3')
[1] "2011-10-11 07:49:36.3"

We can also see that by looking at the difference between the binary representation of the two formats and the usual representation of 0.3.

> t1 <- as.POSIXct('2011-10-11 07:49:36.3')
> as.numeric(t1 - round(unclass(t1))) - 0.3
[1] -4.768372e-08

> t2 <- as.POSIXlt('2011-10-11 07:49:36.3')
> as.numeric(t2$sec - round(unclass(t2$sec))) - 0.3
[1] -2.831069e-15

Interestingly, it looks like both representations are actually less than the usual representation of 0.3, but that the second one is either close enough, or truncates in a way different than I'm imagining here. Given that, I'm not going to worry about floating point representation difficulties; they may still happen, but if we're careful about which representation we use, they will hopefully be minimized.

Robert's desire for rounded output is then simply an output problem, and could be addressed in any number of ways. My suggestion would be something like this:

myformat.POSIXct <- function(x, digits=0) {
  x2 <- round(unclass(x), digits)
  attributes(x2) <- attributes(x)
  x <- as.POSIXlt(x2)
  x$sec <- round(x$sec, digits)
  format.POSIXlt(x, paste("%Y-%m-%d %H:%M:%OS",digits,sep=""))
}

This starts with a POSIXct input, and first rounds to the desired digits; it then converts to POSIXlt and rounds again. The first rounding makes sure that all of the units increase appropriately when we are on a minute/hour/day boundary; the second rounding rounds after converting to the more precise representation.

> options(digits.secs=1)
> t1 <- as.POSIXct('2011-10-11 07:49:36.3')
> format(t1)
[1] "2011-10-11 07:49:36.2"
> myformat.POSIXct(t1,1)
[1] "2011-10-11 07:49:36.3"

> t2 <- as.POSIXct('2011-10-11 23:59:59.999')
> format(t2)
[1] "2011-10-11 23:59:59.9"
> myformat.POSIXct(t2,0)
[1] "2011-10-12 00:00:00"
> myformat.POSIXct(t2,1)
[1] "2011-10-12 00:00:00.0"

A final aside: Did you know the standard allows for up to two leap seconds?

> as.POSIXlt('2011-10-11 23:59:60.9')
[1] "2011-10-11 23:59:60.9"

OK, one more thing. The behavior actually changed in May due to a bug filed by the OP (Bug 14579); before that it did round fractional seconds. Unfortunately that meant that sometimes it could round up to a second that wasn't possible; in the bug report, it went up to 60 when it should have rolled over to the next minute. One reason the decision was made to truncate instead of round is that it's printing from the POSIXlt representation, where each unit is stored separately. Thus rolling over to the next minute/hour/etc is more difficult than just a straightforward rounding operation. To round easily, it's necessary to round in POSIXct representation and then convert back, as I suggest.

Abolition answered 11/10, 2011 at 18:33 Comment(6)
Thanks very much for the thoughtful answer. I will poke around a bit and try to see why POSIXlt acts differently. Note that if the POSIXct value is converted to POSIXlt, rather than creating POSIXlt from the string, then the same problem appears.Ernestineernesto
True. I suspect either POSIXct is stored somewhere as a regular float (not a double) and the elements of POSIXlt are stored as doubles, or that there's just more room in the floating point number to express the decimal portion when the integer portion is small (in POSIXlt, it's <60) rather than large (in POSIXct, it's a big integer.Abolition
I think a better way to do conversion of POSIXct to string with "%OSn" is to do the fractional part of the seconds using "%.*f". Increment the integer number of seconds if that formatting gives "1". Then format the integer number of seconds using strftime(). I'll send you some sample code separately and maybe you can give me advice on how to get it into the right hands.Ernestineernesto
I believe that's what it used to do before the behavior changed in May, though technically it was printing the entire seconds value, both integer and fractional part, using %*.*f. The problem is that then the seconds can be rounded up to 60, when it should round up to zero and increment the minutes.Abolition
I think I found a counter example, as displayed on this post #15383557Bron
Yeah, so I guess you do need a little fudge factor, as rounding can end up a little less than the desired number.Abolition
C
20

I've run into this problem, and so started looking for a solution. @Aaron's answer is good, but still breaks for large dates.

Here is code that rounds the seconds properly, according to format or option("digits.secs"):

form <- function(x, format = "", tz= "", ...) {
  # From format.POSIXct
  if (!inherits(x, "POSIXct")) 
    stop("wrong class")
  if (missing(tz) && !is.null(tzone <- attr(x, "tzone"))) 
    tz <- tzone

  # Find the number of digits required based on the format string
  if (length(format) > 1)
    stop("length(format) > 1 not supported")

  m <- gregexpr("%OS[[:digit:]]?", format)[[1]]
  l <- attr(m, "match.length")
  if (l == 4) {
    d <- as.integer(substring(format, l+m-1, l+m-1))
  } else {
    d <- unlist(options("digits.secs"))
    if (is.null(d)) {
      d <- 0
    }
  }  


  secs.since.origin <- unclass(x)            # Seconds since origin
  secs <- round(secs.since.origin %% 60, d)  # Seconds within the minute
  mins <- floor(secs.since.origin / 60)      # Minutes since origin
  # Fix up overflow on seconds
  if (secs >= 60) {
    secs <- secs - 60
    mins <- mins + 1
  }

  # Represents the prior minute
  lt <- as.POSIXlt(60 * mins, tz=tz, origin=ISOdatetime(1970,1,1,0,0,0,tz="GMT"));
  lt$sec <- secs + 10^(-d-1)  # Add in the seconds, plus a fudge factor.
  format.POSIXlt(as.POSIXlt(lt), format, ...)
}

The fudge factor of 10^(-d-1) is from here: Accurately converting from character->POSIXct->character with sub millisecond datetimes by Aaron.

Some examples:

f  <- "%Y-%m-%d %H:%M:%OS"
f3 <- "%Y-%m-%d %H:%M:%OS3"
f6 <- "%Y-%m-%d %H:%M:%OS6"

From a nearly identical question:

x <- as.POSIXct("2012-12-14 15:42:04.577895")

> format(x, f6)
[1] "2012-12-14 15:42:04.577894"
> form(x, f6)
[1] "2012-12-14 15:42:04.577895"
> myformat.POSIXct(x, 6)
[1] "2012-12-14 15:42:04.577895"

From above:

> format(t1)
[1] "2011-10-11 07:49:36.2"
> myformat.POSIXct(t1,1)
[1] "2011-10-11 07:49:36.3"
> form(t1)
[1] "2011-10-11 07:49:36.3"

> format(t2)
[1] "2011-10-11 23:59:59.9"
> myformat.POSIXct(t2,0)
[1] "2011-10-12 00:00:00"
> myformat.POSIXct(t2,1)
[1] "2011-10-12 00:00:00.0"

> form(t2)
[1] "2011-10-12"
> form(t2, f)
[1] "2011-10-12 00:00:00.0"

The real fun comes in 2038 for some dates. I assume this is because we lose one more bit of precision in the mantissa. Note the value of the seconds field.

> t3 <- as.POSIXct('2038-12-14 15:42:04.577895')
> format(t3)
[1] "2038-12-14 15:42:05.5"
> myformat.POSIXct(t3, 1)
[1] "2038-12-14 15:42:05.6"
> form(t3)
[1] "2038-12-14 15:42:04.6"

This code seems to work for other edge cases that I've tried. The common thing between format.POSIXct and myformat.POSIXct in Aaron's answer is the conversion to from POSIXct to POSIXlt with the seconds field intact.

This points to a bug in that conversion. I'm not using any data that isn't available to as.POSIXlt().

Update

The bug is in src/main/datetime.c:434 in the static function localtime0, but I am not sure yet of the correct fix:

Lines 433-434:

day = (int) floor(d/86400.0);
left = (int) (d - day * 86400.0 + 0.5);

The extra 0.5 for rounding the value is the culprit. Note that the subsecond value of t3 above exceeds .5. localtime0 deals with seconds only, and the subseconds are added in after localtime0 returns.

localtime0 returns correct results if the double presented is an integer value.

Compurgation answered 6/2, 2013 at 3:16 Comment(14)
+1 Wow, nice find! How did you ever think to check what would happen at larger dates?Abolition
@Aaron I tried after 2038-01-19 because it uses another bit in the representation, and can't be represented as a signed integer on 32-bit platforms. I did not expect incorrect results.Compurgation
It is a shame that operations on POSIXct types have to convert through POSIXlt, with the consequent subtle bugs. POSIXct really is the right way to handle datetime types; I just wish that the formatting was more robust.Ernestineernesto
I've entered a bug report on this behavior: bugs.r-project.org/bugzilla3/show_bug.cgi?id=15200Compurgation
I might have misunderstood your solution but I think I have found a counter example #15383557Bron
I see also that the bug report was closed because they thought it was a leap second issue, but that's not true, is it?Abolition
@Bron interesting. I'll see what that looks like under the debugger when I get a chance (later today).Compurgation
@Aaron no, this is definitely not a leap second issue. It is a bug in the code. I have not examined 2.15.3 to see if it remains.Compurgation
@MatthewLundberg hey, what is a bug, do you mean in your function or in the R base code ?Bron
@Bron In the R base code. Specifically line 434 of src/main/datetime.c.Compurgation
@MatthewLundberg, looks like it was patched 3 days ago. github.com/wch/r-source/commit/…Tag
@Tag I suspect that this is going to give wrong results if you want microsecond resolution. I'll manually apply the patch to the source here and see if it can be demonstrated.Compurgation
@MatthewLundberg, ah, just saw your last comment on the bug report too.Tag
This pb is still open is it ? x <- "04-Jan-2013 17:22:08.139" options("digits.secs"=6) as.POSIXct(x,format="%d-%b-%Y %H:%M:%OS") [1] "2013-01-04 17:22:08.138 GMT"Bron

© 2022 - 2024 — McMap. All rights reserved.