Accurately converting from character->POSIXct->character with sub millisecond datetimes
I have a character datetime column in a file. I load the file (into a data.table) and do things that require the column to be converted to POSIXct. I then need to write the POSIXct value back to file, but the datetime will not be the same (because it is printed incorrectly).

This print/formatting issue is well known and has been discussed several times. I've read some posts describing this issue. The most authoritative answers I found are given in response to this question. The answers to that question provide two functions (myformat.POSIXct and form) that are supposed to solve this issue, but they do not seem to work on this example:

x <- "04-Jan-2013 17:22:08.139"
options("digits.secs"=6)
form(as.POSIXct(x,format="%d-%b-%Y %H:%M:%OS"),format="%d-%b-%Y %H:%M:%OS3")
[1] "04-Jan-2013 17:22:08.138"
form(as.POSIXct(x,format="%d-%b-%Y %H:%M:%OS"),format="%d-%b-%Y %H:%M:%OS4")
[1] "04-Jan-2013 17:22:08.1390"
myformat.POSIXct(as.POSIXct(x,format="%d-%b-%Y %H:%M:%OS"),digits=3)
[1] "2013-01-04 17:22:08.138"
myformat.POSIXct(as.POSIXct(x,format="%d-%b-%Y %H:%M:%OS"),digits=4)
[1] "2013-01-04 17:22:08.1390"

My sessionInfo:

R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                        
[5] LC_TIME=C                              

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] fasttime_1.0-0   data.table_1.8.9 bit64_0.9-2      bit_1.1-9
[5] sas7bdat_0.3     chron_2.3-43     vimcom_0.9-6    

loaded via a namespace (and not attached):
[1] tools_2.15.2
Turpentine answered 13/3, 2013 at 10:43 Comment(1)
For this date, both functions, form() and myformat.POSIXct, do essentially the same thing: they round the seconds value to three places. But 0.139 cannot be represented exactly (.1389999 is what I see in the debugger for the fractional part of the rounded value), so the truncation remains. Note that 139 is prime (and thus relatively prime to 2 and 5). - Helping

Two things:

1) @statquant is right, as is @Aaron in his comment (and the otherwise known experts @Joshua Ulrich and @Dirk Eddelbuettel are wrong), but that is not important for the main question here:

POSIXlt by design is definitely more accurate in storing times than POSIXct: as its seconds are always in [0, 60), it has a granularity of about 6e-15, i.e., 6 femtoseconds, which is tens of millions of times finer than that of POSIXct.

However, this is not very relevant here (and for current R): almost all operations, notably numeric ones, use the Ops group method (not known to beginners, but well documented). Just look at Ops.POSIXt, which indeed trashes the extra precision by first coercing to POSIXct. In addition, format()ing and print()ing use at most 6 decimals after the ".", and hence also do not distinguish between the internally higher precision of POSIXlt and the "only" 100 nanosecond granularity of POSIXct.
(For the above reason, both Dirk and Joshua were led to their wrong assertion: for all simple practical uses, the precision of *lt and *ct is made the same.)

2) I do tend to agree that we (R Core) should improve the format()ing and hence print()ing of POSIXt objects with fractional seconds (even after the bug fix mentioned by @Aaron above).
But then I may be wrong, and "we" have got it right, by some definition of "right" ;-)
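
The granularity figures in point 1 can be checked with a little arithmetic. This is a minimal sketch, assuming IEEE 754 doubles; `ulp()` is a hypothetical helper written for illustration, not a base R function, and the exact values it gives differ slightly from the rounded figures quoted above.

```r
# Spacing between adjacent doubles at magnitude x, assuming IEEE 754
# double precision (52 stored fraction bits). ulp() is a hypothetical
# helper for illustration, not part of base R.
ulp <- function(x) 2^(floor(log2(abs(x))) - 52)

ct_secs <- 1357341728.139  # seconds since 1970, the scale POSIXct works at
lt_secs <- 59.999          # POSIXlt keeps seconds in [0, 60) on their own

ulp_ct <- ulp(ct_secs)     # 2^-22, a few hundred nanoseconds
ulp_lt <- ulp(lt_secs)     # 2^-47, a few femtoseconds
ratio  <- ulp_ct / ulp_lt  # 2^25, i.e. tens of millions

print(c(POSIXct = ulp_ct, POSIXlt = ulp_lt, ratio = ratio))
```

The ratio depends only on the two magnitudes, so it stays the same for any present-day date.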

Unwashed answered 11/8, 2018 at 15:53 Comment(0)

So I guess you do need a little fudge factor added to my suggestion here: https://mcmap.net/q/280021/-how-r-formats-posixct-with-fractional-seconds. This seems to work but perhaps might include other bugs; test carefully and think about what it's doing before using for anything important.

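myformat.POSIXct <- function(x, digits=0) {
  # round the underlying seconds-since-epoch value, keeping class and tzone
  x2 <- round(unclass(x), digits)
  attributes(x2) <- attributes(x)
  x <- as.POSIXlt(x2)
  # round the seconds again, then add the fudge factor (a tenth of the last
  # printed digit) so that truncation in format() lands on the right digit
  x$sec <- round(x$sec, digits) + 10^(-digits-1)
  format.POSIXlt(x, paste("%Y-%m-%d %H:%M:%OS",digits,sep=""))
}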
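
A loop-style test of this function for `digits=3` could be sketched as follows. It assumes a UTC time zone (to keep DST out of the picture) and the date from the question; `stamps`, `parsed` and `back` are illustrative names. The function is repeated so that the block runs on its own.

```r
# Copied from the answer above so that this block is self-contained.
myformat.POSIXct <- function(x, digits=0) {
  x2 <- round(unclass(x), digits)
  attributes(x2) <- attributes(x)
  x <- as.POSIXlt(x2)
  x$sec <- round(x$sec, digits) + 10^(-digits-1)
  format.POSIXlt(x, paste("%Y-%m-%d %H:%M:%OS",digits,sep=""))
}

# Every millisecond within one second of the question's timestamp.
stamps <- sprintf("2013-01-04 17:22:08.%03d", 0:999)

# character -> POSIXct (UTC is an assumption made for this sketch)
parsed <- as.POSIXct(stamps, format="%Y-%m-%d %H:%M:%OS", tz="UTC")

# POSIXct -> character, then compare with the input strings
back <- myformat.POSIXct(parsed, digits=3)
mismatches <- stamps[back != stamps]
print(length(mismatches))  # number of strings that did not round-trip
```

This only covers one second of one day; a fuller test would sample other dates, time zones and values of `digits`.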
Octamerous answered 13/3, 2013 at 16:44 Comment(5)
Your fudge factor looks like a good one here. It would be possible to test this in a loop, at least for small values of digits. Oh, and I'm totally stealing your fudge factor. I added it to my answer in the other, identical question, and will use it in actual code. - Helping
Glad you think it looks good. It seemed like a reasonable thing to do but I didn't take the time to think it through all the way. - Octamerous
Good news: it looks like it works on my 1.5M training set (with milliseconds). It seems to be very slow, but if the fix is good, maybe it can be used to fix the way POSIXct displays (I mean prints) datetimes at the C level... - Turpentine
I actually doubt all the code here is necessary with the fudge factor added. I was rounding twice because I thought that would make the fudge factor unneeded, but you discovered I was wrong. It might be enough to just round and add the fudge factor to the POSIXct initially and then print. - Octamerous
Also stay tuned for the next version of R; in the comments to the other question you'll see that it looks like they may have added a fudge factor in the default printing code itself. - Octamerous
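
The simpler variant floated in the comments (round the POSIXct, add the fudge factor, then print) might look like the sketch below. `format_fudged` is a hypothetical name and this is not code from the answer. It assumes small values of `digits`: for present-day dates adjacent POSIXct values are roughly 2.4e-07 s apart, so a fudge of `10^(-digits-1)` is no longer reliable as `digits` approaches 6.

```r
# Hypothetical simplification, not code from the answer: round the
# underlying seconds, nudge them up by a tenth of the last printed
# digit, and let format() do the rest.
format_fudged <- function(x, digits=3) {
  secs <- round(unclass(x), digits) + 10^(-digits-1)
  attributes(secs) <- attributes(x)   # restore class and time zone
  format(secs, paste0("%Y-%m-%d %H:%M:%OS", digits))
}

# The value from the question, written in ISO form and parsed as UTC
# (both are assumptions made to keep the sketch locale-independent).
x <- as.POSIXct("2013-01-04 17:22:08.139",
                format="%Y-%m-%d %H:%M:%OS", tz="UTC")
print(format_fudged(x, digits=3))
```

Because it skips the POSIXlt round trip on the R side, this may also be somewhat faster on large vectors, though that has not been measured here.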

As the answers to the questions you linked to already say, how a value is printed/formatted is not the same as what the actual value is. This is just a printed representation issue.

R> as.POSIXct('2011-10-11 07:49:36.3')-as.POSIXlt('2011-10-11 07:49:36.3')
Time difference of 0 secs
R> as.POSIXct('2011-10-11 07:49:36.2')-as.POSIXlt('2011-10-11 07:49:36.3')
Time difference of -0.0999999 secs

Your understanding that POSIXct is less precise than POSIXlt is incorrect. You're also incorrect in saying that you can't include a POSIXlt object as a column in a data.frame.

R> x <- data.frame(date=Sys.time())
R> x$date <- as.POSIXlt(x$date)
R> str(x)
'data.frame':   1 obs. of  1 variable:
 $ date: POSIXlt, format: "2013-03-13 07:38:48"
Monitor answered 13/3, 2013 at 12:41 Comment(5)
@statquant: because it's another question, not an answer. - Monitor
OK for the representation; for inclusion in data.frame I meant data.table. The post I am referring to gives suggestions on how to solve this representation issue, but with 04-01-2013 17:22:08.139 it seems to fail (see my EDIT). Is there a way to get an accurate representation from POSIXct (at the millisecond level)? - Turpentine
@statquant: It is accurate. You're still confusing the actual POSIXct value with what is printed. - Monitor
No, I am not; I am actually asking how I can accurately print the time of a POSIXct object. Let's say I have a character datetime column in a file. I load the file and do things that require the column to be cast to POSIXct. If I need to write the file back, the datetime will not be the same (it is printed wrongly). - Turpentine
@statquant: I see. That's a clearly articulated problem. Can you edit your question to remove all the extraneous prose, quotes from other posts, and your guesses at solutions? Leave an example of your input and desired output and I'm sure someone will provide an answer. - Monitor

When you write

My understanding is that POSIXct representation is less precise than the POSIXlt representation

you are plain wrong.

It is the same representation for both -- down to milliseconds on Windows, and down to (almost) microseconds on the other OSs. Did you read help(DateTimeClasses) ?

As for your last question, yes the development version of my RcppBDT package uses Boost Date.Time and can go all the way to nanoseconds if your OS supports it and you turned the proper representation on. But it does replace POSIXct, and does not yet support vectors of time objects.

Edit: Regarding your follow-up question:

R> one <- Sys.time(); two <- Sys.time(); two - one
Time difference of 7.43866e-05 secs
R>
R> as.POSIXlt(two) - as.POSIXlt(one)
Time difference of 7.43866e-05 secs
R> 
R> one    # options("digits.secs"=6) on my box
[1] "2013-03-13 07:30:57.757937 CDT"
R> 

Edit 2: I think you are simply experiencing that floating point representation on computers is inexact:

R> print(as.numeric(as.POSIXct("04-Jan-2013 17:22:08.138",
+                   format="%d-%b-%Y %H:%M:%OS")), digits=18)
[1] 1357341728.13800001
R> print(as.numeric(as.POSIXct("04-Jan-2013 17:22:08.139",
+                   format="%d-%b-%Y %H:%M:%OS")), digits=18)
[1] 1357341728.13899994
R> 

The difference is not precisely 1/1000 as you assumed.
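
The two printed values follow from the spacing of doubles at this magnitude. Here is a sketch of the arithmetic, assuming IEEE 754 doubles, where values between 2^30 and 2^31 are multiples of 2^-22; `stored_fraction()` is a hypothetical helper for illustration.

```r
# Doubles in [2^30, 2^31) are spaced 2^-22 apart (IEEE 754 assumption),
# so the fractional second that gets stored is a multiple of 2^-22.
step <- 2^-22

# Hypothetical helper: nearest multiple of 2^-22 to the wanted fraction.
stored_fraction <- function(frac) round(frac / step) * step

f139 <- stored_fraction(0.139)
f138 <- stored_fraction(0.138)

# Compare with the fractional parts of the epoch values printed above.
x139 <- 1357341728.139
x138 <- 1357341728.138
print(sprintf("%.11f", c(f139, x139 - floor(x139))))
print(sprintf("%.11f", c(f138, x138 - floor(x138))))

# Whole milliseconds left after truncation: the stored .139 sits just
# below 0.139, the stored .138 just above 0.138.
print(c(floor(f139 * 1000), floor(f138 * 1000)))
```

So a formatter that truncates to three decimals prints .138 for both inputs, which is what the question observed for .139.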

Bushnell answered 13/3, 2013 at 11:45 Comment(6)
Hello Dirk, are you sure as far as the representation is concerned? I edited the question, quoting another post, to illustrate what I meant. I read help(DateTimeClasses); it is the same as ?POSIXlt, and I do not see anything Windows-specific. As you seem to have been deep into these POSIXct issues already, how can I get a correct representation of a millisecond datetime with POSIXct? - Turpentine
Dirk, any reference for your statement about Windows vs other OSs? - Turpentine
Please re-read my original answer. Windows --> milliseconds only. - Bushnell
Hi Dirk, I think as far as floating-point representation goes, POSIXct is indeed less precise; it has to fit a lot more significant digits into the same size numeric, as it holds the number of seconds since 1970 plus any fractional part. Since POSIXlt separates the seconds into its own numeric, there are fewer significant digits, so the floating-point representation can be more precise. @Turpentine is referring to my answer here https://mcmap.net/q/280021/-how-r-formats-posixct-with-fractional-seconds which gives an example. - Octamerous
@statquant: I believe this to be wrong. POSIXct is a 64-bit double split into 53 and 11 bits. Show a source file or the R Internals / R Language manuals for the 40 bit claim. - Bushnell
From ?POSIXlt: Class ‘"POSIXlt"’ is a named list of vectors representing [...] sec as numeric, the rest as integer, so 40 bytes (I realized I miswrote 40 bits instead of 40 bytes). - Turpentine