Bizarre as.POSIXct behavior: time part dropped for some timestamps
I was struggling with read.csv: with colClasses containing POSIXct, it was truncating the entire timestamp column to dates, dropping the time part. I came across a similar question suggesting that some of my dates might be missing the time part. That was not the case. However, after bisecting my vector, I noticed that some particular timestamps are to blame. Here is a snippet:

as.POSIXct(c("2016-03-13 01:00:00", "2016-03-13 02:00:00", "2016-03-13 03:00:00"))

For me, that yields:

[1] "2016-03-13 CST" "2016-03-13 CST" "2016-03-13 CST"

It is around the DST transition, but nevertheless, where is the time part? Is it a bug?

> version
               _                           
platform       i386-w64-mingw32            
arch           i386                        
os             mingw32                     
system         i386, mingw32               
status                                     
major          3                           
minor          3.0                         
year           2016                        
month          05                          
day            03                          
svn rev        70573                       
language       R                           
version.string R version 3.3.0 (2016-05-03)
nickname       Supposedly Educational      
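Rather than bisecting the vector by hand, one way to locate the culprits is to parse with an explicit, full date-time format. With format= supplied, no format guessing happens, so any timestamp that cannot be represented comes back as NA on its own. The helper name find_unparseable and the default tz = "America/Chicago" are assumptions (the output above shows CST/CDT). Whether 02:00:00 is reported depends on the platform, so no output is shown for the question's vector.

```r
# Hypothetical helper: return the timestamps that fail to parse with an
# explicit, full date-time format. The default tz is an assumption based on
# the CST/CDT labels in the question's output.
find_unparseable <- function(x, tz = "America/Chicago",
                             format = "%Y-%m-%d %H:%M:%S") {
  # With an explicit format, as.POSIXct does not guess a format for the
  # whole vector; each element that cannot be parsed becomes NA by itself.
  parsed <- as.POSIXct(x, format = format, tz = tz)
  x[is.na(parsed) & !is.na(x)]
}

x <- c("2016-03-13 01:00:00", "2016-03-13 02:00:00", "2016-03-13 03:00:00")
find_unparseable(x)  # result is platform-dependent around the DST gap
```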

Update

While setting the time zone globally seems to overcome the problem, it still looks like a bug to me.
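A sketch of one workaround, assuming every timestamp in the file uses the %Y-%m-%d %H:%M:%S layout and that treating the values as UTC (which has no DST gaps) is acceptable: read the column as character and convert it with an explicit format. The inline CSV and the column names ts and value are made up for illustration. Setting the session time zone, for example with Sys.setenv(TZ = "UTC"), is the global variant described in the update.

```r
# Made-up inline data standing in for the real CSV file.
csv <- "ts,value
2016-03-13 01:00:00,1
2016-03-13 02:00:00,2
2016-03-13 03:00:00,3"

# Read the timestamp column as plain character instead of POSIXct ...
df <- read.csv(text = csv, colClasses = c("character", "numeric"))

# ... then convert with an explicit format and a DST-free time zone
# (assumption: UTC is acceptable for this data), so as.POSIXct never has
# to guess one format for the whole column.
df$ts <- as.POSIXct(df$ts, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

format(df$ts, "%H:%M:%S")
# returns "01:00:00" "02:00:00" "03:00:00"
```

Passing format= also means that a genuinely bad element would become NA on its own instead of silently truncating the whole column.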

Update 2

I can confirm that the behavior is Windows-specific (a platform-specific bug?). Here is the output from R 3.2.3 on Ubuntu:

[1] "2016-03-13 01:00:00 CST" "2016-03-13 01:00:00 CST"
[3] "2016-03-13 03:00:00 CDT"

Update 3

There is a known, unconfirmed bug: #16852.

Update 4

Unless I'm missing something, in my case there is no difference between the %S and %OS formats mentioned in the comments:

> strptime(c("2016-03-13 01:00:00", "2016-03-13 02:00:00", "2016-03-13 03:00:00"), "%Y-%m-%d %H:%M:%S")
[1] "2016-03-13 01:00:00 CST" "2016-03-13 02:00:00"     "2016-03-13 03:00:00 CDT"
> strptime(c("2016-03-13 01:00:00", "2016-03-13 02:00:00", "2016-03-13 03:00:00"), "%Y-%m-%d %H:%M:%OS")
[1] "2016-03-13 01:00:00 CST" "2016-03-13 02:00:00"     "2016-03-13 03:00:00 CDT"

P.S. I haven't dug into the code yet... :/

Aurelea answered 20/7, 2016 at 18:53

Comments (8):
I doubt it's a bug. Due to DST, the datetime "2016-03-13 02:00:00" doesn't actually exist. It literally never happened. So as.POSIXct is probably just trying to be consistent in what it returns across the vector. Note that changing just that element to 01:59:59 works. – Terrapin
@Terrapin I wouldn't complain if it just gave me an error, or turned that into 01:00:00 with whatever time zone. But it silently drops the time part... It took me quite some time to pinpoint the issue. – Aurelea
Well, I'm also fairly certain that most of these low-level datetime functions in R simply use system-level tools specific to your OS. So you may have Windows to blame, not R. – Terrapin
As I'm digging through as.POSIXlt.character, I'm beginning to reconsider my initial reaction to this... – Terrapin
...specifically, compare the effect of strptime on your vector using a format of either "%Y-%m-%d %H:%M:%S" or "%Y-%m-%d %H:%M:%OS", the latter of which is what is eventually hit in as.POSIXlt.character, called from as.POSIXct.default. The %OS format is documented at the end of the Details section of ?strptime. – Terrapin
Interesting. On my platform (OS X), it works with %S, but with %OS it removes the times completely. – Terrapin
On Windows 7, I can see that lapply(c("2016-03-13 01:00:00", "2016-03-13 02:00:00", "2016-03-13 03:00:00"), as.POSIXct, tz="America/Chicago") doesn't strip the times from the 1st and 3rd entries. Strange that it gives a different result when the elements are treated separately instead of together. – Endgame
R bug #16852 grew out of a discussion on data.table issue 1619. – Wraf
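The comments point at as.POSIXlt.character, which, when no format is given, tries a list of candidate formats and keeps the first one that parses every non-NA element. Below is a simplified re-creation of that logic. guess_format is a hypothetical name, and the candidate list follows the R 3.x source; newer R versions may differ. Because strptime ignores trailing characters it does not need, the date-only format "%Y-%m-%d" succeeds on all of these strings, so a single unparseable element downgrades the whole vector. Presumably the nonexistent 02:00:00 fails the full-format check on Windows. The example uses "25:00:00" instead because it fails on every platform.

```r
# Simplified, hypothetical re-creation of the format guessing done by
# as.POSIXlt.character when no `format` is supplied. This is a sketch,
# not R's actual code.
guess_format <- function(x, tz = "") {
  candidates <- c("%Y-%m-%d %H:%M:%OS", "%Y/%m/%d %H:%M:%OS",
                  "%Y-%m-%d %H:%M",    "%Y/%m/%d %H:%M",
                  "%Y-%m-%d",          "%Y/%m/%d")
  xx <- x[!is.na(x)]
  for (f in candidates) {
    # A candidate is accepted only if it parses *every* non-NA element.
    if (all(!is.na(strptime(xx, f, tz = tz)))) return(f)
  }
  stop("character string is not in a standard unambiguous format")
}

# All elements parse with the full format, so the time part is kept.
guess_format(c("2016-03-13 01:00:00", "2016-03-13 03:00:00"), tz = "UTC")
# returns "%Y-%m-%d %H:%M:%OS"

# One bad element forces the date-only format for the whole vector.
guess_format(c("2016-03-13 01:00:00", "2016-03-13 25:00:00"), tz = "UTC")
# returns "%Y-%m-%d"
```

This also fits the lapply observation: parsed one at a time, each element gets its own format, so only the problematic element loses its time part.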
