Subset dataframe based on POSIXct date and time greater than datetime using dplyr
Asked Answered
M

1

15

I am not sure what is going wrong with selecting date times as a POSIXct format. I have read several comments on subsetting a dataframe based on as.Date and I can get that to work without an issue. I have also read many posts suggesting that filtering POSIXct formats should work, but for some reason I cannot get it to work.

An example dataframe:

library(lubridate)
library(dplyr)

date_test <- seq(ymd_hms('2016-07-01 00:00:00'),ymd_hms('2016-08-01 00:00:00'), by = '15 min')
date_test <- data.frame(date_test)
date_test$datetime <- date_test$date_test
date_test <- select(date_test, -date_test)

I checked that it is in POSIXct format and then tried several ways to subset the dataframe greater than 2016-07-01 01:15:00. However the output never shows the date times less than 2016-07-01 01:15:00 being removed. I am sorry if this has been asked somewhere and I cannot find it but I have looked and tried to get this to work. I am using UTC as the timezone to avoid daylight savings time issues so that is not the issue here - unless the filter requires it.

class(date_test$datetime)

date_test <- date_test %>% filter(datetime > '2016-07-01 01:15:00')

date_test <- date_test %>% 
  filter(datetime > as.POSIXct("2016-07-01 00:15"))

date_test <- subset(date_test, datetime > as.POSIXct('2016-07-01 01:15:00')) 

Now if I filter using:

date_test <- date_test %>% 
  filter(datetime > as.POSIXct("2016-07-10 01:15:00"))

the output is very strange with a day behind and the wrong time?

2016-07-09 13:30:00
2016-07-09 13:45:00
2016-07-09 14:00:00
2016-07-09 14:15:00
2016-07-09 14:30:00

If it helps I am using MAC OS Sierra with R Studio Version 1.0.143 and R You Stupid Darkness, DPLYR 0.5 and Lubridate 1.6

Myrt answered 9/5, 2017 at 22:27 Comment(4)
Try this: date_test[date_test$datetime > as.POSIXct("2016-07-01 01:15:00", tz="UTC"),]Disarming
ymd_hms uses as.POSIXct times in "UTC" timezone by default - as.POSIXct uses the system timezone (e.g. - Australia for me, Illinois for you) - you need to consistently use ymd_hms or change to the "UTC" timezone as per Dave's suggestion.Jenine
OK, so that solves a huge issue I have been having for a while. Right now I am in New Zealand not Illinois as the time suggests and explains why time functions have not been working. It was never clear to me that when you use dates without specifying UTC that it defaulted to the system clock. I was able to get the following to work. THANK YOU SO MUCH date_test <- date_test %>% filter(datetime > as.POSIXct('2016-07-01 01:15:00', tz="UTC")) and test <- date_test[date_test$datetime > as.POSIXct("2016-07-01 01:15:00", tz="UTC"),] test <- as.data.frame(test)Myrt
Also, if anyone runs into this in the future as @Jenine suggested this also works `date_test <- date_test %>% filter(datetime > ymd_hms('2016-07-2 00:00:00'))'. Thank you again and I am not sure how to mark your comment as the answer but it was perfect!Myrt
J
25

ymd_hms uses POSIXct times in "UTC" timezone by default - as.POSIXct uses the system timezone (e.g. - Australia for me) - you need to consistently use ymd_hms or change to the "UTC" timezone as per Dave's suggestion in the comments.

E.g.: these examples work:

date_test <- seq(ymd_hms('2016-07-01 00:30:00'),ymd_hms('2016-07-01 01:30:00'), by = '15 min')
date_test <- data.frame(datetime=date_test)
date_test

#             datetime
#1 2016-07-01 00:30:00
#2 2016-07-01 00:45:00
#3 2016-07-01 01:00:00
#4 2016-07-01 01:15:00
#5 2016-07-01 01:30:00

date_test %>% 
  filter(datetime > as.POSIXct("2016-07-01 01:00:00", tz="UTC"))

date_test %>% 
  filter(datetime > ymd_hms("2016-07-01 01:00:00"))

#             datetime
#1 2016-07-01 01:15:00
#2 2016-07-01 01:30:00
Jenine answered 10/5, 2017 at 0:46 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.