r, write_csv is changing all times/dates to UTC

I have found a very annoying problem that I want to share with the community. This is a question that I have found an acceptable solution for (detailed below), but I now have several follow-up questions. My knowledge of time stamps and POSIX variables is limited, particularly how plyr, dplyr, and readr handle them.

When working with POSIX variables (aka, date and time stamps), I found that write_csv from readr changed these variables into UTC time.

I am downloading data from an API and preserving the time stamp. Each time I grab data, I bind it to an existing file and save the file. My timezone is MDT, and I am requesting data using MDT time, which I am then trying to bind to a file in UTC time, and the times don't match... it gets messy and frustrating. In essence, the beautiful time stamp database I am trying to create is turning into a pile of garbage.

To remedy this problem, I converted the POSIX time column to character column using:

df$time <- as.character(df$time)

This allowed me to save the files in a time zone consistent with the time stamps being returned to me by the API.
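
A variant of that workaround (a sketch with a made-up column, not the original code) is to format the stamp with its numeric UTC offset, so the character column stays unambiguous no matter which timezone a later reader assumes:

```r
# Hypothetical one-row example; the real data came from an API
df <- data.frame(time = as.POSIXct("2018-07-23 12:57:00", tz = "America/Denver"))

# format() keeps the wall-clock reading and appends the UTC offset (%z),
# e.g. "2018-07-23 12:57:00 -0600", so no information is lost in the CSV
df$time <- format(df$time, "%Y-%m-%d %H:%M:%S %z")
print(df$time)
```

Any CSV writer then stores the column verbatim, offset included.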

This leads me to the following series of questions:

  1. Is there a program that can join POSIX variables across time zones? For instance, if it's noon MDT, it's 6pm UTC. Could I join two dataframes based on these time stamps without having to convert them to the same time zone first?
  2. Is it possible to prevent write_csv from changing POSIX variables to UTC?
  3. Is there a csv write function that doesn't change POSIX variables?

EDIT: I have included some example data of what I am talking about:

> library(jsonlite)   # fromJSON()
> library(anytime)    # anytime()
> df1 <- as.data.frame(fromJSON("https://api.pro.coinbase.com/products/BTC-USD/candles?start=2018-07-23&12:57:00?stop=2018-07-23&19:34:58granularity=300"))
> colnames(df1) <- c("time", "low", "high", "open", "close", "volume")
> df1$time <- anytime(df1$time)
> df1Sort <- df1[order(df1$time),]
> head(df1Sort, 5)
                   time     low    high    open   close    volume
299 2018-07-23 16:13:00 7747.00 7747.01 7747.01 7747.01 9.2029168
298 2018-07-23 16:14:00 7743.17 7747.01 7747.00 7747.01 7.0205668
297 2018-07-23 16:15:00 7745.47 7745.73 7745.67 7745.73 0.9075707
296 2018-07-23 16:16:00 7745.72 7745.73 7745.72 7745.73 4.6715157
295 2018-07-23 16:17:00 7745.72 7745.73 7745.72 7745.72 2.4921921
> write_csv(df1Sort, "df1Sort.csv", col_names = TRUE)
> df2 <- read_csv("df1Sort.csv", col_names = TRUE)
Parsed with column specification:
cols(
  time = col_datetime(format = ""),
  low = col_double(),
  high = col_double(),
  open = col_double(),
  close = col_double(),
  volume = col_double()
)
> head(df2, 5)
# A tibble: 5 x 6
  time                  low  high  open close volume
  <dttm>              <dbl> <dbl> <dbl> <dbl>  <dbl>
1 2018-07-23 22:13:00  7747  7747  7747  7747  9.20 
2 2018-07-23 22:14:00  7743  7747  7747  7747  7.02 
3 2018-07-23 22:15:00  7745  7746  7746  7746  0.908
4 2018-07-23 22:16:00  7746  7746  7746  7746  4.67 
5 2018-07-23 22:17:00  7746  7746  7746  7746  2.49 
Clarisclarisa asked 24/7, 2018 at 1:32. Comments (8):
Can you share some examples of what kind of data you have and what it looks like? In terms of writing, a quick (and not so elegant) fix would be to write the time stamp as a character, with the corresponding time zone and offset information. I've used fread from data.table without any problems. Would be easier to help if we can see the data you're working with. – Floyfloyd
"When working with POSIX variables (aka, date and time stamps), I found that write_csv from readr changed these variables into UTC time." I don't think that's true. I've just checked, and write_csv stores POSIXct in my system-specific timezone. There is definitely no conversion to "UTC". – Unsparing
I think there might be a different issue here; for example, as.Date(x) for x a POSIXct object sets tz = "UTC" by default; so if you use as.Date on a POSIXct object and you don't explicitly match time zones, times will be converted to UTC. This has nothing to do with write_csv though. – Unsparing
Maurits, this is almost what is going on. I do not use as.Date() anywhere in my code, but I noticed that my data is class POSIX; then I drop the timezone, it's still listed as POSIX, and when I save the file and reopen it, it is returned in UTC time. I'll update my post. – Clarisclarisa
@MauritsEvers Really? Do you get a different result on the reproducible example in my answer? (Assuming you're not in the GMT+3 Addis Ababa timezone...) I wouldn't be too surprised if it is OS-specific - I'm on Windows currently. – Tranquilizer
@Gregor, I am on Ubuntu 16. – Clarisclarisa
@Gregor Ok, I've just checked; I'm on AEST, and I can reproduce your example: read_csv stores t in UTC. I'm not sure what I did earlier. I did check, but must've done something wrong. Sorry for the confusion! – Unsparing
It's weird, isn't it? I get this on multiple APIs, not just the Coinbase one that I showed as an example. – Clarisclarisa
  1. "Is there a program that can join POSIX variables across time zones... without having to convert them to the same time zone first?"

    Maybe? But if so, they're almost certainly just converting to UTC under the hood and hiding it from you. I'm unaware of anything like this in R. (data.table is the only package I'm aware of that can join on anything other than exact equality, and it doesn't have this feature.) If I were you, I'd just convert everything to one timezone - probably UTC.

    For more reading on best practices, this SQL-focused answer seems very good.

  2. "Is it possible to prevent write_csv from changing POSIX variables to UTC?"

    Not built-in. The ?write_csv documentation is pretty clear: It doesn't list any options for this and does say "POSIXct's are formatted as ISO8601."

  3. "Is there a csv write function that doesn't change POSIX variables?"

    Sure, the built-in write.csv doesn't change to UTC (I think it uses system settings), and data.table::fwrite offers quite a few options. If you want to control how your dates are saved, I think your best bet is to convert them to character in whatever format you want, and then any of the writing functions should handle them just fine. You should check out the ?data.table::fwrite documentation, it's got good info. They warn that the "write.csv" option can be quite slow.
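
Incidentally, on point 1, base R effectively does the "under the hood" conversion already: a POSIXct stores the instant as seconds since the epoch, and the timezone is only a display attribute, so equality comparisons match across display timezones. A minimal sketch (my illustration, not from the original post):

```r
# Same instant, displayed in two timezones (MDT is UTC-6 in July)
t_mdt <- as.POSIXct("2018-07-23 12:00:00", tz = "America/Denver")
t_utc <- as.POSIXct("2018-07-23 18:00:00", tz = "UTC")

t_mdt == t_utc                           # TRUE: compares the underlying instant
as.numeric(t_mdt) - as.numeric(t_utc)    # 0: identical seconds since epoch

# So filtering one frame by stamps from another works across timezones
b <- data.frame(time = t_utc, y = 2)
b[b$time == t_mdt, ]                     # the row matches
```

This is also why converting everything to one timezone is mostly a display decision: the stored instants are already comparable.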


You should include reproducible examples with your questions. Here's one for this:

t = as.POSIXct("2018-01-01 01:30:00", tz = "Africa/Addis_Ababa")
t
# [1] "2018-01-01 01:30:00 EAT"

d = data.frame(t)

library(readr)
write_csv(d, "tz_test.csv")
system("head tz_test.csv")
# 2017-12-31T22:30:00Z

library(data.table)
fwrite(d, "tz_test_dt.csv", dateTimeAs = "write.csv")
system("head tz_test_dt.csv")
# t
# 2018-01-01 01:30:00

write.csv(d, "tz_test_base.csv")
system("head tz_test_base.csv")
# "","t"
# "1",2018-01-01 01:30:00
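
If you do accept the UTC round-trip, one way to restore the local display after reading back is lubridate's with_tz() (my addition, not part of the example above), which changes only the display timezone, never the instant:

```r
library(readr)
library(lubridate)

d <- data.frame(t = as.POSIXct("2018-01-01 01:30:00", tz = "Africa/Addis_Ababa"))
write_csv(d, "tz_test.csv")   # stored as 2017-12-31T22:30:00Z (UTC instant)

d2 <- read_csv("tz_test.csv")
# with_tz() re-labels the display timezone; the underlying instant is unchanged
d2$t <- with_tz(d2$t, "Africa/Addis_Ababa")
d2$t
# [1] "2018-01-01 01:30:00 EAT"
```
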
Tranquilizer answered 24/7, 2018 at 2:05. Comments (2):
Seems like switching between character and POSIX is the way to go then. – Clarisclarisa
I think UTC is the way to go, but take from this what you will. – Tranquilizer

It looks like you're using libraries from the tidyverse; have you had a look at the lubridate library?

The help file for as_date() may help you here: it shows how to convert a date-time variable to your desired timezone before you append/join your data.

For example:

> library(lubridate)
> dt_utc <- ymd_hms("2010-08-03 00:50:50")
> dt_utc
[1] "2010-08-03 00:50:50 UTC"

> as_datetime(dt_utc, tz = "Australia/Melbourne")
[1] "2010-08-03 10:50:50 AEST"
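
A related lubridate distinction worth knowing (my addition, not part of the answer above): with_tz() keeps the instant and changes only the display, while force_tz() keeps the wall-clock reading and changes the instant:

```r
library(lubridate)

dt_utc <- ymd_hms("2010-08-03 00:50:50")   # parsed as UTC by default

# with_tz(): same instant, new display timezone (UTC+10 in August)
with_tz(dt_utc, "Australia/Melbourne")
# [1] "2010-08-03 10:50:50 AEST"

# force_tz(): same clock reading, reinterpreted in the new timezone
force_tz(dt_utc, "Australia/Melbourne")
# [1] "2010-08-03 00:50:50 AEST"
```

Picking the wrong one of these two is a common source of exactly the kind of silent timezone shifts described in the question.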
Coltoncoltsfoot answered 24/7, 2018 at 2:12. Comments (1):
I appreciate the answer and library recommendation. I don't have issues switching back and forth between timezones or between character and POSIXct. Rather, I was hoping for a solution that avoids having to do these transformations. – Clarisclarisa