I have run into a very annoying problem that I want to share with the community. It is a question I have found an acceptable solution for (detailed below), but I now have several follow-up questions. My knowledge of time stamps and POSIX variables is limited, particularly how plyr, dplyr, and readr handle them.
When working with POSIX variables (i.e., date-time stamps), I found that write_csv from readr converts these variables to UTC.
I am downloading data from an API and preserving the time stamp. Each time I grab data, I bind it to an existing file and save the result. My timezone is MDT, and I am requesting data in MDT time, which I then try to bind to a file stored in UTC time; the times don't match, and it gets messy and frustrating. In essence, the beautiful time-stamp database I am trying to build is turning into a pile of garbage.
To remedy this problem, I converted the POSIX time column to a character column using:

df$time <- as.character(df$time)
This allowed me to save the files in a time zone consistent with the time stamps being returned to me by the API.
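A minimal sketch of that workaround, using a hypothetical one-row data frame standing in for the API result. Calling format() with an explicit tz is a slightly safer variant of as.character(), since it pins the timezone rather than relying on the column's tzone attribute:

```r
# Hypothetical data frame mimicking one row of the API response.
df <- data.frame(
  time  = as.POSIXct("2018-07-23 16:13:00", tz = "America/Denver"),
  close = 7747.01
)

# Render the stamp as character in the local timezone explicitly,
# so no CSV writer can shift it to UTC afterwards.
df$time <- format(df$time, "%Y-%m-%d %H:%M:%S", tz = "America/Denver")
df$time  # "2018-07-23 16:13:00" as plain character
```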
This leads me to the following series of questions:
- Is there a function that can join POSIX variables across time zones? For instance, if it's noon MDT, it's 6 pm UTC. Could I join two data frames on these time stamps without first converting them to the same time zone?
- Is it possible to prevent write_csv from changing POSIX variables to UTC?
- Is there a CSV write function that doesn't change POSIX variables?
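On the first question, here is a sketch of what I mean (hypothetical values, base R merge; as I understand it, POSIXct stores an instant as seconds since the epoch and the timezone attribute only affects printing, so two stamps for the same instant should compare equal):

```r
# Noon MDT and 6 pm UTC are the same instant (MDT = UTC-6).
t_mdt <- as.POSIXct("2018-07-23 12:00:00", tz = "America/Denver")
t_utc <- as.POSIXct("2018-07-23 18:00:00", tz = "UTC")
t_mdt == t_utc  # TRUE: comparison is on the underlying instant

# So a join on POSIXct keys needs no prior timezone conversion.
a <- data.frame(time = t_mdt, price  = 7747.01)
b <- data.frame(time = t_utc, volume = 9.2)
merge(a, b, by = "time")  # one matching row
```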
EDIT: I have included some example data of what I am talking about:
> library(jsonlite)  # fromJSON
> library(anytime)   # anytime
> library(readr)     # write_csv / read_csv
> df1 <- as.data.frame(fromJSON("https://api.pro.coinbase.com/products/BTC-USD/candles?start=2018-07-23&12:57:00?stop=2018-07-23&19:34:58granularity=300"))
> colnames(df1) <- c("time", "low", "high", "open", "close", "volume")
> df1$time <- anytime(df1$time)
> df1Sort <- df1[order(df1$time),]
> head(df1Sort, 5)
                   time     low    high    open   close    volume
299 2018-07-23 16:13:00 7747.00 7747.01 7747.01 7747.01 9.2029168
298 2018-07-23 16:14:00 7743.17 7747.01 7747.00 7747.01 7.0205668
297 2018-07-23 16:15:00 7745.47 7745.73 7745.67 7745.73 0.9075707
296 2018-07-23 16:16:00 7745.72 7745.73 7745.72 7745.73 4.6715157
295 2018-07-23 16:17:00 7745.72 7745.73 7745.72 7745.72 2.4921921
> write_csv(df1Sort, "df1Sort.csv", col_names = TRUE)
> df2 <- read_csv("df1Sort.csv", col_names = TRUE)
Parsed with column specification:
cols(
  time = col_datetime(format = ""),
  low = col_double(),
  high = col_double(),
  open = col_double(),
  close = col_double(),
  volume = col_double()
)
> head(df2, 5)
# A tibble: 5 x 6
  time                  low  high  open close volume
  <dttm>              <dbl> <dbl> <dbl> <dbl>  <dbl>
1 2018-07-23 22:13:00  7747  7747  7747  7747  9.20
2 2018-07-23 22:14:00  7743  7747  7747  7747  7.02
3 2018-07-23 22:15:00  7745  7746  7746  7746  0.908
4 2018-07-23 22:16:00  7746  7746  7746  7746  4.67
5 2018-07-23 22:17:00  7746  7746  7746  7746  2.49
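For what it's worth, here is a sketch (hypothetical one-row frame, assuming readr is available) showing that the instant itself survives the round trip; only the display timezone is lost, and it can be restored after reading by resetting the column's tzone attribute:

```r
library(readr)

# One row mirroring the transcript above.
df <- data.frame(time = as.POSIXct("2018-07-23 16:13:00", tz = "America/Denver"))

tmp <- tempfile(fileext = ".csv")
write_csv(df, tmp)    # serialized as an ISO 8601 UTC instant
df2 <- read_csv(tmp)  # parsed back as a UTC datetime

# Same instant, restore the local display timezone.
attr(df2$time, "tzone") <- "America/Denver"
format(df2$time)  # "2018-07-23 16:13:00" again
```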
Comments:

- … fread from data.table without any problems. Would be easier to help if we can see the data you're working with. – Floyd
- write_csv stores POSIXct in my system-specific timezone. There is definitely no conversion to "UTC". – Unsparing
- as.Date(x) for x a POSIXct object sets tz = "UTC" by default; so if you use as.Date on a POSIXct object and you don't explicitly match time zones, times will be converted to UTC. This has nothing to do with write_csv, though. – Unsparing
- read_csv stores the time in UTC. I'm not sure what I did earlier. I did check, but must've done something wrong. Sorry for the confusion! – Unsparing
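The as.Date() gotcha raised in the comments can be demonstrated with a small sketch (hypothetical timestamp; the POSIXct method of as.Date defaults to tz = "UTC", so a late-evening local stamp lands on the next UTC date):

```r
# 8 pm MDT on 2018-07-23 is 2 am UTC on 2018-07-24.
x <- as.POSIXct("2018-07-23 20:00:00", tz = "America/Denver")

as.Date(x)                         # "2018-07-24" -- converted to UTC first
as.Date(x, tz = "America/Denver")  # "2018-07-23" -- the local calendar date
```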