data.table colClasses conversion to POSIXct

Why doesn't the colClasses argument to data.table::fread seem to convert the REQUEST_DATE column to POSIXct in the example below? It converts the ROW_ID column without issue.

library(data.table)

txt <- "ROW_ID,REQUEST_TYPE,REQUEST_DATE
1,OTHER,2009-07-31 07:35:38
2,OTHER,2009-07-30 21:18:35
3,OTHER,2009-07-30 21:18:30
4,OTHER,2009-07-30 21:18:40
5,OTHER,2009-07-30 21:18:39
6,QUERY,2009-07-30 21:19:29
7,OTHER,2009-07-30 21:18:42
8,OTHER,2009-07-30 21:18:45
9,OTHER,2009-07-31 07:35:31
10,OTHER,2009-07-31 07:35:30
"
dt <- fread(txt, colClasses = c(ROW_ID = "character", REQUEST_DATE = "POSIXct"))

Converting the column after the read works as expected:

dt[, as.POSIXct(REQUEST_DATE)]
 [1] "2009-07-31 07:35:38 EDT" "2009-07-30 21:18:35 EDT" "2009-07-30 21:18:30 EDT" "2009-07-30 21:18:40 EDT" "2009-07-30 21:18:39 EDT"
 [6] "2009-07-30 21:19:29 EDT" "2009-07-30 21:18:42 EDT" "2009-07-30 21:18:45 EDT" "2009-07-31 07:35:31 EDT" "2009-07-31 07:35:30 EDT"

In this case, however, I can't use dt[, REQUEST_DATE := as.POSIXct(REQUEST_DATE)] because the real data has ~50m rows and many columns. The alternate list syntax for colClasses doesn't seem to work either:

dt <- fread(txt, colClasses = list(POSIXct = "REQUEST_DATE"))

The data.table help for fread says "A character vector of classes (named or unnamed), as read.csv. Or a named list of vectors of column names or numbers, see examples. colClasses in fread is intended for rare overrides, not for routine use. fread will only promote a column to a higher type if colClasses requests it. It won't downgrade a column to a lower type since NAs would result. You have to coerce such columns afterwards yourself, if you really require data loss."

It isn't clear to me whether POSIXct is considered a lower type than character.

I am using data.table version 1.10.0.

Clearness answered 27/1, 2017 at 21:25 Comment(2)
Maybe the docs for the colClasses arg need to be updated, but it does say near the top "Dates are read as character currently. They can be converted afterwards using the excellent fasttime package or standard base functions." - Unbound
Indeed, I am using the fasttime package. It converts 50m rows in about 12 seconds, which is pretty good! - Clearness

As Frank mentions in the comments, it looks like this is a current data.table limitation. I ended up using the fastPOSIXct function in the fasttime package. It converts 50m rows in about a dozen seconds on my laptop, which is quite reasonable for my use case.
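
For illustration, here is a minimal sketch of that approach on a shortened copy of the sample data from the question. It assumes the fasttime package is installed, and it is not necessarily the exact code I ran on the real data. One caveat: fastPOSIXct always interprets the input text as UTC/GMT, whereas as.POSIXct uses the local time zone (EDT in the output shown in the question), so the resulting instants can differ by the UTC offset.

```r
library(data.table)

# Shortened copy of the sample data from the question
txt <- "ROW_ID,REQUEST_TYPE,REQUEST_DATE
1,OTHER,2009-07-31 07:35:38
2,OTHER,2009-07-30 21:18:35
6,QUERY,2009-07-30 21:19:29
"

# Read the timestamp column as plain text; the conversion happens afterwards
dt <- fread(txt, colClasses = c(ROW_ID = "character", REQUEST_DATE = "character"))

# := replaces the column by reference, so the rest of the table is not copied.
# fastPOSIXct() parses the text as UTC/GMT; tz only sets the time zone
# attribute of the result, not the zone the input is assumed to be in.
dt[, REQUEST_DATE := fasttime::fastPOSIXct(REQUEST_DATE, tz = "UTC")]

str(dt)
```

Because := works by reference, only the one column is rewritten, which is what keeps this workable on a table with many columns.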

Clearness answered 30/1, 2017 at 12:50 Comment(0)
