Is there a fast parser for Date?
For datetimes, fasttime provides very fast parsing to POSIXct:

library('fasttime')
library('lubridate')
library('microbenchmark')

# parse character to POSIXct
Sys.setenv(TZ='UTC')
test <- rep('2011-04-02 11:01:00',1e4)
microbenchmark(
  test1 <- fastPOSIXct(test),
  test2 <- fast_strptime(test,format='%Y-%m-%d %H:%M:%S'),
  test3 <- as.POSIXct(test, format='%Y-%m-%d %H:%M:%S'),
  test4 <- ymd_hms(test),
  times=100)
Unit: microseconds
                                                       expr       min        lq      mean    median         uq       max
                                 test1 <- fastPOSIXct(test)   663.123   692.337  1409.448   701.821   712.4965 71231.585
 test2 <- fast_strptime(test, format = "%Y-%m-%d %H:%M:%S")  1026.342  1257.508  1263.157  1264.928  1273.8145  1366.438
    test3 <- as.POSIXct(test, format = "%Y-%m-%d %H:%M:%S")  9865.265 10060.450 10154.651 10145.551 10186.3030 13358.136
                                     test4 <- ymd_hms(test) 13990.206 17152.779 17278.654 17308.347 17393.6625 22193.544

Is there something equivalent for Date? The lubridate package provides some parsers, but the fast one (fast_strptime) returns POSIXct (it is not meant for dates), and casting POSIXct to Date takes too long.

Given how quickly strings can be parsed to POSIXct, I would expect something equally quick to exist for Date.

Is there a fast packaged alternative?

Ahlgren asked 6/2, 2016 at 22:07

Given

## the following two (here three) lines are all of fasttime's R/time.R
fastPOSIXct <- function(x, tz=NULL, required.components = 3L)
  .POSIXct(if (is.character(x)) .Call("parse_ts", x, required.components)
           else .Call("parse_ts", as.character(x), required.components), tz)

hence

## so we suggest to just use it, and convert later
fastDate <- function(x, tz=NULL)
  as.Date(fastPOSIXct(x, tz=tz))

which at least beats as.Date():

R> library(microbenchmark)
R> library(fasttime)
R> d <- rep("2010-11-12", n=1e4)   # caution: rep() has no 'n' argument, so d has length 1 here; use times=1e4
R> microbenchmark(fastDate(d), as.Date(d), times=100)
Unit: microseconds
        expr    min      lq    mean  median      uq     max neval cld
 fastDate(d) 47.469 48.8605 54.3232 55.7270 57.1675 104.447   100  a 
  as.Date(d) 77.194 79.4120 85.3020 85.2585 87.3135 121.979   100   b

R> 
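The `as.Date()` step in `fastDate()` should be cheap for UTC input: POSIXct stores seconds since 1970-01-01 00:00:00 UTC and Date stores days since 1970-01-01, so the conversion amounts to a floor division by 86400. The sketch below makes that explicit. `posixct_to_date_utc` is a hypothetical helper, and it assumes the times are in UTC, which is how fasttime interprets its input.

```r
# Hypothetical helper: UTC POSIXct -> Date without going through POSIXlt.
# Assumption: x holds UTC times (seconds since 1970-01-01 00:00:00 UTC).
posixct_to_date_utc <- function(x) {
  days <- floor(unclass(x) / 86400)   # whole days since the epoch
  attributes(days) <- NULL            # drop the leftover tzone attribute
  structure(days, class = "Date")     # Date = days since 1970-01-01
}

p <- as.POSIXct(c("2010-11-12 00:00:00", "2010-11-12 23:59:59"), tz = "UTC")
posixct_to_date_utc(p)   # both elements fall on 2010-11-12
```

For times in another zone, this shortcut yields the UTC day, which can differ from the local calendar day.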

If you wanted to go super fast, you could start with tparse.c to create the date-only subset you want.
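Short of writing C, the idea of a date-only parser can be sketched in base R for strictly formatted `YYYY-MM-DD` strings: cut out the fixed-position fields and turn year/month/day into days since 1970-01-01 with integer arithmetic (Howard Hinnant's days-from-civil algorithm). `parse_ymd_fixed` is a hypothetical name, and the sketch assumes four-digit years, fixed field positions and no validation.

```r
# Hypothetical date-only parser for strictly formatted "YYYY-MM-DD" strings.
# Assumptions: fixed field positions, four-digit years, no validation
# (e.g. "2010-02-30" is not rejected but rolls over); NA input gives NA.
parse_ymd_fixed <- function(x) {
  y <- as.integer(substr(x, 1, 4))
  m <- as.integer(substr(x, 6, 7))
  d <- as.integer(substr(x, 9, 10))
  # Days-from-civil: start the year in March so the leap day comes last.
  y <- y - (m <= 2)
  era <- y %/% 400                                  # 400-year cycles
  yoe <- y - era * 400                              # year of era, 0..399
  mp <- ifelse(m > 2, m - 3, m + 9)                 # March = 0, ..., February = 11
  doy <- (153 * mp + 2) %/% 5 + d - 1               # day of the shifted year
  doe <- yoe * 365 + yoe %/% 4 - yoe %/% 100 + doy  # day of era
  days <- era * 146097 + doe - 719468               # shift origin to 1970-01-01
  structure(as.numeric(days), class = "Date")
}

v <- c("2010-11-12", "2000-02-29", "1970-01-01")
parse_ymd_fixed(v)
```

Whether this beats `fastDate()` is worth checking with microbenchmark on real data: the `substr()` calls allocate intermediate character vectors, so no speed claim is made here.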

Gaily answered 6/2, 2016 at 22:23

Comments:
Ahlgren: Is there any way your RcppBDT could help me avoid going the C way?
Gaily: There are parsers in Boost DateTime, but I chose not to expose them because they would require linking against the library, which we currently do not need for the pure time calculations; those are all header-based. Header-only makes for much easier builds and deployments.
Ahlgren: Just so you know, you do not need to paste(x,'12:00:00'); it works without that by default (see the documentation).
Gaily: Right. It simply stops parsing when the input runs out, which will make this solution harder to beat. Amending post and numbers.
Ahlgren: I just realized it parses and then calls .POSIXct on the result; if as.Date does not do anything stupid, it will indeed be hard to beat. Also, for more general formats like %Y%m%d, the same approach with fast_strptime from lubridate will work.
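The `%Y%m%d` route from the last comment can be sketched as a thin wrapper: parse with lubridate's `fast_strptime()` in UTC, then drop the time part with `as.Date()`. `fastDate_fmt` is a hypothetical name; the sketch assumes the strings carry no time zone information and that lubridate is installed.

```r
library(lubridate)

# Hypothetical wrapper: fast_strptime() returns a date-time, so parse in UTC
# and let as.Date() (UTC by default for POSIXct) keep just the calendar day.
fastDate_fmt <- function(x, format) {
  # lt = FALSE asks fast_strptime() for POSIXct instead of POSIXlt
  as.Date(fast_strptime(x, format = format, tz = "UTC", lt = FALSE))
}

fastDate_fmt(c("20101112", "20000229"), "%Y%m%d")
```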
