Parsing ISO8601 in R

Are there any existing implementations in R to parse ISO8601 strings into POSIXt objects? The ISO8601 spec allows date/times to be printed in a variety of (non-overlapping) formats, so one probably needs to do some regular expression magic to detect the format and feed that to strptime.

Doing this properly might actually be quite challenging; however, something that detects the most common formats would already be very helpful. I can hardly imagine I am the first one to run into this, but I am having a hard time finding good implementations.
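A minimal sketch of the regex-then-strptime idea described above, using base R only. The function name parse_iso8601_common is hypothetical (not from any package). It covers only a handful of common layouts, treats strings without a zone designator as UTC by assumption, and does not handle offsets such as Z or +02:00.

```r
# Hypothetical helper: parse_iso8601_common() is not from any package.
# Each pattern must match the whole string; the first matching pattern's
# strptime() format is used. Zone designators (Z, +hh:mm) are NOT handled.
parse_iso8601_common <- function(x, tz = "UTC") {
  patterns <- list(
    # Extended date-time, optional fractional seconds (read via R's %OS)
    list(re = "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\\.[0-9]+)?$",
         fmt = "%Y-%m-%dT%H:%M:%OS"),
    # Extended date-time with minute precision
    list(re = "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}$",
         fmt = "%Y-%m-%dT%H:%M"),
    # Extended calendar date
    list(re = "^[0-9]{4}-[0-9]{2}-[0-9]{2}$", fmt = "%Y-%m-%d"),
    # Basic-format date-time and date
    list(re = "^[0-9]{8}T[0-9]{6}$", fmt = "%Y%m%dT%H%M%S"),
    list(re = "^[0-9]{8}$", fmt = "%Y%m%d")
  )
  # Start with all-NA date-times; unmatched strings stay NA.
  out <- .POSIXct(rep(NA_real_, length(x)), tz = tz)
  for (p in patterns) {
    hit <- is.na(out) & grepl(p$re, x)
    if (any(hit)) {
      out[hit] <- as.POSIXct(strptime(x[hit], p$fmt, tz = tz))
    }
  }
  out
}

parse_iso8601_common(c("2013-08-20T14:56:37.25", "2012-08-25", "20120825T092402"))
```

Ordering matters here: the more specific layouts are tried first, and anything that matches no pattern is returned as NA rather than guessed.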

Southbound answered 25/8, 2012 at 21:43 Comment(4)
Google sucks for turning Google Code Search off. Anyway, R has it, and Josh gave you a pointer. - Palsy
Google only gave me lubridate and isodate and a bunch of blogs :/ But I'm glad I asked, because I was about to start implementing it myself. - Southbound
Try rseek.org, which gave me relevant hits for 'iso 8601' right away. - Palsy
Ah, thanks. I'm not overly excited about the xts implementation, by the way. It only works for 1 of the 4 example formats given on the ISO 8601 wiki page. - Southbound

Strictly speaking, you can't. I don't need to know anything about R or CRAN (or even what they are) to tell you that, because I know ISO 8601 well enough to know that knowing something is ISO 8601 is not enough to know unambiguously what is meant by it, especially in the shorter forms.

Find out what profile of ISO 8601 the other party is using. If they don't know what you're talking about, then you will be doing them a favour when you point out what I just said in the paragraph above. As I wrote once elsewhere,

Unfortunately many people think of a particular profile they are familiar with when they hear “ISO 8601”, other people know that using 8601 is a Good Thing but are not familiar with the details of implementation. Hence a spec or requirements document might mention 8601 but not be more explicit than that. In such cases it’s important to seek clarification rather than assume that the format you think of as “ISO 8601” is the correct one to use.

So, tell them: "'ISO 8601' is not specific enough; I need to know exactly what you are doing and what your limits on precision are." (And possibly what your policy is on dates prior to 1582, and perhaps again prior to 0001, on leap seconds, and on a few other things left open by the standard.)

Then whatever you're dealing with should be easy enough: aside from this point of ambiguity, it is a pretty straightforward standard. It should be thought of as a standard for defining date formats, rather than one that defines a single date format.
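To make the ambiguity concrete, here is the six-character string 120826 read two ways in base R, followed by a strict parser for one agreed profile (extended format, whole seconds, trailing Z). Both the profile and the name parse_agreed_profile are hypothetical, chosen only for illustration.

```r
s <- "120826"

# Read as YYMMDD (a form allowed by the withdrawn ISO 8601:1988): 26 Aug 2012.
as_date <- as.Date(s, format = "%y%m%d")

# Read as basic-format hhmmss: 12:08:26 (strptime fills in the current date).
as_time <- strptime(s, format = "%H%M%S", tz = "UTC")

# Hypothetical strict parser for one agreed profile: "YYYY-MM-DDThh:mm:ssZ".
# Anything outside the profile is an error instead of a guess.
parse_agreed_profile <- function(x) {
  ok <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z$", x)
  if (!all(ok)) {
    stop("not in the agreed profile: ", paste(x[!ok], collapse = ", "))
  }
  # The literal "Z" in the format must match the input; the result is in UTC.
  as.POSIXct(strptime(x, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC"))
}

parse_agreed_profile("2012-08-25T09:24:02Z")
```

Once the profile is pinned down, the parser can reject everything else outright, which is the point of asking the other party first.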

Kenlee answered 25/8, 2012 at 22:10 Comment(4)
Interesting; I had always assumed that the spec was designed to be unambiguous and the profiles mutually exclusive. Unfortunately, in my case the data is provided by many parties and ISO 8601 is the only thing that is agreed upon. That said, R programmers often need to be pragmatic; a few parsing parameters with sensible defaults should get us a long way... - Southbound
120826 is a valid 8601 code for today (not even Y2K-safe!) and for 8 minutes and 26 seconds past noon on every day (8601 does warn against such forms for general interchange). Likewise, both Jeroen's and Dirk's examples above are valid, but only one of them worked with a given function. Chances are they'll all be talking about dates or all about times, and all want or all not want time zone info, so you should be able to come up with an unambiguous parsing that covers the lot of them, but I'd start by asking, "Just what do you mean by 8601?". See hackcraft.net/web/datetime/#iso8601 for more. - Kenlee
120826 is not valid according to ISO 8601. It used to be valid according to ISO 8601:1988, but that standard was withdrawn a long time ago. - Payable
@MihaiCapotă yes, as of 2004 it's obsolete. This is not to say standard versions older than 10 years never get used. Besides, 20120826 is still a valid form; ISO 8601 describes a range of possible formats, not a single format, so double-checking which one is required (and insisting on extended format if it's left up to you; basic format is a pain) remains necessary. - Kenlee

See .parseISO8601 in the xts package for one implementation. I doubt this will work "out of the box", but it should give you a good idea of how to implement your specific needs.

Technocracy answered 25/8, 2012 at 21:44 Comment(3)
Hmm, the first thing I tried fails: xts::.parseISO8601("2012-08-25T09:24Z") - Southbound
Add seconds, which (here at least) are mandatory: xts::.parseISO8601("2012-08-25T09:24:02") works. - Palsy
Yes, it's the Z at the end, indicating UTC, that isn't supported. - Southbound
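Since the trailing Z (and numeric offsets) are the sticking point in the comments above, here is a sketch of one way to handle them in base R: split off the zone designator with a regular expression, parse the local part with strptime() as if it were UTC, then subtract the offset. parse_with_offset is a hypothetical helper, not part of xts, and it only accepts extended-format date-times with minute or second precision that carry a designator.

```r
# Hypothetical helper (not part of xts). Accepts e.g. "2012-08-25T09:24Z",
# "2012-08-25T11:24:00+02:00", "2012-08-25T04:24-0500"; returns UTC POSIXct.
parse_with_offset <- function(x) {
  re <- paste0(
    "^([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}(:[0-9]{2}(\\.[0-9]+)?)?)",
    "(Z|([+-])([0-9]{2}):?([0-9]{2})?)$"
  )
  parts <- regmatches(x, regexec(re, x))
  secs <- vapply(parts, function(p) {
    # p[1] = whole match, p[2] = local date-time, p[5] = designator,
    # p[6] = sign, p[7] = offset hours, p[8] = offset minutes (may be empty)
    if (length(p) == 0) return(NA_real_)  # not a supported form
    local <- p[2]
    # "YYYY-MM-DDThh:mm" is 16 characters; longer strings carry seconds.
    fmt <- if (nchar(local) == 16) "%Y-%m-%dT%H:%M" else "%Y-%m-%dT%H:%M:%OS"
    t_local <- as.numeric(as.POSIXct(strptime(local, fmt, tz = "UTC")))
    if (p[5] == "Z") return(t_local)
    hh <- as.numeric(p[7])
    mm <- if (is.na(p[8]) || !nzchar(p[8])) 0 else as.numeric(p[8])
    sign <- if (p[6] == "-") -1 else 1
    # Local time = UTC + offset, so UTC = local time - offset.
    t_local - sign * (hh * 3600 + mm * 60)
  }, numeric(1))
  .POSIXct(secs, tz = "UTC")
}

parse_with_offset(c("2012-08-25T09:24Z", "2012-08-25T11:24:00+02:00"))
```

Strings without a designator come back as NA here on purpose; whether they should mean UTC or local time is exactly the kind of profile question raised in the other answer.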

This looks promising: http://cran.r-project.org/web/packages/parsedate

parsedate: Recognize and Parse Dates in Various Formats, Including All ISO 8601 Formats

Parse dates automatically, without the need of specifying a format. Currently it includes the git date parser. It can also recognize and parse all ISO 8601 formats.

Chamberlin answered 8/3, 2015 at 18:27 Comment(1)
This seems to handle only dates, but the OP asked for date-times. - Sprawl

```r
# %F is %Y-%m-%d and %T is %H:%M:%S; no tz is given, so the local time zone is used.
t <- strptime("2013-08-20T14:56:37", "%FT%T")
```

worked well enough for me in most cases. It does not handle fractional seconds, though (with %T the fractional part is simply ignored), and it does not solve all the problems Jon Hanna mentioned (which make working with time data types so unbelievably difficult).
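For the fractional-seconds case, base R's strptime() has an R-specific %OS conversion that, on input, reads seconds including any fractional part. A small comparison, assuming UTC for a string without a zone designator:

```r
x <- "2013-08-20T14:56:37.123"

# %S stops after the integer seconds; strptime() ignores the trailing ".123".
whole <- strptime(x, "%Y-%m-%dT%H:%M:%S", tz = "UTC")

# %OS (R-specific) reads the fractional part as well.
frac <- strptime(x, "%Y-%m-%dT%H:%M:%OS", tz = "UTC")

whole$sec  # integer seconds only
frac$sec   # includes the fraction (subject to floating-point representation)

# Printing fractional seconds needs %OSn in format() or options(digits.secs = n).
format(frac, "%H:%M:%OS3")
```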

Scottscotti answered 20/8, 2013 at 13:15 Comment(0)
