What are the "standard unambiguous date" formats for string-to-date conversion in R?

Please consider the following

$ R --vanilla

> as.Date("01 Jan 2000")
Error in charToDate(x) :
    character string is not in a standard unambiguous format

But that date clearly is in a standard unambiguous format. Why the error message?

Worse, an ambiguous date is apparently accepted without warning or error and then read incorrectly!

> as.Date("01/01/2000")
[1] "0001-01-20"

I've searched and found 28 other questions in the [R] tag containing this error message, all with solutions and workarounds that involve specifying the format, if I understand correctly. This question is different: I'm asking where the standard unambiguous formats are defined, and whether they can be changed. Does everyone get these messages, or is it just me? Perhaps it is locale related?

In other words, is there a better solution than needing to specify the format?

29 questions containing "[R] standard unambiguous format"

> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
Porphyritic answered 7/2, 2013 at 15:58 Comment(6)
judging by the function definition of as.Date.character the input is only tested for these two formats: "%Y-%m-%d" and "%Y/%m/%d". If it can match one of them it seems to be deemed "unambiguous".Charterhouse
@CarlWitthoft "Did I even read" seems to imply the answer is blindingly obvious in ?as.Date. Where does it help with this?Porphyritic
@Charterhouse Thanks, that seems to be the answer. Would you mind adding it then I can accept.Porphyritic
Arguably "Jan 24 1949" and "24 Jan 1949" would be unambiguous, but they are certainly Anglo-centric. Yet there are also values for 'month.abb' that are Anglo-centric as well, so a case could be made for those values to be matched in cases where strptime(xx, f <- "%d %B %Y", tz = "GMT") or strptime(xx, f <- "%B %d %Y", tz = "GMT") returned values. (I'm not implying that month.abb is used for the matching to %B since the docs say the matching is locale specific.)Verbenaceous
@CarlWitthoft Some of us trip up every now and again. Thanks for the kick while I'm down. In this question I got quite a few things right: I included sessionInfo(), I searched, told you what I searched and included a link, I kept it as concise as possible. I missed one line in ?as.Date and you give me the TFM treatment. We can't all be as perfect as you all the time.Porphyritic
@MatthewDowle sorry if I came down hard. I think the flamosity started when you appeared to confuse "unambiguous to a reasonably well-educated human" with "unambiguous to a poor helpless piece of code" . :-(Raouf
K
75

This is documented behavior. From ?as.Date:

format: A character string. If not specified, it will try '"%Y-%m-%d"' then '"%Y/%m/%d"' on the first non-'NA' element, and give an error if neither works.

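as.Date("01 Jan 2000") yields an error because the string matches neither of the two formats listed above. as.Date("01/01/2000") yields an incorrect answer because the string was not written in either format but still happens to parse under "%Y/%m/%d": "01" is read as the year, "01" as the month and "20" as the day, and the trailing "00" is ignored.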

I take "standard unambiguous" to mean "ISO-8601" (even though as.Date isn't that strict, as "%Y/%m/%d" isn't ISO-8601).

If you receive this error, the solution is to specify the format your date (or datetimes) are in, using the formats described in the Details section in ?strptime.

Make sure that the order of the conversion specification as well as any separators correspond exactly with the format of your input string. Also, be sure to use particular care if your data contain day/month names and/or abbreviations, as the conversion will depend on your locale (see the examples in ?strptime and read ?LC_TIME; see also strptime, as.POSIXct and as.Date return unexpected NA).
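A minimal sketch of the fix described above, using only base R. The input strings come from the question; reading "01/01/2000" as day first ("%d/%m/%Y") is an assumption, since the string alone cannot say whether the day or the month comes first. The "%d %b %Y" line depends on the LC_TIME locale, so it is shown without an expected result.

```r
# Default behaviour: only "%Y-%m-%d" and "%Y/%m/%d" are tried.
iso   <- as.Date("2000-01-01")
slash <- as.Date("2000/01/01")

# An explicit format removes the guesswork (day-first is an assumption here).
dmy_slash <- as.Date("01/01/2000", format = "%d/%m/%Y")

# Month abbreviations are matched in the current locale, so this gives NA
# unless LC_TIME recognises "Jan".
dmy_abbrev <- as.Date("01 Jan 2000", format = "%d %b %Y")

# Without a format, the string from the question raises the error;
# tryCatch() captures the message instead of stopping the script.
err <- tryCatch(as.Date("01 Jan 2000"), error = function(e) conditionMessage(e))
```

If the data contain month names, checking Sys.getlocale("LC_TIME") first is a cheap way to find out why a conversion returns NA.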

Klina answered 7/2, 2013 at 16:10 Comment(3)
@BenBolker How about "character string is not either %Y-%m-%d or %Y/%m/%d"?Porphyritic
The behavior is certainly documented in ?as.Date (+1). However, the error message "standard unambiguous format" is ironically ambiguous, to which the 23 previous questions attest. A more direct error message like, "format not recognized, see documentation" might improve user experience. Also, I don't believe "01/01/2000" is ISO-8601 ("2000-01-01" is ISO-8601), which adds to the ambiguity.Lode
@jthetzel: you are right, "01/01/2000" is not ISO-8601. I meant that I personally think of ISO-8601 to be the standard, unambiguous format. And I agree that as.Date not complaining about "01/01/2000" is inconsistent with the error message.Klina
W
42

In other words, is there a better solution than needing to specify the format?

Yes, there is now (ie in late 2016), thanks to anytime::anydate from the anytime package.

See the following for some examples from above:

R> anydate(c("01 Jan 2000", "01/01/2000", "2015/10/10"))
[1] "2000-01-01" "2000-01-01" "2015-10-10"
R> 

As you said, these are in fact unambiguous and should just work. And via anydate() they do. Without a format.

Whitecap answered 20/11, 2016 at 21:32 Comment(5)
Only came here because we had another question of something trying to parse dates with an incomplete format. For complete ones, we now have something. I am quite pleased with this -- it was a nagging question. And needless to say, anytime() is equally useful for POSIXct.Whitecap
Just used the anytime package and it worked wonderfully, except quite a few NAs. After I ran trimws() on the date vector, everything was perfect.Gaze
I use it a metric ton too!Whitecap
Looks so simple! I used anydate() on a column with string values of mm-dd (no yy). All <chr> values in the column were successfully converted to <date>. Unfortunately, it set the year to '1400' instead of '2020'. ¯_(ツ)_/¯Strategic
Well, not quite. As I answered in a few other questions on this site, mm-dd is not a date (neither is mm-yy or mm-yyyy). You cannot parse what is not there.Whitecap
C
27

As a complement to @JoshuaUlrich's answer, here is the definition of the function as.Date.character:

as.Date.character
function (x, format = "", ...) 
{
    charToDate <- function(x) {
        xx <- x[1L]
        if (is.na(xx)) {
            j <- 1L
            while (is.na(xx) && (j <- j + 1L) <= length(x)) xx <- x[j]
            if (is.na(xx)) 
                f <- "%Y-%m-%d"
        }
        if (is.na(xx) || !is.na(strptime(xx, f <- "%Y-%m-%d", 
            tz = "GMT")) || !is.na(strptime(xx, f <- "%Y/%m/%d", 
            tz = "GMT"))) 
            return(strptime(x, f))
        stop("character string is not in a standard unambiguous format")
    }
    res <- if (missing(format)) 
        charToDate(x)
    else strptime(x, format, tz = "GMT")
    as.Date(res)
}
<bytecode: 0x265b0ec>
<environment: namespace:base>

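So basically, if both strptime(x, format="%Y-%m-%d") and strptime(x, format="%Y/%m/%d") return NA for the first non-NA element, the string is considered ambiguous and the error is raised; otherwise it is considered unambiguous and the format that matched is applied to the whole vector.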
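The check can be sketched as a small stand-alone function. The name parses_by_default is hypothetical (it is not part of base R), and the sketch mirrors only the format-guessing step of the definition printed above, not the conversion itself.

```r
# Hypothetical helper: TRUE if as.Date(x), called without a format,
# would find a matching default format instead of raising the error.
parses_by_default <- function(x) {
  # Like charToDate(), look only at the first non-NA element.
  first <- x[!is.na(x)][1L]
  if (is.na(first)) return(TRUE)  # all NA: nothing to guess, no error

  formats <- c("%Y-%m-%d", "%Y/%m/%d")
  ok <- vapply(
    formats,
    function(f) !is.na(strptime(first, f, tz = "GMT")),
    logical(1)
  )
  any(ok)
}

parses_by_default("2000-01-01")    # TRUE
parses_by_default("01 Jan 2000")   # FALSE: as.Date() raises the error
parses_by_default("01/01/2000")    # TRUE, although the result is misparsed
```

Note that TRUE only means "some default format parsed the first element"; as the "01/01/2000" case shows, it says nothing about whether the parsed date is the intended one.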

Charterhouse answered 7/2, 2013 at 16:19 Comment(0)
A
9

Converting a string to a date without specifying its format can easily trigger this error.

Here is an example:

sdate <- "2015.10.10"

Convert without specifying the format:

date <- as.Date(sdate) # ==> This generates the same error: "Error in charToDate(x): character string is not in a standard unambiguous format".

Convert with the format specified:

date <- as.Date(sdate, format = "%Y.%m.%d") # ==> Error-free date conversion.
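On the question of whether the default formats can be changed: more recent versions of R (3.5.0 and later, as far as I know) give as.Date() a tryFormats argument for character input, plus an optional flag. A minimal sketch, assuming such a version of R; the vector name my_formats is only illustrative.

```r
sdate <- "2015.10.10"

# Extend the list of formats that as.Date() tries. The first format that
# parses the first non-NA element is then applied to the whole vector.
my_formats <- c("%Y-%m-%d", "%Y/%m/%d", "%Y.%m.%d")
parsed <- as.Date(sdate, tryFormats = my_formats)

# optional = TRUE returns NA instead of raising the error when none of
# the formats matches.
no_match <- as.Date("10 Oct 2015", tryFormats = my_formats, optional = TRUE)
```

Because the format is still chosen from a single element, this does not help with vectors that mix several formats.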
Arse answered 19/12, 2015 at 20:42 Comment(0)
G
5

This works perfectly for me, no matter how the date was coded previously.

library(lubridate)
data$created_date1 <- mdy_hm(data$created_at)
data$created_date1 <- as.Date(data$created_date1)
Goof answered 7/6, 2019 at 0:56 Comment(0)
K
2

As a complement: This error can be raised as well if an entry you are trying to cast is a string that should have been NA. If you specify the expected format -or use "real" NAs- there are no problems:

Minimal reproducible example with data.table:

library(data.table)
df <- data.table(date_good = c("01-01-2001", "01-01-2001"), date_bad = c("NA", "01-01-2001"))

df[, .(date_good = as.Date(date_good), date_bad = as.Date(date_bad))]
# Error in charToDate(x) : character string is not in a standard unambiguous format

df[, .(date_good = as.Date(date_good), date_bad = as.Date(date_bad, format="%Y-%m-%d"))]
# No errors; you simply get NA.

df2 <- data.table(date_good = c("01-01-2001", "01-01-2001"), date_bad = c(NA, "01-01-2001"))
    
df2[, .(date_good = as.Date(date_good), date_bad = as.Date(date_bad))]
# Just NA
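If the "NA" strings cannot be avoided at import time, one option is to convert them to real missing values before calling as.Date(). A minimal base R sketch; the vector x is made up for illustration, and the day-month-year order in the format is an assumption about this data.

```r
x <- c("NA", "01-01-2001")

# Turn the literal string "NA" into a real missing value first.
x[x == "NA"] <- NA_character_

# With a real NA in front, the conversion no longer trips over the first
# element; the explicit format (assumed day-month-year) avoids guessing.
dates <- as.Date(x, format = "%d-%m-%Y")
```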
Keratogenous answered 4/8, 2021 at 21:13 Comment(1)
You might even want to specify NA_character_ (the default NA is of logical type; in practice this hardly matters)Pede
A
0

If the date is for example: "01 Jan 2000", I recommend using

library(lubridate)
date_corrected<-dmy("01 Jan 2000")
date_corrected
[1] "2000-01-01"
class(date_corrected)
[1] "Date"

lubridate has a function for almost every type of date.

Abandon answered 12/7, 2021 at 20:55 Comment(0)
J
-1

The solutions did not work for me; I still had the same error. The backtrace said that the error arose in the charToDate() function.

This article from Statistics Globe solved it for me

They use the 'anytime' package with the 'anydate' function:

df <- df %>% dplyr::mutate(New_Date = as.Date(anytime::anydate(Old_Date)))
Jarad answered 17/2, 2023 at 15:45 Comment(2)
You might have overlooked it, but Dirk Eddelbuettel's existing answer to this question gives this solution ... (it's not clear why as.Date() is needed, anydate already returns an object of class "Date" ... ??)Pede
I was premature in posting that; it didn't work, converting everything to missing. I'm looking at that answer.Jarad
