Java 8 DateTimeFormatter parsing optional sections

Asked 4/7, 2018 at 15:24 Answered 4/7, 2018 at 15:51

Solved java java-8 java-time date-parsing

I need to parse date-times as strings coming as two different formats:

19861221235959Z
1986-12-21T23:59:59Z

The following dateTimeFormatter pattern properly parses the first kind of date strings

DateTimeFormatter.ofPattern("uuuuMMddHHmmss[,S][.S]X")

but fails on the second one as dashes, colons and T are not expected.

My attempt was to use optional sections as follows:

DateTimeFormatter.ofPattern("uuuu[-]MM[-]dd['T']HH[:]mm[:]ss[,S][.S]X")

Unexpectedly, this parses the second kind of date strings (the one with dashes), but not the first kind, throwing a

java.time.format.DateTimeParseException: Text '19861221235959Z' could not be parsed at index 0

It's as if optional sections are not being evaluated as optional...
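
For reference, the behaviour described above can be reproduced with a small sketch along these lines. The class name and the tryParse helper are made up for illustration; only the two patterns and the two input strings come from the question.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class OptionalSectionRepro {

    // The two patterns from the question.
    static final DateTimeFormatter COMPACT =
            DateTimeFormatter.ofPattern("uuuuMMddHHmmss[,S][.S]X");
    static final DateTimeFormatter WITH_OPTIONALS =
            DateTimeFormatter.ofPattern("uuuu[-]MM[-]dd['T']HH[:]mm[:]ss[,S][.S]X");

    // Hypothetical helper: returns the parsed value as text,
    // or "FAILED: <message>" if the formatter rejects the input.
    static String tryParse(DateTimeFormatter formatter, String text) {
        try {
            return OffsetDateTime.parse(text, formatter).toString();
        } catch (DateTimeParseException e) {
            return "FAILED: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        String[] inputs = {"19861221235959Z", "1986-12-21T23:59:59Z"};
        for (String input : inputs) {
            System.out.println("compact pattern,  " + input + " -> " + tryParse(COMPACT, input));
            System.out.println("optional pattern, " + input + " -> " + tryParse(WITH_OPTIONALS, input));
        }
    }
}
```

Each pattern accepts exactly one of the two inputs, which is the asymmetry the question is about.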

Penology answered 4/7, 2018 at 15:24 Comment(9)

The 19861221235959 appears to be the year. It doesn't stop at 4 digits when parsing, only has a 4 digit minimum when formatting. – Illustrious 4/7, 2018 at 15:30

@Peter Lawrey can you elaborate a bit more on that? I don't understand your point – Penology 4/7, 2018 at 15:33

The first number 19861221235959 is too large to be a year so it fails to parse it. – Illustrious 4/7, 2018 at 15:35

But with the first pattern it worked without issues... The fact that the second pattern fails, seems as if the optional is not treated as such – Penology 4/7, 2018 at 15:37

I take your point that it probably should work, however I suspect you will need to peek at the contents or length and try one or the other format. – Illustrious 4/7, 2018 at 15:44

Thank you guys, I'll go for the workaround you suggested, using both formatters and discriminating by string content. I wonder if this is an actual bug in java... – Penology 4/7, 2018 at 15:49

The first format uses "adjacent value parsing", where the first field can be variable width if all subsequent fields are fixed width. The second format does not use adjacent value parsing, because the fields are separated by the dash (they are not adjacent!). See docs.oracle.com/javase/8/docs/api/java/time/format/… – Stormy 4/7, 2018 at 17:17

@Stormy my problem with the docs is that something this important should have been in DateTimeFormatter docs, with a special mention in the optional section part, warning about the way it can break adjacent parsing – Penology 5/7, 2018 at 6:58

I’ll just point out that it seems like the unwritten question is parsing both forms of ISO 8601 Date Time formats. As such, it seems like you would want the first pattern to actually be: 19861221T235959Z. – Colostrum 5/3, 2020 at 16:37
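
The workaround discussed in the comments above (keep two formatters and pick one by inspecting the string) might look roughly like the sketch below. The class and method names are invented for illustration, and the check for a dash at index 4 is an assumption that holds only when the input is one of the two formats from the question.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

public class TwoFormatterWorkaround {

    // Compact form, e.g. 19861221235959Z (pattern taken from the question).
    static final DateTimeFormatter COMPACT =
            DateTimeFormatter.ofPattern("uuuuMMddHHmmss[,S][.S]X");

    // Separated form, e.g. 1986-12-21T23:59:59Z (separators made mandatory).
    static final DateTimeFormatter DASHED =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss[,S][.S]X");

    static OffsetDateTime parse(String text) {
        // Assumption: a dash right after a four-digit year identifies the separated form.
        boolean dashed = text.length() > 4 && text.charAt(4) == '-';
        return OffsetDateTime.parse(text, dashed ? DASHED : COMPACT);
    }

    public static void main(String[] args) {
        System.out.println(parse("19861221235959Z"));
        System.out.println(parse("1986-12-21T23:59:59Z"));
    }
}
```

This keeps each pattern simple at the cost of maintaining two of them.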

The problem is that your pattern is considering the entire string as the year. You can use .appendValue(ChronoField.YEAR, 4) to limit it to four characters:

DateTimeFormatter formatter = new DateTimeFormatterBuilder()
    .appendValue(ChronoField.YEAR, 4)
    .appendPattern("[-]MM[-]dd['T']HH[:]mm[:]ss[,S][.S]X")
    .toFormatter();

This parses correctly with both of your examples.
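
As a quick check, a sketch along these lines runs the formatter above against both strings from the question. The class name and the parse helper are placeholders; parsing into OffsetDateTime is an assumption about the desired target type.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

public class FixedWidthYearDemo {

    // Year is pinned to exactly four digits; the rest reuses the question's pattern.
    static final DateTimeFormatter FORMATTER = new DateTimeFormatterBuilder()
            .appendValue(ChronoField.YEAR, 4)
            .appendPattern("[-]MM[-]dd['T']HH[:]mm[:]ss[,S][.S]X")
            .toFormatter();

    // Hypothetical helper for the demo.
    static OffsetDateTime parse(String text) {
        return OffsetDateTime.parse(text, FORMATTER);
    }

    public static void main(String[] args) {
        System.out.println(parse("19861221235959Z"));      // prints 1986-12-21T23:59:59Z
        System.out.println(parse("1986-12-21T23:59:59Z")); // prints 1986-12-21T23:59:59Z
    }
}
```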

If you fancy being even more verbose, you could do:

DateTimeFormatter formatter = new DateTimeFormatterBuilder()
    .appendValue(ChronoField.YEAR, 4)
    .optionalStart().appendLiteral('-').optionalEnd()
    .appendPattern("MM")
    .optionalStart().appendLiteral('-').optionalEnd()
    .appendPattern("dd")
    .optionalStart().appendLiteral('T').optionalEnd()
    .appendPattern("HH")
    .optionalStart().appendLiteral(':').optionalEnd()
    .appendPattern("mm")
    .optionalStart().appendLiteral(':').optionalEnd()
    .appendPattern("ss")
    .optionalStart().appendPattern("X").optionalEnd()
    .toFormatter();

Deadman answered 4/7, 2018 at 15:51 Comment(7)

It works. Bravo. You should use X to parse the offset as in the question, though. – Rations 4/7, 2018 at 15:52

Nice one @Michael, thank you. I'm glad there is an option to avoid maintaining two different patterns. Shame to the Java Docs not mentioning this. – Penology 4/7, 2018 at 15:55

I’m curious why it’s enough to state the number of digits in the year and you don’t need to do it for the subsequent numeric fields. Could have to do with the fact that 999999999 is allowed as a year, whereas month can never be more than 12, and so on. – Rations 4/7, 2018 at 15:56

@OleV.V. Yes, precisely. Pretty much every other pattern has an upper bound in terms of number of possible characters. I think year is the only one that could potentially be arbitrarily long – Deadman 4/7, 2018 at 15:59

Indeed, that was the catch :) – Penology 4/7, 2018 at 16:20

@Michael: Thanks for the answer and the explanation, and I understand a year doesn't have an upper bound, but isn't that the point of putting yyyy instead of yy or yyyyyy? That is, shouldn't it be honoring the number of placeholders you provide? – Alvinalvina 20/11, 2018 at 21:30

@Alvinalvina Nope, it doesn't work that way. See the documentation: "The count of letters determines the minimum field width below which padding is used..." – Deadman 21/11, 2018 at 13:23
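
To illustrate the documentation sentence quoted in the last comment, the sketch below formats a few years with the uuuu pattern. The class name and the formatYear helper are invented for the example. It only shows the formatting side: four letters give a minimum width of four, not a maximum.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class PatternWidthDemo {

    // Hypothetical helper: formats January 1st of the given year with the given pattern.
    static String formatYear(int year, String pattern) {
        return LocalDate.of(year, 1, 1).format(DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        System.out.println(formatYear(5, "uuuu"));     // prints 0005 (padded up to the minimum width)
        System.out.println(formatYear(1986, "uuuu"));  // prints 1986
        System.out.println(formatYear(12345, "uuuu")); // prints +12345 (wider than four digits, with a sign)
    }
}
```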

It’s not clear from the documentation, but my guess is that the following is what happens.

When you use uuuuMMddHHmmss in your format pattern string, the formatter can easily see that there are several adjacent numeric fields and therefore uses the field widths to separate the fields. The first 4 digits are taken to mean the year, and so on.

When instead you use uuuu[-]MM[-]dd['T']HH[:]mm[:]ss, the formatter doesn’t perceive it as adjacent numeric fields. I agree with the comments by Peter Lawrey that it therefore takes a longer run of digits for year and in the end overflows the maximum year (999999999) and throws the exception.

The solution? Please refer to Michael’s answer.

Rations answered 4/7, 2018 at 15:46 Comment(0)

Pattern-based DateTimeFormatters are not smart enough to handle both an optional section and numeric fields that may follow each other without a separator. When the numeric fields are unconditionally adjacent, the pattern understands that the change of pattern letter from u to M means it has to count digits to know which digit belongs to which field. But when adjacency is not a certainty, the pattern doesn't try that. It sees one fully described numeric field that is not immediately followed by another numeric field, so it has no reason to count digits: all the digits are taken to be part of that one field.

To handle both forms, you shouldn't try to build your DateTimeFormatter with a pattern, but rather with a builder. Get your inspiration from DateTimeFormatter.BASIC_ISO_DATE and the others nearby.
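
A builder in that spirit could look like the sketch below. It is only one possible arrangement, not taken from the JDK: every numeric field gets an explicit fixed width via appendValue, and each separator sits in its own optional section. The class name, the parse helper, and the choice of appendOffset("+HHmm", "Z") for the offset are assumptions made for the example; fractional seconds are left out.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

public class BuilderBasedFormatter {

    static final DateTimeFormatter FLEXIBLE = new DateTimeFormatterBuilder()
            .appendValue(ChronoField.YEAR, 4)                    // exactly four digits
            .optionalStart().appendLiteral('-').optionalEnd()
            .appendValue(ChronoField.MONTH_OF_YEAR, 2)           // exactly two digits
            .optionalStart().appendLiteral('-').optionalEnd()
            .appendValue(ChronoField.DAY_OF_MONTH, 2)
            .optionalStart().appendLiteral('T').optionalEnd()
            .appendValue(ChronoField.HOUR_OF_DAY, 2)
            .optionalStart().appendLiteral(':').optionalEnd()
            .appendValue(ChronoField.MINUTE_OF_HOUR, 2)
            .optionalStart().appendLiteral(':').optionalEnd()
            .appendValue(ChronoField.SECOND_OF_MINUTE, 2)
            .appendOffset("+HHmm", "Z")                          // "Z" stands for a zero offset
            .toFormatter();

    // Hypothetical helper for the demo.
    static OffsetDateTime parse(String text) {
        return OffsetDateTime.parse(text, FLEXIBLE);
    }

    public static void main(String[] args) {
        System.out.println(parse("19861221235959Z"));
        System.out.println(parse("1986-12-21T23:59:59Z"));
    }
}
```

Because every field width is fixed, the parser never has to guess where the year ends, whether or not a separator follows.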

Rochdale answered 4/7, 2018 at 15:50 Comment(0)

-1

At first glance your second format should work for both cases, and I'm not sure why it doesn't. BTW, I am curious why you used 'u' as opposed to 'y' for the year, so I would try 'y' as well, just to see if it makes a difference. More generally, you are touching on an interesting point: how to parse a date from an unknown format (imagine that instead of 2 possible formats you are dealing with an unknown number of formats). I once wrote a parser like that. The idea I used to solve this problem is described in my article Java 8 java.time package: parsing any string to date. You might find the idea useful. In short, the idea is to keep all supported formats in an external file and try each format one by one until one works.

Keitt answered 4/7, 2018 at 15:49 Comment(5)

Trying to parse all possible formats one after the other is a performance killer to never ever use in any production code. Furthermore in real world applications the amount of date formats is known and limited (e.g.: APIs don't produce dates in 10 different formats). – Penology 4/7, 2018 at 16:0

@Penology obviously you cannot be familiar with all use cases in the world. In my case the data was coming from unknown sources and could come from anywhere in the world, so yes, I could expect ANY date format (in my case we had over 30-40 different ones). Also, the processing was done asynchronously and "off-line", so we could afford less-than-perfect performance but could NOT afford to leave any date unparsed. So I stand by my idea. If you really want to discuss the issue, please read my article in full – Keitt 4/7, 2018 at 16:19

I'm not trying to discredit your approach to the particular case you had to tackle. I'm just saying that in the common case such an approach is to be avoided. You know, just in case someone is intrigued by the idea of supporting all formats and applies your approach to problems for which it is not suited. – Penology 4/7, 2018 at 16:26

I see your point. But on the other hand, you have to trust this forum's readers to figure out for themselves what is suitable for their case and what is not. In practice the "common" case is actually very rarely the case. There is always some twist. BTW, I presented the idea, but you could easily make a lot of optimizations (such as reading all the formats into memory and accessing them there, which would solve a lot of performance issues). So I still disagree with your formulation of "to never ever use in any production code". But still, I do see a valid point in your statement – Keitt 4/7, 2018 at 16:47

uuuu versus yyyy in DateTimeFormatter formatting pattern codes in Java? – Rations 27/2, 2019 at 9:25
