September's short form "Sep" no longer parses in Java 17 in en_GB locale

3

28

This works with Java 11 but fails with Java 17:

DateTimeFormatter format = DateTimeFormatter.ofPattern("MMM dd, yyyy")
    .withLocale(Locale.UK);
format.parse("Sep 29, 1988");

Java 17 stacktrace:

Exception in thread "main" java.time.format.DateTimeParseException: Text 'Sep 29, 1988' could not be parsed at index 0
at java.base/java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:2052)
at java.base/java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1880)

My Java version:

openjdk version "17" 2021-09-14 LTS
OpenJDK Runtime Environment Zulu17.28+13-CA (build 17+35-LTS)
OpenJDK 64-Bit Server VM Zulu17.28+13-CA (build 17+35-LTS, mixed mode, sharing)

What has changed?

Antioch answered 21/9, 2021 at 10:58 Comment(4)
That was it. My default locale is en_GB. Not sure if this change in behaviour is intentional (it probably is) but it's very inconvenient. - Antioch
This is why you should use standardized date formats rather than localized strings when exchanging date-time values textually. - Forestation
@BasilBourque I'm parsing this from some HTML so obviously it's not my choice. - Antioch
Related question: https://mcmap.net/q/504333/-customize-a-locale-in-java - Externality
33

It seems that in the en_GB locale, the short form of September is now "Sept", not "Sep". All the other months keep the same three-letter abbreviations as in en_US. Kind of makes sense. As a Brit, "Sep" looks wrong to me.
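
To check what your own JDK reports, a small sketch like the following prints the short name of September for both locales. The class and method names are made up for illustration, and the en_GB result depends on the CLDR data bundled with the JDK you run it on, so no particular output is assumed for it here.

```java
import java.time.Month;
import java.time.format.TextStyle;
import java.util.Locale;

public class ShortMonthNames {

    // Returns the abbreviated month name that the "MMM" pattern letter relies on.
    static String shortSeptember(Locale locale) {
        return Month.SEPTEMBER.getDisplayName(TextStyle.SHORT, locale);
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        for (Locale locale : new Locale[] {Locale.UK, Locale.US}) {
            // The en_GB value is the one that differs between JDK releases.
            System.out.println(locale + " -> " + shortSeptember(locale));
        }
    }
}
```

Running it on both Java 11 and Java 17 should make the difference visible without involving any parsing code.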

This is the ticket: https://bugs.openjdk.java.net/browse/JDK-8251317

It wasn't a conscious decision by the JDK authors. The locale data that Java uses by default comes from the Common Locale Data Repository (CLDR), a project of the Unicode Consortium. Newer versions of Java ship with newer versions of the CLDR, so locale behavior occasionally changes between releases. In that sense, the change you encountered is a feature, not a bug.

Yours is just one of many small tweaks.

Here's the specific change in the PR which broke it for you: https://github.com/openjdk/jdk/pull/1279/files#diff-97210acd6f77c4f4979c43445d60ba1c369f058230e41177dceca697800b1fa2R116
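
If the parser must not depend on whichever CLDR version ships with the JDK, one option is to pin the month abbreviations in your own code. The sketch below is only an illustration under that assumption: `PinnedMonthParser` and `MONTHS` are hypothetical names, and the layout is hard-coded to match the "MMM dd, yyyy" strings from the question. It uses `DateTimeFormatterBuilder.appendText(TemporalField, Map)` so the month text comes from the map rather than from locale data.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class PinnedMonthParser {

    // Abbreviations chosen by us (an assumption about the source text), not by the JDK's locale data.
    private static final String[] MONTHS = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };

    private static final DateTimeFormatter FORMAT = buildFormatter();

    private static DateTimeFormatter buildFormatter() {
        Map<Long, String> monthText = new HashMap<>();
        for (int i = 0; i < MONTHS.length; i++) {
            monthText.put((long) (i + 1), MONTHS[i]);
        }
        // Equivalent in layout to "MMM dd, yyyy", but with fixed month text.
        return new DateTimeFormatterBuilder()
                .appendText(ChronoField.MONTH_OF_YEAR, monthText)
                .appendLiteral(' ')
                .appendValue(ChronoField.DAY_OF_MONTH, 2)
                .appendLiteral(", ")
                .appendValue(ChronoField.YEAR, 4)
                .toFormatter(Locale.ROOT);
    }

    static LocalDate parse(String text) {
        return LocalDate.parse(text, FORMAT);
    }

    public static void main(String[] args) {
        System.out.println(parse("Sep 29, 1988"));
    }
}
```

This still breaks if the source page itself switches to "Sept", but it no longer breaks merely because the JDK was upgraded.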

Deathtrap answered 21/9, 2021 at 11:40 Comment(18)
All of our production servers have the GB locale and the code is full of date parsing. I'd vote for consistency over "what looks right", but I guess that's just my opinion. I appreciate the links. - Antioch
@steven35: the problem with consistency here is that locales can never improve if we value consistency above all else. Basically, parsing free-form text dates without very precise specifications (which tend to use numbers) is a risky thing to do no matter what library you use. - Irreverence
@Antioch You say you want consistency, but locale-related data is constantly evolving. Currencies, countries, languages, etc. are all fluid. Keeping the data static might make it consistent between Java versions, but it becomes inconsistent with reality. They need to update it at some point. - Deathtrap
@Deathtrap If data were saved in a sane format, they wouldn't need an update. - Falsehood
Agree with @9ilsdx9rvj0lo here – I don't expect, for example, the format uuuu-MM-dd'T'HH:mm:ss to change for a while. - Tomi
@Deathtrap It's not always an option. - Antioch
@Antioch Name just one comprehensible reason not to use a format like ISO 8601 for persistence. - Mildamilde
@Mildamilde Who said this format was for persistence? It's parsed from an HTML page, not a database. Do you know that dates are not always displayed in ISO format? - Antioch
@Antioch An HTML page that you are trying to parse is persistent data. And obviously the wrong approach for what you are trying to do. Yes, “dates are not always displayed in ISO format”. But since you are trying to parse that page, that’s irrelevant. - Mildamilde
@Mildamilde Have to agree with Steven on this. There are obviously valid reasons to parse dates in all forms and they're not always normalized for a computer. Yes, scraping a website designed for humans is always going to be somewhat brittle, but that doesn't make it wrong or necessarily suboptimal. If you have no control over the source of your data then sometimes you have to deal with data that's presented in a way that's different than you'd ideally like. You can't just whine to the producer to change it. That's something most people learn in their first year being a professional developer. - Deathtrap
@Deathtrap If you know what you are doing, you know that you can’t expect external data to match exactly the pattern provided by the locale implementation of your local system. Otherwise, you end up with software that breaks when a date string contains “Sept” instead of “Sep” or vice versa. - Mildamilde
@Mildamilde Yes, so the probable solution to that would be to specify a Locale which matches the source. Your parser will still be subject to the source data changing format (when scraping something designed for humans, that's unavoidable), but at least your code is portable. Omitting a Locale is a common mistake that's easily made because the API doesn't force you to specify one. The solution is not necessarily anything to do with changing the format (e.g. to ISO), because there are perfectly valid scenarios when you simply can't. - Deathtrap
Well, obviously, specifying the locale wasn’t sufficient in the OP’s case. And you said yourself why such an approach isn’t sufficient. - Mildamilde
@Mildamilde Suppose OP is in the UK where we use 'Sept', and the source website they are trying to scrape is American where they use 'Sep'. In that case, the bug is that they were relying on their default Locale for parsing which did not match the source Locale. What I was saying in that comment is that, because language is fluid, of course you can't expect a solution like this to work forever. But tasks like scraping websites are not solutions you should ever expect to work forever. They are inherently brittle. That does not invalidate them. Sometimes scraping a website is the best you can do. - Deathtrap
@Mildamilde Have you seriously never parsed anything with a computer that wasn't specifically designed to be parsed by a computer? - Deathtrap
@Deathtrap You are saying “you can't expect a solution like this to work forever” and I’m saying “you can't even expect a solution like this to work a second time¹”. Not so much different in the context of an OP assuming that this worked forever. The problem is not that the OP had to parse data not designed to be parsed by a computer. The problem is the approach chosen for the task. ¹ Because what happened with updating to Java 17 could have happened with any other tiny change in the environment too (some Java implementations use the operating system’s locale data, for example). - Mildamilde
@Mildamilde "The problem is not that the OP had to parse data not designed to be parsed by a computer" That is quite literally the problem. If you can't see that then we are just wasting each other's time. "The problem is the approach chosen for the task" Name a solution for parsing a human-readable string that doesn't suffer from the same issues. There isn't one, but I'll wait. - Deathtrap
@Deathtrap This question and the accepted answer have the potential to help a lot of people in the UK as the adoption of Java 17 grows, but Holger managed to turn the comment section into an irrelevant generic lecture on how to persist dates in a database, which most people are already familiar with. - Antioch
2

Aside from the arguments of whether parsing text (from external legacy sources) for date/times is a good thing, or whether standards should be allowed to evolve versus backward compatibility...

A practical fix is to switch Locale.UK to Locale.US when parsing strings such as Sep 29, 1988 or 30-Sep-2020.
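
As a minimal sketch of that switch (the class and constant names are only illustrative, and the second pattern is my guess at what fits a string like 30-Sep-2020):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class UsLocaleParsing {

    // Same pattern as in the question, but with Locale.US instead of Locale.UK.
    static final DateTimeFormatter MONTH_FIRST =
            DateTimeFormatter.ofPattern("MMM dd, yyyy", Locale.US);

    // Assumed pattern for day-first strings such as "30-Sep-2020".
    static final DateTimeFormatter DAY_FIRST =
            DateTimeFormatter.ofPattern("dd-MMM-yyyy", Locale.US);

    public static void main(String[] args) {
        System.out.println(LocalDate.parse("Sep 29, 1988", MONTH_FIRST)); // 1988-09-29
        System.out.println(LocalDate.parse("30-Sep-2020", DAY_FIRST));    // 2020-09-30
    }
}
```

Passing the locale to the formatter explicitly also means the result no longer depends on the default locale of the server.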

Consignment answered 5/8, 2022 at 16:26 Comment(0)
0

There are other cases where this change can cause problems, such as communication between two applications running on different versions of Java.

If your Java 17 application serializes an entity that has a Date field and your default locale is set to UK, the serialized date field in the POJO will look like "DD Sept YYYY".

When this POJO is sent over the wire to another application built on a lower version of Java, that application will throw a java.text.ParseException.

Before publishing any data (a message or an API request), you need to explicitly use a locale that does not render your date fields with "Sept".
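
A rough sketch of what that could look like with java.time follows. `PublishDates`, `asText` and `asIso` are hypothetical names, and the "dd MMM yyyy" layout is assumed from the "DD Sept YYYY" example above. The first method fixes the locale for the text form; the second avoids month names altogether by using the ISO 8601 form.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class PublishDates {

    // Explicit locale, so the output does not follow the JVM's default locale.
    static final DateTimeFormatter WIRE_TEXT =
            DateTimeFormatter.ofPattern("dd MMM yyyy", Locale.US);

    static String asText(LocalDate date) {
        return WIRE_TEXT.format(date);
    }

    // ISO 8601 has no month names, so locale data cannot affect it.
    static String asIso(LocalDate date) {
        return DateTimeFormatter.ISO_LOCAL_DATE.format(date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2023, 9, 6);
        System.out.println(asText(date)); // 06 Sep 2023
        System.out.println(asIso(date));  // 2023-09-06
    }
}
```

Where the receiving side can be changed too, the ISO form is the less fragile of the two.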

Teletypesetter answered 6/9, 2023 at 14:25 Comment(1)
It may seem like nitpicking, but I find the consequences important: when you mention Date and java.text.ParseException, it sounds like you are assuming SimpleDateFormat and/or DateFormat for formatting and parsing. The OP did not use them, and you should not either. They are notoriously troublesome classes and were fortunately supplanted a decade ago by java.time, the modern Java date and time API, so they are long outdated. - Wilbertwilborn
