Customize a Locale in Java
In Java, a Locale defines the conventions for how people expect to see things presented (such as currency formats, the names of the months, and the day on which a week starts).

When parsing the name of a Month (with a DateTimeFormatter) it starts to become tricky.

If you use Locale.US or Locale.ENGLISH then September has the short form Sep.

If you use Locale.UK then September also has the short form Sep in Java 11, but in Java 17 it is Sept (because of a change in the Unicode CLDR data; I have asked the CLDR people whether this change is correct).
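A minimal sketch for checking this on whatever JDK is at hand. The class and method names are made up for illustration; the printed values depend on the Java version and on the active locale data provider, so no output is shown here.

```java
import java.time.Month;
import java.time.format.TextStyle;
import java.util.Locale;

// Hypothetical helper class, not part of the original code base.
class ShortMonthNames {

    /** Returns the abbreviated name of September that the running JDK has for the locale. */
    static String shortSeptember(Locale locale) {
        // TextStyle.SHORT is the style that the pattern letters "MMM" use.
        return Month.SEPTEMBER.getDisplayName(TextStyle.SHORT, locale);
    }

    public static void main(String[] args) {
        System.out.println("Java " + System.getProperty("java.version"));
        for (Locale locale : new Locale[] {Locale.ENGLISH, Locale.US, Locale.UK}) {
            System.out.println(locale + " -> " + shortSeptember(locale));
        }
    }
}
```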

The effect is that my tests started failing when trying to build with Java 17.

The reason my current code uses Locale.UK instead of Locale.ENGLISH is because in Java Locale.ENGLISH is actually not just English but also the non-ISO American way of defining a week (they use Sunday as the first day of the week). I want to have it the ISO way.

Simply:

  • WeekFields.ISO = WeekFields.of(Locale.UK) = WeekFields[MONDAY,4]
  • WeekFields.of(Locale.ENGLISH) = WeekFields.of(Locale.US) = WeekFields[SUNDAY,1]
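The two bullets above can be checked with a short program. This is only a sketch; the class name and the describe helper are invented for the example.

```java
import java.time.temporal.WeekFields;
import java.util.Locale;

// Hypothetical helper class for inspecting the week definition of a locale.
class WeekFieldsCheck {

    /** Renders a WeekFields as "FIRST_DAY,minimalDays", for example "MONDAY,4". */
    static String describe(WeekFields weekFields) {
        return weekFields.getFirstDayOfWeek() + "," + weekFields.getMinimalDaysInFirstWeek();
    }

    public static void main(String[] args) {
        System.out.println("ISO     : " + describe(WeekFields.ISO));
        System.out.println("UK      : " + describe(WeekFields.of(Locale.UK)));
        System.out.println("US      : " + describe(WeekFields.of(Locale.US)));
        System.out.println("ENGLISH : " + describe(WeekFields.of(Locale.ENGLISH)));
    }
}
```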

So starting with Java 17 I have not yet been able to find a built-in Locale that gives me both the "Sep" abbreviation and the ISO week definition.

As I see it, I have to either take Locale.ENGLISH and change its WeekFields, or take Locale.UK and change the short name of September to what I need.

My question is how do I do this (in Java 17)?

Or is there a better way to fix this?


Update 1:

  • I already got feedback from the people at Unicode indicating that the change for en_GB to use Sept instead of Sep is a bugfix because that is the way it should be abbreviated in the UK.

So it seems I will need not just a parser that accepts "Sep" but one that will accept a mix of "Sept" and "Sep" for English.
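One possible way to get such a parser, sketched under assumptions: the month texts are supplied explicitly through DateTimeFormatterBuilder.appendText(TemporalField, Map), so they no longer depend on the CLDR data, and "Sept" is tried in an optional section before the three-letter names. The class name and the dd/MMM/uuuu layout are only illustrative, and the resulting formatter is meant for parsing only, not for formatting.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical parse-only helper; not the code used in the logparser project.
class LenientMonthParser {
    private static final String[] SHORT_NAMES = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };

    static final DateTimeFormatter FORMATTER = build();

    private static DateTimeFormatter build() {
        Map<Long, String> shortNames = new HashMap<>();
        for (int i = 0; i < SHORT_NAMES.length; i++) {
            shortNames.put((long) (i + 1), SHORT_NAMES[i]);
        }
        // Only the one alternative spelling that has to be accepted as well.
        Map<Long, String> longerSeptember = Map.of(9L, "Sept");

        return new DateTimeFormatterBuilder()
            .appendPattern("dd/")
            // The longer text must be tried first, otherwise "Sep" would match
            // the start of "Sept" and the trailing "t" would break the parse.
            .optionalStart().appendText(ChronoField.MONTH_OF_YEAR, longerSeptember).optionalEnd()
            .optionalStart().appendText(ChronoField.MONTH_OF_YEAR, shortNames).optionalEnd()
            .appendPattern("/uuuu")
            .toFormatter(Locale.ROOT);
    }

    static LocalDate parse(String text) {
        return LocalDate.parse(text, FORMATTER);
    }
}
```

With this, LenientMonthParser.parse("07/Sep/2021") and LenientMonthParser.parse("07/Sept/2021") should both give the 7th of September 2021 on any Java version.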

Update 2:

  • I have changed my code so that, in case of a parse exception, it rewrites what is assumed to be the input ("Sep") into what the currently selected locale expects and then tries again. This does not cover all cases, but it covers enough cases for my specific situation. For those interested: my commit.
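The idea from Update 2 could look roughly like the sketch below. This is not the code from the commit; the class name, the constructor arguments and the assumption that the input uses the Locale.ENGLISH abbreviations are all made up for illustration.

```java
import java.time.LocalDate;
import java.time.Month;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.TextStyle;
import java.util.Locale;

// Hypothetical illustration of "retry with the month name the locale expects".
class RetryingDateParser {
    private final Locale locale;
    private final DateTimeFormatter formatter;

    RetryingDateParser(String pattern, Locale locale) {
        this.locale = locale;
        this.formatter = DateTimeFormatter.ofPattern(pattern, locale);
    }

    LocalDate parse(String input) {
        try {
            return LocalDate.parse(input, formatter);
        } catch (DateTimeParseException firstFailure) {
            for (Month month : Month.values()) {
                // Assumption: the input was written with the Locale.ENGLISH abbreviations.
                String assumed = month.getDisplayName(TextStyle.SHORT, Locale.ENGLISH);
                String expected = month.getDisplayName(TextStyle.SHORT, locale);
                if (!assumed.equals(expected) && input.contains(assumed)) {
                    // Rewrite only the first differing month name that is found, then retry once.
                    return LocalDate.parse(input.replace(assumed, expected), formatter);
                }
            }
            throw firstFailure; // nothing to rewrite, report the original problem
        }
    }
}
```

For example, new RetryingDateParser("dd/MMM/uuuu", Locale.UK).parse("07/Sep/2021") parses directly on Java 11 and via the rewrite to "Sept" on Java 17. Like the original workaround it only covers one direction (input "Sep", locale expecting something else).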
Thunderhead answered 31/1, 2022 at 15:49 Comment(8)
I am surprised that en_GB and en_AU would be making such a change in the CLDR after all these years. Please update your Question if you learn more about this change. – Reinert
#65218874 You can just override the lookup of locale data to "COMPAT" -Djava.locale.providers=COMPAT, which will keep your data consistent with Java 9, regardless of future changes. By the way, this explains why the short form "Sep" has changed, though not how to override it. – Selfregulating
"for which I asked if this was correct". I strongly suspect the answer to this will be yes. I'm a native brit and the intuitive short form of September here is 'Sept'. "Sep 7th", while sufficiently clear, looks wrong. – Selfregulating
Ultimately, your problem is that you are trying to parse something for which there is no objective standard. One day, the name of the 9th month might be "September", and the next day our benevolent ruler might decide it should be changed to "Johnsonber". Either use a format which is specifically designed for data transfer (e.g. ISO), or live with the fact that language is fluid and you'll never be able to capture that with code that's static. – Selfregulating
I’m not sure I understand the problem here. As you found out, Java 17 changed the textual abbreviation of the month September for one or more locales. There’s no guarantee made in the Javadocs that the text will remain the same for eternity. If you want your code to work with Java 17, you’ll need to update the test; what’s the complication here? – Cronyism
@AbhijitSarkar The problem here is that my software ( github.com/nielsbasjes/logparser ) is a parser for Apache/Nginx access logfiles which are written in a certain way. This means that it should remain possible to read older logfiles which will contain 'old' dates (like "Sep") and simply updating my test to expect "Sept" would ensure the software no longer works for actual logfiles that need to be parsed. – Thunderhead
I can't provide an answer, but I can say that names in date strings should be used exclusively for displaying them to users and never for machine-to-machine communication, exactly for reasons like you demonstrated here. This doesn't help in your case, as you have no control over the input format, but that just means you'll have to deal with issues like that. CLDR data changes all the time. Sometimes to fix bugs and sometimes because language use has changed. Depending on its unchanging nature is a design flaw (that yes, we sometimes have to work around). – Stapes
The locale you use for parsing has no influence on how you use WeekFields afterwards, if you use it at all. So why do you make your life unnecessarily hard by using Locale.UK when you actually mean Locale.ENGLISH or Locale.ROOT? – Sesquipedalian
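The suggestion in the last comment can be sketched as follows: parse the text with Locale.ENGLISH and apply WeekFields.ISO explicitly afterwards, so the week definition of the parsing locale never comes into play. The timestamp layout is an assumption based on the usual Apache access log format, and the class and method names are hypothetical.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.WeekFields;
import java.util.Locale;

// Hypothetical example: the locale is used only for the month text,
// the week rules are chosen separately.
class ParseThenIsoWeek {
    // Assumed layout, e.g. "07/Sep/2021:10:00:00 +0200".
    static final DateTimeFormatter ACCESS_LOG_TIME =
        DateTimeFormatter.ofPattern("dd/MMM/uuuu:HH:mm:ss Z", Locale.ENGLISH);

    /** Parses the timestamp and returns its ISO 8601 week number. */
    static int isoWeek(String timestamp) {
        OffsetDateTime parsed = OffsetDateTime.parse(timestamp, ACCESS_LOG_TIME);
        return parsed.get(WeekFields.ISO.weekOfWeekBasedYear());
    }

    public static void main(String[] args) {
        System.out.println(isoWeek("07/Sep/2021:10:00:00 +0200"));
    }
}
```

This avoids Locale.UK altogether, but it still relies on Locale.ENGLISH keeping "Sep" in future CLDR releases.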

I found a way of handling this by using SPI.

I'm documenting it here as a possibility that may work for others (it does not work for my context).

As an experiment I created a class:

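package nl.basjes.parse.httpdlog.dissectors.locale;

import java.util.Locale;
import java.util.spi.CalendarDataProvider;

import static java.util.Calendar.MONDAY;

/**
 * Supplies ISO 8601 week rules for a custom locale that is otherwise regionless English.
 */
public class CalendarDataProviderISO8601 extends CalendarDataProvider {
    // Regionless English with the variant "ISO"; the only locale this provider serves.
    public static final Locale ENGLISH_ISO = new Locale("en", "", "ISO");

    @Override
    public int getFirstDayOfWeek(Locale locale) {
        return MONDAY; // ISO 8601: weeks start on Monday
    }

    @Override
    public int getMinimalDaysInFirstWeek(Locale locale) {
        return 4; // ISO 8601: the first week has at least 4 days in the new year
    }

    @Override
    public Locale[] getAvailableLocales() {
        return new Locale[]{ENGLISH_ISO};
    }
}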

and a file ./src/main/resources/META-INF/services/java.util.spi.CalendarDataProvider with

nl.basjes.parse.httpdlog.dissectors.locale.CalendarDataProviderISO8601

Because this locale is just a variant of the regionless "English", it takes everything from "English" and layers the above class on top of it.

Although this works I cannot use it.

The problem is that although JEP 252 ( http://openjdk.java.net/jeps/252 ) states "The default lookup order will be CLDR, COMPAT, SPI", in reality SPI has since been removed from this default list in a later change, because the Extension Mechanism was deprecated.

So to use this construct, the class must be on the classpath at JVM startup, and the command-line option -Djava.locale.providers=CLDR,COMPAT,SPI must be passed to the JVM.
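A small sketch for checking whether the provider is actually picked up by a running JVM. It goes through java.util.Calendar, which looks up its week data via CalendarDataProvider; the class name is hypothetical, and what is printed depends entirely on how the JVM was started.

```java
import java.util.Calendar;
import java.util.Locale;

// Hypothetical check, to be run with and without
// -Djava.locale.providers=CLDR,COMPAT,SPI and the provider on the classpath.
class ProviderCheck {

    /** Returns the first day of the week (a Calendar.* constant) the JVM uses for the locale. */
    static int firstDayOfWeek(Locale locale) {
        return Calendar.getInstance(locale).getFirstDayOfWeek();
    }

    public static void main(String[] args) {
        // Same custom locale as ENGLISH_ISO in the provider class above.
        Locale englishIso = new Locale("en", "", "ISO");
        System.out.println("java.locale.providers = " + System.getProperty("java.locale.providers"));
        System.out.println("week starts on Monday for " + englishIso + ": "
            + (firstDayOfWeek(englishIso) == Calendar.MONDAY));
    }
}
```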

My library ( https://github.com/nielsbasjes/logparser/ ) is also used in environments (like Apache Flink/Beam/Drill/Pig) where classes are shipped in a more dynamic way: they are serialized and transported to already running JVMs on multiple machines. In those environments this construct cannot be used.

I currently do not know of a dynamic way of doing something like this in Java.

Thunderhead answered 1/2, 2022 at 9:15 Comment(0)
