How to use unsupported Locale in Java 11 and numbers in String.format()

How can I use an unsupported Locale (e.g. ar-US) in Java 11 when I output a number via String.format()?

In Java 8 this worked just fine (try jdoodle, select JDK 1.8.0_66):

Locale locale = Locale.forLanguageTag("ar-US");
System.out.println(String.format(locale, "Output: %d", 120));
// Output: 120

Since Java 11 the output is in Eastern Arabic numerals (try jdoodle, use default JDK 11.0.4):

Locale locale = Locale.forLanguageTag("ar-US");
System.out.println(String.format(locale, "Output: %d", 120));
// Output: ١٢٠

It seems this problem comes from the switch of the default locale data provider from JRE to CLDR (source: Localization Changes in Java 9 by @mcarth). Here is a list of supported locales: JDK 11 Supported Locales

UPDATE

I updated the question's example to ar-US, as my previous example didn't make sense. The idea is to have a format which makes sense in the given country. In the example that would be the United States (US).

Deme answered 9/12, 2020 at 14:6 Comment(4)
What are you trying to achieve? What does ar-EN mean? EN is not a registered country code. What locale do you want to describe? - Strongroom
Could you describe better what you mean by "use an unsupported Locale"? @rzwitserloot has described what is going on, but to solve your problem we need to know what you actually want to achieve. - Bouffard
@JoachimSauer by unsupported locales I meant locales that are not contained in the "Supported Locales" list (link). - Deme
Others have mentioned you can use ar-u-nu-latn. This page gives the full explanation of those Unicode extension tags: en.wikipedia.org/wiki/IETF_language_tag - Dorotheadorothee

The behavior is consistent with CLDR being treated as the preferred locale data provider. To confirm this, the same snippet can be executed on Java 8 with

-Djava.locale.providers=CLDR

If you step back and look at JEP 252: Use CLDR Locale Data by Default, the details are as follows:

The default lookup order will be CLDR, COMPAT, SPI, where COMPAT designates the JRE's locale data in JDK 9. If a particular provider cannot offer the requested locale data, the search will proceed to the next provider in order.

So, in short, if you really don't want the Java 11 default behaviour, you can change the lookup order with the VM argument

-Djava.locale.providers=COMPAT,CLDR,SPI

What might help further is understanding more about picking the right language using CLDR!
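
To check which provider order a given JVM is actually running with, a minimal sketch along these lines may help. The class name `LocaleProviderCheck` and its helper methods are made up for illustration; only `System.getProperty`, `Locale.forLanguageTag` and `DecimalFormatSymbols.getInstance` are standard JDK calls. It prints the `java.locale.providers` property (if set) and the zero digit the JVM resolves for ar-US, which, as far as I know, is what decides the digits that `%d` produces.

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical diagnostic class, not part of the original answer.
class LocaleProviderCheck {

    // Returns the configured provider order, or a placeholder if the JVM default is used.
    static String configuredProviders() {
        return System.getProperty("java.locale.providers", "(not set, JDK default order)");
    }

    // Returns the zero digit the active locale data resolves for the given language tag.
    static char zeroDigit(String languageTag) {
        Locale locale = Locale.forLanguageTag(languageTag);
        return DecimalFormatSymbols.getInstance(locale).getZeroDigit();
    }

    public static void main(String[] args) {
        System.out.println("java.locale.providers = " + configuredProviders());
        char zero = zeroDigit("ar-US");
        // U+0030 means Latin digits, U+0660 means Eastern Arabic digits.
        System.out.println("Zero digit for ar-US: U+" + String.format(Locale.ROOT, "%04X", (int) zero));
    }
}
```

Running it once without any flag and once with `-Djava.locale.providers=COMPAT,CLDR,SPI` should make the difference visible. Keep in mind that COMPAT is the legacy JRE data, so this is a diagnostic aid rather than a long-term fix.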

Bermuda answered 9/12, 2020 at 16:40 Comment(7)
As a side note, I could confirm the behavior locally, and it doesn't seem like jdoodle lets you choose a VM argument. - Bermuda
-Djava.locale.providers=COMPAT is our solution at the moment, but it's not a long term solution. - Deme
@Bermuda Glad that is a useful resource. - Changeling
@Deme I know this is old, but it's still not clear what you are trying to accomplish. If ar-US is not producing the right result then it is a CLDR data issue. - Changeling
@StevenR.Loomis if I remember right, the issue resulted from the switch from Java 8 to 11 (see jdoodle and switch between the Java versions before executing). We used the locale to get different translations for each supported country in the app. Besides that, we had a set of languages which was supported in each of these countries. - Deme
@Deme thanks for answering… the problem is still not clear to me. ١٢٠ is correct for "120" using Arabic digits. If you use ar_DZ it should format as 120 due to local preference. There's some discussion about what the default should be (such as for ar_US where there is not currently data) but at present the default is Arabic digits. Is your concern whether ١٢٠ is right for US specifically or something else? Thanks! - Changeling
@StevenR.Loomis if I remember it right, our main concern was that with the switch of the locale provider in Java 11 our old source code didn't work anymore as we wanted to display the numbers as Arabic digits. - Deme

I'm sure I'm missing some nuance, but the problem is with your tag, so fix that. Specifically:

ar-EN makes no sense. That's short for:

language = arabic
country = ?? nobody knows.

EN is not a country. en is certainly a language code (for English), but the second part of a language tag is for the country, and EN is not a country. (For context, there is en-GB for British English and en-US for American English.)

Thus, this is as good as ar (as in, language = Arabic, not tied to any particular country). Even if you did tie it to some country, that is mostly immaterial here; that would affect things like 'what is the first day of the week', 'which currency symbol is to be presumed' and perhaps 'should temperatures be stated in Kelvin or Fahrenheit'. It has no bearing on how to show digits, because that's all based on language.

And the language is Arabic, thus ١٢٠ is what you get when you use ar as a language tag when printing the number 120. The problem is that you expect this to return "120", which is a bizarre wish [1], combined with the fact that Java, unfortunately, shipped for a long, long time with a bug that made it act in this bizarre fashion, thinking that rendering the number 120 in Arabic is best done with "120", which is wrong.

So, with that context, in order of preference:

Best solution

Find out why your system ends up with ar-EN and nevertheless expects '120', and fix this. Also fix ar-EN in general; EN is not a country.

More generally, 'unsupported locale' isn't really a thing. The ar part is supported, and it's the only relevant part of the tag for rendering digits.

Alternatives

If the above is not possible, the most likely best answer is to work around it explicitly. Detect the tag yourself, and write code that responds with the result of formatting the number using Locale.ENGLISH instead, guaranteeing that you get Output: 120. The rest seems considerably worse: you could try to write a localization provider, which is a ton of work, or you could tell Java to use the JRE version of the provider, but that one is obsolete and will not be updated, so you're kicking the can down the road and setting yourself up for a maintenance burden later.
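
A minimal sketch of that workaround, assuming the rule is simply "any Arabic-language locale falls back to Locale.ENGLISH for number formatting". The class `LatinDigitFormat` and its `format` helper are hypothetical names invented here; a real application would probably need a narrower detection rule.

```java
import java.util.Locale;

// Hypothetical helper illustrating the "detect the tag yourself" workaround.
class LatinDigitFormat {

    // Assumption: every locale whose language is "ar" should be formatted with Locale.ENGLISH.
    static String format(Locale locale, String pattern, Object... args) {
        Locale effective = "ar".equals(locale.getLanguage()) ? Locale.ENGLISH : locale;
        return String.format(effective, pattern, args);
    }

    public static void main(String[] args) {
        Locale locale = Locale.forLanguageTag("ar-US");
        System.out.println(format(locale, "Output: %d", 120)); // Output: 120
    }
}
```

Note that swapping in Locale.ENGLISH replaces more than the digits: grouping and decimal separators in patterns such as `%,d` or `%.2f` follow English conventions as well, which may or may not be what you want.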

[1] Given that the JRE variant actually printed 120, and you're also indicating you want this, I get that nagging feeling I'm missing some political or historical info and the expectation that ar-EN results in rendering the number 120 as "120" is not so crazy. I'd love to hear that story if you care to provide it!

Coccidiosis answered 9/12, 2020 at 14:28 Comment(11)
Sorry about the bad example, I updated the question to ar-US. - Deme
@Deme try this: Locale.forLanguageTag("ar-US-u-nu-latn"). - Coccidiosis
Try a supported locale like ar-DZ (Algeria)... that works like I expect it to: jdoodle.com/ia/5xt I just wonder why the default is now Eastern Arabic. - Deme
@Deme well, yeah, that's the switch from JRE to CLDR. The only real 'fix' you can make without changing your code / tag is to change the provider, but that's a dead end (the JRE list is not being updated anymore). - Coccidiosis
How is the language tag a problem here? I think the update to the question really defies any hints in that direction. Besides, even when you use the tag ar-EN, the code executes as stated by the OP in the question. - Bermuda
Minor point: there is en-UK for british english - should that be en-GB instead? - Hettie
@andrewjames yeah. That teaches me to trust 'top of my head' a little less :) I'll edit. - Coccidiosis
@Coccidiosis I still don't understand why ar-US-u-nu-latn works and ar-US does not. I'm missing the logical explanation. - Deme
In the CLDR model, the job 'render this numeric value into a string' is fundamentally an aspect of language and has zip squat to do with country (in contrast to 'is the first day of the week Sunday or Monday?', which is 100% based on country and not language). That explains why the -US part does not, and cannot, change that aspect of what locale settings do. So only ar is relevant, but that is insufficient, as 'Arabic' without further context has multiple digit systems. Therefore we need a 'modification' system. And language tags do have that... - Coccidiosis
The -u-nu-latn part is: "... but modified (-u) to use the number system (-nu) of hindu-arabic numerals (-latn)." Exactly how latn ended up being the shortcode for this is anyone's guess. - Coccidiosis
@Coccidiosis not quite true. The region (some of which are countries) does affect numerics, such as the preferred numbering system as in this example. It can also affect decimal separators etc. So it is best to be as specific as possible. - Changeling
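
For readers who want to apply the -u-nu-latn suggestion from the comments without hand-editing tag strings, here is a hedged sketch. It uses the standard `Locale.Builder` API to attach the Unicode `nu` (numbering system) keyword to an existing locale; the class `LatinNumberingSystem` and the method `withLatinDigits` are illustrative names, not something from the thread.

```java
import java.util.Locale;

// Hypothetical helper: adds the Unicode extension "nu-latn" to any locale.
class LatinNumberingSystem {

    // Keeps language and region of the base locale, but requests Latin (0-9) digits.
    static Locale withLatinDigits(Locale base) {
        return new Locale.Builder()
                .setLocale(base)
                .setUnicodeLocaleKeyword("nu", "latn")
                .build();
    }

    public static void main(String[] args) {
        Locale locale = withLatinDigits(Locale.forLanguageTag("ar-US"));
        System.out.println(locale.toLanguageTag()); // ar-US-u-nu-latn
        // Per the comments above, this is expected to print "Output: 120" on Java 11.
        System.out.println(String.format(locale, "Output: %d", 120));
    }
}
```

Compared with falling back to Locale.ENGLISH, this keeps the rest of the locale intact, so region-dependent settings stay tied to ar-US and only the digit system is overridden.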
