Java internationalization (i18n) with proper plurals

I was going to use Java's standard i18n system with the ChoiceFormat class for plurals, but then realized that it doesn't handle the complex plural rules of some languages (e.g. Polish). If it only handles languages that resemble English, then it seems a little pointless.

What options are there to achieve correct plural forms? What are the pros and cons of using them?

Coastland answered 14/1, 2013 at 21:6 Comment(8)
Could you give an example of the complex plural rules that resource bundles don't handle? - Bangup
Resource bundles have nothing to do with handling plurals. Aren't you confusing, for example, the MessageFormat API with the ResourceBundle API? - Mess
I do not know of any tool that properly handles conjugation, if this is what you meant by "plurals". One workaround is to use multiple keys for different versions of the whole printed message. If the message is not fixed (e.g. formatting dates), then you'd need to create a locale-sensitive formatter, I guess... The issue is very complex and I would love to find a library that makes it easier to solve. - Palatal
Conjugation is also an interesting case. Something for another question? But I mean simply plural forms. Mozilla has a good overview of different plural rules and which languages use each set of rules at developer.mozilla.org/en-US/docs/Localization_and_Plurals - Coastland
@BalusC: yes, ok, I mean ChoiceFormat, which seems to be the built-in way to handle plural forms. - Coastland
@mbaumbach: here's some documentation of plurals from the Qt library with examples (Polish) that Java's ChoiceFormat won't handle: doc.qt.digia.com/qq/qq19-plurals.html - Coastland
@BalusC: I don't agree with removing the internationalization tag and the change of title. This isn't about how to make ChoiceFormat handle plurals. It can't. It's about finding an alternative for Java internationalization that actually works. - Coastland
Some languages use conjugation to create plural forms. Actually, all languages with proper conjugation, I think... - Bloch

Well, you already tagged the question correctly, so I assume you know a thing or two about ICU.

With ICU you have two choices for proper handling of plural forms:

  • PluralRules, which gives you the rules for a given locale
  • PluralFormat, which uses the aforementioned rules to format messages

Which one to use? Personally, I prefer to use PluralRules directly, to select the appropriate message from the resource bundles.

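import com.ibm.icu.text.PluralRules;
import com.ibm.icu.util.ULocale;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

ULocale uLocale = ULocale.forLanguageTag("pl-PL");
ResourceBundle resources = ResourceBundle.getBundle("path.to.messages",
                               uLocale.toLocale());
PluralRules pluralRules = PluralRules.forLocale(uLocale);

double[] numbers = { 0, 1, 1.5, 2, 2.5, 3, 4, 5, 5.5, 11, 12, 23 };
for (double number : numbers) {
  // select() returns the plural keyword for the number, e.g. "one", "few", "many" or "other"
  String resourceKey = "some.message.plural_form." + pluralRules.select(number);
  String message = "!" + resourceKey + "!";
  try {
    message = resources.getString(resourceKey);
    // format() is a helper (not shown here) that formats the message for the given locale
    System.out.println(format(message, uLocale, number));
  } catch (MissingResourceException e) {
    // Log this
  }
}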

Of course, you (or the translator) would need to add the proper forms to the properties file. In this example, let's say:

some.message.plural_form.one=Znaleziono {0} plik
some.message.plural_form.few=Znaleziono {0} pliki
some.message.plural_form.many=Znaleziono {0} plików
some.message.plural_form.other=Znaleziono {0} pliku

For other languages (e.g. Arabic) you might also need the "zero" and "two" keywords; see CLDR's language plural rules for details.
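To get a feel for what PluralRules.select() decides for Polish, here is a rough, dependency-free sketch of the CLDR cardinal rule for that one language. The class and method names (PolishPluralSketch, category) are made up for illustration, and ignoring the sign is an assumption of the sketch. In real code you would call ICU's PluralRules rather than hand-roll this, since every language has its own rule set.

```java
final class PolishPluralSketch {

    // Approximates the CLDR cardinal plural rule for Polish only.
    // Hypothetical helper for illustration; ICU's PluralRules covers all locales.
    static String category(double n) {
        double abs = Math.abs(n); // assumption: the sign is ignored in this sketch
        // Fractions (1.5, 2.5, ...) always take the "other" form in Polish.
        if (Double.isNaN(abs) || Double.isInfinite(abs) || abs != Math.floor(abs)) {
            return "other";
        }
        long i = (long) abs;
        if (i == 1) {
            return "one";
        }
        long mod10 = i % 10;
        long mod100 = i % 100;
        // 2-4, 22-24, 32-34, ... but not 12-14, 112-114, ...
        if (mod10 >= 2 && mod10 <= 4 && !(mod100 >= 12 && mod100 <= 14)) {
            return "few";
        }
        // Every remaining whole number: 0, 5-21, 25-31, ...
        return "many";
    }

    public static void main(String[] args) {
        double[] numbers = { 0, 1, 1.5, 2, 2.5, 3, 4, 5, 5.5, 11, 12, 23 };
        for (double number : numbers) {
            // Builds the same resource key shape as the properties above.
            System.out.println(number + " -> some.message.plural_form." + category(number));
        }
    }
}
```

The modulo checks are exactly what a list of fixed ranges cannot express, which is why 12 and 22 end up in different categories.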

Alternatively, you can use PluralFormat to select the valid form. The usual examples instantiate it directly, which doesn't make much sense in my opinion. It is easier to use it through ICU's MessageFormat:

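String pattern = "Znaleziono {0,plural,one{# plik}" +
                 "few{# pliki}" +
                 "many{# plików}" +
                 "other{# pliku}}";
// This is com.ibm.icu.text.MessageFormat, not java.text.MessageFormat
MessageFormat fmt = new MessageFormat(pattern, ULocale.forLanguageTag("pl-PL"));
StringBuffer result = new StringBuffer();
FieldPosition zero = new FieldPosition(0);
double number = 1.5; // example value
// The arguments must be an Object[]; a double[] cannot be cast to Object[]
Object[] theNumber = { number };
fmt.format(theNumber, result, zero);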

Of course, realistically you would not hardcode the pattern string, but place something like this in the properties file:

some.message.pattern=Found {0,plural,one{# file}other{# files}}

The only problem with this approach is that the translator must be aware of the placeholder format. Another issue, which I tried to show in the code above, is that MessageFormat's static format() method (the one that is easy to use) always formats for the default Locale. This can be a real problem in web applications, where the default Locale is typically the server's. So I had to format for a specific Locale (floating-point numbers, mind you), and the code looks rather ugly...

I still prefer the PluralRules approach, which to me is much cleaner (although it needs the same message formatting style, only wrapped in a helper method).
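The helper itself is not shown above, so here is one possible shape for it, as a sketch only. To keep it dependency-free it uses java.text.MessageFormat and java.util.Locale instead of the ICU classes (ICU's MessageFormat offers a similar pattern-plus-locale constructor); the class name LocaleAwareFormatSketch and the format signature are assumptions, not the code the answer actually ran.

```java
import java.text.MessageFormat;
import java.util.Locale;

final class LocaleAwareFormatSketch {

    // Hypothetical stand-in for the format(message, locale, number) helper.
    // Building the MessageFormat with an explicit locale avoids the static
    // MessageFormat.format(), which always uses the JVM's default locale.
    static String format(String message, Locale locale, Object... args) {
        MessageFormat fmt = new MessageFormat(message, locale);
        // Passing the Object[] resolves to the inherited Format.format(Object).
        return fmt.format(args);
    }

    public static void main(String[] args) {
        Locale polish = Locale.forLanguageTag("pl-PL");
        // The number is rendered with the locale's own decimal separator.
        System.out.println(format("Znaleziono {0} pliku", polish, 1.5));
        System.out.println(format("Found {0} files", Locale.US, 1.5));
    }
}
```

With a helper like this, the message chosen through PluralRules only needs a plain {0} placeholder, so translators never see the plural syntax.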

Tacy answered 14/1, 2013 at 22:10 Comment(2)
Thanks, lots of good info. No, I don't know ICU and gettext, I only read that they have better support for plural forms. I also wonder how they compare, if you have any experience with gettext. Perhaps ICU has an advantage as you are using resource bundles, which may work better with standard Java tools. - Coastland
@Dr.Haribo: This really depends on how you are going to process the translations. Depending on your Translation Memory tool (if any), gettext might be a better or worse solution. I'd consult the translation provider first. - Caustic

ChoiceFormat, as explained here, seems flexible enough to deal with any sort of pluralization you might throw at it.

EDIT: as Dr.Haribo pointed out in his comment, ChoiceFormat is not sufficient for Polish pluralization. However, a follow-up post on the same blog suggests ICU4J, which handles more complex pluralization rules.
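To make the limitation concrete, here is a small sketch of what happens when you try to express Polish plurals with ChoiceFormat. The limits and word forms are my own attempt (the class name ChoiceFormatLimitSketch is hypothetical); the point is only that ChoiceFormat maps contiguous numeric ranges to strings and has no notion of "ends in 2-4" or "has a fractional part".

```java
import java.text.ChoiceFormat;

final class ChoiceFormatLimitSketch {

    // Each limit is the lower bound of a range: [0,1), [1,2), [2,5), [5,...)
    static final ChoiceFormat POLISH_ATTEMPT = new ChoiceFormat(
            new double[] { 0, 1, 2, 5 },
            new String[] { "plików", "plik", "pliki", "plików" });

    static String form(double n) {
        return POLISH_ATTEMPT.format(n);
    }

    public static void main(String[] args) {
        System.out.println("3 "   + form(3));   // range [2,5): "pliki", correct
        System.out.println("22 "  + form(22));  // range [5,...): "plików", but Polish needs "pliki"
        System.out.println("1.5 " + form(1.5)); // range [1,2): "plik", but Polish needs "pliku"
    }
}
```

You could keep adding ranges (22 to 24, 32 to 34, and so on), but the list never ends, which is why a rule-based API such as ICU4J's PluralRules is the practical answer.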

Rover answered 14/1, 2013 at 21:21 Comment(3)
Look in the comments of the post you linked to. There's an example from Polish that shows that ChoiceFormat doesn't cut it. There's a follow-up post at stuartgunter.wordpress.com/2011/08/14/… that shows how to fix this using ICU4J. - Coastland
@Peter: ChoiceFormat won't let you correctly handle floating-point numbers (the fraction part) or repeating rules (which need modulo arithmetic). I'm sorry to say it, but ChoiceFormat is useless for Polish or similar languages (and I really know what I am talking about). - Caustic
Duly noted. I am not an expert in the Polish language, and should have known this seemed too simple. I added the link to the follow-up post to my answer to make it clearer that ChoiceFormat alone is not enough. - Rover
