Wrong sorting with Collator using Locale.SIMPLIFIED_CHINESE

I'm trying to sort a list of country names in Chinese using Locale.SIMPLIFIED_CHINESE, which appears to order by pinyin (the phonetic romanization; that is, characters are ordered by their Latin transliteration, from A to Z).

But I've found some cases where the resulting order is wrong. For example:

'中' character is zhong1
'梵' character is fan4

The correct order should be 梵 < 中, but they are sorted the other way around.

String[] characters = new String[] {"梵", "中"};
List<String> list = Arrays.asList(characters);
System.out.println("Before sorting...");
System.out.println(list.toString());

Collator collator = Collator.getInstance(Locale.SIMPLIFIED_CHINESE);
collator.setStrength(Collator.PRIMARY);
Collections.sort(list, collator);

System.out.println("After sorting...");
System.out.println(list.toString());

This snippet prints:

Before sorting...
[梵, 中]
After sorting...
[中, 梵]

Digging deeper, I found the rules that Java applies for Locale.SIMPLIFIED_CHINESE. You can see them in this image: https://postimg.cc/image/4t915a7gp/full/ (notice that 梵 comes after 中).

I noticed that before the <口<口<口<口<口 sequence that I highlighted in red, all characters are ordered by their Latin transliteration, from A to Z. After that sequence, however, the characters are ordered by their composition. For example, characters that share a component (usually the left part of the character) are grouped together rather than ordered from A to Z.

Also, all the characters after <口<口<口<口<口 are less common Chinese characters. 梵 is less common than 中, so it is placed after <口<口<口<口<口.
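For anyone who wants to reproduce this without the image, here is a minimal sketch of how the rules can be inspected. It assumes the Collator returned by the JDK is a RuleBasedCollator (it usually is, but that is not guaranteed), and the class and method names are mine. The positions it prints are only a rough indication of the relative order in the rule string.

```java
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

public class InspectZhRules {
    // Returns the collation rules, or null if the Collator is not rule-based
    // (assumption: the JDK normally hands back a RuleBasedCollator here).
    static String zhRules() {
        Collator collator = Collator.getInstance(Locale.SIMPLIFIED_CHINESE);
        if (collator instanceof RuleBasedCollator) {
            return ((RuleBasedCollator) collator).getRules();
        }
        return null;
    }

    // Compares two strings at PRIMARY strength, as in the question.
    static int compareZh(String a, String b) {
        Collator collator = Collator.getInstance(Locale.SIMPLIFIED_CHINESE);
        collator.setStrength(Collator.PRIMARY);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        String rules = zhRules();
        if (rules != null) {
            // Index of each character in the rule string; -1 means it is not listed.
            System.out.println("U+4E2D (zhong1) at " + rules.indexOf('\u4E2D'));
            System.out.println("U+68B5 (fan4) at " + rules.indexOf('\u68B5'));
        }
        // A negative value means U+4E2D sorts before U+68B5.
        System.out.println(compareZh("\u4E2D", "\u68B5"));
    }
}
```

The characters are written as \u escapes only so the file compiles regardless of source encoding.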

I wonder why this decision was made, and whether it is intentional, but it results in wrong orderings. I don't know how to work around this.
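One possible workaround, sketched below, is to sort on an explicit pinyin key and use the Collator only as a tie-breaker. The PINYIN map is hypothetical and hand-written: it holds only the two readings given above, and a real list would need readings for every character. A library with fuller pinyin data (ICU4J's Collator, for instance) is probably the more realistic route. All names in the sketch are mine.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class PinyinKeySort {
    // Hypothetical, hand-written readings; only the two from the question.
    static final Map<Character, String> PINYIN = new HashMap<>();
    static {
        PINYIN.put('\u68B5', "fan4");   // U+68B5
        PINYIN.put('\u4E2D', "zhong1"); // U+4E2D
    }

    // Builds a sort key by replacing each known character with its reading;
    // characters without a reading are kept as they are.
    static String pinyinKey(String s) {
        StringBuilder key = new StringBuilder();
        for (char c : s.toCharArray()) {
            String reading = PINYIN.get(c);
            key.append(reading != null ? reading : String.valueOf(c)).append(' ');
        }
        return key.toString();
    }

    // Returns a sorted copy: pinyin key first, the JDK Collator as tie-breaker.
    static List<String> sortByPinyin(List<String> input) {
        Collator fallback = Collator.getInstance(Locale.SIMPLIFIED_CHINESE);
        Comparator<String> byPinyin = Comparator.comparing(PinyinKeySort::pinyinKey);
        List<String> sorted = new ArrayList<>(input);
        sorted.sort(byPinyin.thenComparing(fallback));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortByPinyin(Arrays.asList("\u4E2D", "\u68B5")));
    }
}
```

With the two characters from the question this puts 梵 (fan4) before 中 (zhong1), because "fan4" sorts before "zhong1" as plain strings.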

// the Unicode character and the number of strokes
String[] characters = new String[] {
    "\u68B5 (11)", "\u4E2D (4)", "\u5207 (4)", "\u5973 (3)", "\u898B (7)"
};
List<String> list = Arrays.asList(characters);
System.out.println("Before sorting...");
System.out.println(list.toString());

Collator collator = Collator.getInstance(Locale.TRADITIONAL_CHINESE);
collator.setStrength(Collator.PRIMARY);
System.out.println();
Collections.sort(list, collator);

System.out.println("After sorting...");
System.out.println(list.toString());
