I'm trying to sort a list of country names in Chinese using Locale.SIMPLIFIED_CHINESE, which appears to order characters by pinyin (the phonetic alphabet; that is, characters are ordered by their Latin transliteration, from A to Z).
But I've found some cases where the resulting order is wrong. For example:
- '中' character is zhong1
- '梵' character is fan4
The correct order should be 梵 < 中, but they are sorted the other way around.
import java.text.Collator;
import java.util.*;

public class SortChinese {
    public static void main(String[] args) {
        String[] characters = new String[] {"梵", "中"};
        List<String> list = Arrays.asList(characters);
        System.out.println("Before sorting...");
        System.out.println(list.toString());
        // PRIMARY strength ignores secondary and tertiary differences
        Collator collator = Collator.getInstance(Locale.SIMPLIFIED_CHINESE);
        collator.setStrength(Collator.PRIMARY);
        Collections.sort(list, collator);
        System.out.println("After sorting...");
        System.out.println(list.toString());
    }
}
Results of this snippet are:
Before sorting...
[梵, 中]
After sorting...
[中, 梵]
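To make the order I expect concrete, here is a minimal sketch that sorts by a hand-written pinyin key instead of the Collator. The class name, the PINYIN map and the sortByPinyin helper are my own illustration (not a JDK API), and the map only contains the two characters from this example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ExpectedPinyinOrder {
    // Hypothetical lookup table: only the two readings given in this post.
    static final Map<String, String> PINYIN = new HashMap<>();
    static {
        PINYIN.put("中", "zhong1");
        PINYIN.put("梵", "fan4");
    }

    // Returns a sorted copy, ordered by the Latin spelling of each entry.
    // Entries missing from PINYIN would cause a NullPointerException.
    static List<String> sortByPinyin(List<String> input) {
        List<String> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparing((String s) -> PINYIN.get(s)));
        return sorted;
    }

    public static void main(String[] args) {
        // "fan4" sorts before "zhong1", so this prints [梵, 中]
        System.out.println(sortByPinyin(Arrays.asList("中", "梵")));
    }
}
```

This is only meant to show the result I would expect from a pinyin-based collation, not a general solution, since it needs a reading for every character.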
Digging deeper, I found the rules that Java applies for Locale.SIMPLIFIED_CHINESE. You can see them in this image: https://postimg.cc/image/4t915a7gp/full/ (notice that 梵 comes after 中).
I noticed that before the <口<口<口<口<口 sequence that I highlighted in red, all characters are ordered by their Latin transliteration, from A to Z. After that sequence, however, the characters are ordered by their composition: characters that share a component (usually the left part of the character) are grouped together, regardless of the A to Z rule.
Also, all the characters after <口<口<口<口<口 are less common Chinese characters. 梵 is less common than 中, so it is placed after <口<口<口<口<口.
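For anyone who wants to check this without the image, the rule string can be read from the collator itself. This is a rough sketch that assumes the JDK returns a RuleBasedCollator for this locale (hence the instanceof check); the class name and the ruleIndex helper are made up for illustration. It only reports the first occurrence of each character in the rule string:

```java
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

class CollationRuleInspector {
    // Index of the first occurrence of ch in the rule string, or -1 if absent.
    static int ruleIndex(String rules, String ch) {
        return rules.indexOf(ch);
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.SIMPLIFIED_CHINESE);
        // Assumption: the runtime hands back a RuleBasedCollator.
        // If it does not, there is no rule string to inspect.
        if (!(collator instanceof RuleBasedCollator)) {
            System.out.println("Not a RuleBasedCollator: " + collator.getClass());
            return;
        }
        String rules = ((RuleBasedCollator) collator).getRules();
        System.out.println("Rule string length: " + rules.length());
        System.out.println("Index of 中: " + ruleIndex(rules, "中"));
        System.out.println("Index of 梵: " + ruleIndex(rules, "梵"));
    }
}
```

If both indexes are found, a larger index for 梵 than for 中 would match what the image shows.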
I wonder why this decision was made, and whether it is intentional, because it produces wrong sort orders. I don't know how to work around this.