Here is how I managed to sort Persian text correctly (without PyICU), using Python 3.x:
First, set the locale (this needs the locale and platform modules):
import locale
import platform

if platform.system() == 'Linux':
    locale.setlocale(locale.LC_ALL, 'fa_IR.UTF-8')
elif platform.system() == 'Windows':
    locale.setlocale(locale.LC_ALL, 'Persian_Iran.1256')
else:
    pass  # set the appropriate locale name for any other OS here
Then sort using key:
a = ['ا','ب','پ','ت','ث','ج','چ','ح','خ','د','ذ','ر','ز','ژ','س','ش','ص','ض','ط','ظ','ع','غ','ف','ق','ک','گ','ل','م','ن','و','ه','ي']
print(sorted(a,key=locale.strxfrm))
For a list of objects (dictionaries here):
a = [{'id':"ا"},{'id':"ب"},{'id':"پ"},{'id':"ت"},{'id':"ث"},{'id':"ج"},{'id':"چ"},{'id':"ح"},{'id':"خ"},{'id':"د"},{'id':"ذ"},{'id':"ر"},{'id':"ز"},{'id':"ژ"},{'id':"س"},{'id':"ش"},{'id':"ص"},{'id':"ض"},{'id':"ط"},{'id':"ظ"},{'id':"ع"},{'id':"غ"},{'id':"ف"},{'id':"ق"},{'id':"ک"},{'id':"گ"},{'id':"ل"},{'id':"م"},{'id':"ن"},{'id':"و"},{'id':"ه"},{'id':"ي"}]
print(sorted(a, key=lambda x: locale.strxfrm(x['id'])))
Finally, you can reset the locale to the user's default:
locale.setlocale(locale.LC_ALL, '')
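If you would rather put back whatever collation locale was active before, instead of resetting to the user's default, the set/sort/restore steps can be wrapped in a context manager. This is only a minimal sketch: the names collation_locale and sorted_in_locale are hypothetical helpers, not part of the locale module, and it changes LC_COLLATE only (which is the category strxfrm depends on). Note that setlocale affects the whole process, so this is not safe to use from several threads at once.

```python
import locale
from contextlib import contextmanager


@contextmanager
def collation_locale(name):
    """Temporarily switch LC_COLLATE to `name` (hypothetical helper)."""
    # Calling setlocale without a second argument only queries the current setting.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        # Raises locale.Error if the locale is not installed on this system.
        locale.setlocale(locale.LC_COLLATE, name)
        yield
    finally:
        # Always put the earlier setting back, even if sorting failed.
        locale.setlocale(locale.LC_COLLATE, previous)


def sorted_in_locale(items, name, key=None):
    """Sort `items` by the collation rules of locale `name` (hypothetical helper)."""
    with collation_locale(name):
        if key is None:
            return sorted(items, key=locale.strxfrm)
        return sorted(items, key=lambda item: locale.strxfrm(key(item)))
```

With the lists above, this would be called as sorted_in_locale(a, 'fa_IR.UTF-8') for plain strings, or sorted_in_locale(a, 'fa_IR.UTF-8', key=lambda x: x['id']) for the list of dictionaries, assuming that locale name exists on your system.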
What does locale.getlocale(LC_COLLATE) return after your setlocale line? – Prodrome
The locale module uses the locale API from the C library, so if there is an error it must be in the C library. An equivalent test with locale de_DE.UTF-8 and string ä instead of ą works correctly. Even if I use the German locale with ą the order is correct, so there must be something wrong with the Polish locale implementation in the C library. As a workaround you can convert the string to normalization form D using unicodedata.normalize, then even the naive strcmp ordering should work. – Clausen
[...] pl_PL.UTF-8 and de_DE.UTF-8, and also with sort(key=locale.strxfrm) instead of using strcoll, also on OS X, and for the moment am getting your incorrect result. String ä with de_DE.UTF8 did not work for me. – Bisulcate
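The normalization workaround mentioned in the comments could look roughly like the sketch below. It assumes that plain code-point comparison of the decomposed (NFD) strings is good enough for your data; that holds for letters that decompose into a base letter plus a combining mark (such as ą), but not for letters without a decomposition (such as ł), so treat it as an approximation rather than real locale-aware collation. The function name nfd_sorted is made up for this example.

```python
import unicodedata


def nfd_sorted(words):
    """Sort by the NFD-decomposed form of each string (hypothetical helper)."""
    # In NFD, '\u0105' (a with ogonek) becomes 'a' followed by a combining ogonek,
    # so it is compared as "a plus something" rather than as a code point above 'z'.
    return sorted(words, key=lambda w: unicodedata.normalize('NFD', w))


words = ['\u0105b', 'b', 'a', 'az']
print(sorted(words))      # naive order: the accented word ends up after 'b'
print(nfd_sorted(words))  # NFD order: the accented word sorts between 'az' and 'b'
```

The original strings are returned unchanged; the decomposed form is used only as the sort key.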