Transliteration from Cyrillic to Latin ICU4j java [duplicate]

I need to do something rather simple, but without hard-coding a hash map.

I have a String s in Cyrillic, and I need an example of how to turn it into Latin characters using some kind of custom filter. To give a purely Latin example so as not to confuse anyone: if String s = "sniff", I want to look up s-n-i-f-f and change each character into something else (there might also be multi-character combinations).

I can see that ICU4J can do this sort of thing, but I have no idea how to achieve it, as I can't find any working examples (or I'm just stupid).

Any help is appreciated.

Thanks

Best Regards,

PS: I need batch transliteration. I don't care about styles or dynamic transliteration; I just want a basic example of what an ICU4J batch transliterator would look like.

OK, I actually got it.

import com.ibm.icu.text.Transliterator;


public class BulgarianToLatin {


    public static String BULGARIAN_TO_LATIN = "Bulgarian-Latin/BGN";

    public static void main(String[] args) {
        String bgString = "Джокович";

        Transliterator bulgarianToLatin = Transliterator.getInstance(BULGARIAN_TO_LATIN);
        String result1 = bulgarianToLatin.transliterate(bgString);
        System.out.println("Bulgarian to Latin:" + result1);

    }

}

Also, one last edit for rule-based transliteration (if you do not wish to use the pre-existing ones, or just want something custom-made):

import com.ibm.icu.text.Transliterator;

public class BulgarianToLatin {


    // The rules are defined inline below, so no built-in transliterator ID is needed here.

    public static void main(String[] args) {
        String bgString = "а б в г д е ж з и й к л м н о п р с т у ф х ц ч ш щ ю я  \n Юлиян Джокович";

        String rules="::[А-ЪЬЮ-ъьюяѢѣѪѫ];" +
        "Б > B;" +
        "б > b;" +
        "В > V;" +
        "ТС > TS;" +
        "Тс > Ts;" +
        "ч > ch;" +
        "ШТ > SHT;" +
        "Шт > Sht;" +
        "шт > sht;" +
        "{Ш}[[б-джзй-нп-тф-щь][аеиоуъюяѣѫ]] > Sh;" +
        "Я > YA;" +
        "я > ya;";
        Transliterator bulgarianToLatin = Transliterator.createFromRules("temp", rules, Transliterator.FORWARD);

        String result1 = bulgarianToLatin.transliterate(bgString);
        System.out.println("Bulgarian to Latin:" + result1);

    }

}
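
The rule list above puts multi-letter sources such as ШТ ahead of shorter ones, which matters once combinations are involved. As a rough illustration of that ordering idea only, here is a plain-Java sketch with no ICU4J dependency. It is not how ICU4J works internally, and the class name OrderedRulesSketch, the method apply, and the five sample rules are all made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of "first matching rule wins"; this is not ICU4J code.
final class OrderedRulesSketch {

    // LinkedHashMap keeps insertion order, so longer sources are listed before shorter ones.
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("шт", "sht"); // must come before the single-letter rules for ш and т
        RULES.put("ш", "sh");
        RULES.put("т", "t");
        RULES.put("я", "ya");
        RULES.put("а", "a");
    }

    static String apply(String input) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < input.length()) {
            boolean matched = false;
            // Try the rules in order and apply the first one that matches at pos.
            for (Map.Entry<String, String> rule : RULES.entrySet()) {
                if (input.startsWith(rule.getKey(), pos)) {
                    out.append(rule.getValue());
                    pos += rule.getKey().length();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                // No rule applies: copy the character through unchanged.
                out.append(input.charAt(pos++));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(apply("шта"));
    }
}
```

If the "ш" rule were listed first, "шт" would be consumed one letter at a time and the combination rule would never fire, which is why the combinations go at the top.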
Pricefixing answered 29/4, 2013 at 7:28 Comment(0)

I wrote a method to transliterate Cyrillic to Latin; maybe it will be useful to somebody.

public static String transliterate(String message){
    char[] abcCyr =   {' ','а','б','в','г','д','е','ё', 'ж','з','и','й','к','л','м','н','о','п','р','с','т','у','ф','х', 'ц','ч', 'ш','щ','ъ','ы','ь','э', 'ю','я','А','Б','В','Г','Д','Е','Ё', 'Ж','З','И','Й','К','Л','М','Н','О','П','Р','С','Т','У','Ф','Х', 'Ц', 'Ч','Ш', 'Щ','Ъ','Ы','Ь','Э','Ю','Я','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'};
    String[] abcLat = {" ","a","b","v","g","d","e","e","zh","z","i","y","k","l","m","n","o","p","r","s","t","u","f","h","ts","ch","sh","sch", "","i", "","e","ju","ja","A","B","V","G","D","E","E","Zh","Z","I","Y","K","L","M","N","O","P","R","S","T","U","F","H","Ts","Ch","Sh","Sch", "","I", "","E","Ju","Ja","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"};
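    StringBuilder builder = new StringBuilder();
    for (int i = 0; i < message.length(); i++) {
        // Linear search for the current character; abcCyr[x] maps to abcLat[x].
        for (int x = 0; x < abcCyr.length; x++) {
            if (message.charAt(i) == abcCyr[x]) {
                builder.append(abcLat[x]);
                break; // each character appears only once in abcCyr, so stop at the first match
            }
        }
        // Characters not listed in abcCyr (digits, punctuation) are dropped.
    }
    return builder.toString();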
}
Nevile answered 31/3, 2015 at 8:32 Comment(3)
Very useful for simple applications. Thanks! - Archuleta
You have a typo in your 'abcCyr' array: instead of 'Ь' you have written 'Б'. - Eyebrow
You can break once you have found a match; otherwise you do a lot of unnecessary comparisons. A HashMap should perform better than iterating over the same array over and over, especially for longer strings. If you want to stay with arrays, you don't need to repeat the Latin characters in both arrays: simply copy the original character into the StringBuilder if no match is found. - Beardless
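
Along the lines of the last comment, here is a minimal sketch of the same table as a HashMap lookup. The class name CyrillicMapSketch and the method toLatin are made up for illustration. The letter mapping is copied from the answer above, but unlike the original, characters without an entry (Latin letters, digits, punctuation) are copied through unchanged rather than dropped.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; the names are illustrative and not from the original answer.
final class CyrillicMapSketch {

    // Lowercase Cyrillic letters and their Latin equivalents, index for index.
    private static final String CYR = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя";
    private static final String[] LAT = {
        "a", "b", "v", "g", "d", "e", "e", "zh", "z", "i", "y", "k", "l", "m", "n", "o", "p",
        "r", "s", "t", "u", "f", "h", "ts", "ch", "sh", "sch", "", "i", "", "e", "ju", "ja"
    };
    private static final Map<Character, String> TABLE = new HashMap<>();

    static {
        for (int i = 0; i < CYR.length(); i++) {
            char lower = CYR.charAt(i);
            String latin = LAT[i];
            TABLE.put(lower, latin);
            // Uppercase entry: capitalize only the first Latin letter, e.g. 'Ж' -> "Zh".
            String capital = latin.isEmpty()
                    ? latin
                    : Character.toUpperCase(latin.charAt(0)) + latin.substring(1);
            TABLE.put(Character.toUpperCase(lower), capital);
        }
    }

    static String toLatin(String message) {
        StringBuilder builder = new StringBuilder(message.length());
        for (int i = 0; i < message.length(); i++) {
            char c = message.charAt(i);
            String mapped = TABLE.get(c);
            // Assumption: unmapped characters pass through instead of being dropped.
            if (mapped != null) {
                builder.append(mapped);
            } else {
                builder.append(c);
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(toLatin("Юлиян Джокович, 2013")); // prints "Julijan Dzhokovich, 2013"
    }
}
```

Each character now costs one map lookup instead of a scan over the whole array, and the Latin alphabet no longer has to be listed in the table.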
