Converting a Java String to ASCII
I need to convert Strings that contain letters specific to certain languages (like HÄSTDJUR, note the Ä) to Strings without those special letters (in this case HASTDJUR). How can I do it in Java? Thanks for the help!


It is not really about how it sounds. The scenario is as follows: you want to use the application but don't have a Swedish keyboard. So instead of hunting through the character map, you type the word with the special letters replaced by the plain letters of the Latin alphabet.

Doronicum answered 14/9, 2010 at 10:23 Comment(4)
HASTDJUR? Germans would expect HAESTDJUR. You seem to assume some particular rules, can you state them explicitly ?Ealing
A few more cases for you to ponder over: IJ => IJ ? Æ => AE ? DŽ => DZ ? ß => ss ? Ʀ => R ? ð => ? Δ => D ?Ealing
@Ealing Once you see Haemaelaeinen written somewhere, you don't want to convert ä to ae any more...Watchtower
Well, it is Swedish so I know what to expect :)Doronicum

I think your question is the same as this one:

Java - getting rid of accents and converting them to regular letters

and hence the answer is also the same:

Solution

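import java.text.Normalizer;

// NFD decomposes accented letters into a base letter plus combining marks;
// the regex then strips every non-ASCII character, including those marks.
String convertedString =
       Normalizer
           .normalize(input, Normalizer.Form.NFD)
           .replaceAll("[^\\p{ASCII}]", "");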

Example Code:

final String input = "Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ";
System.out.println(
    Normalizer
        .normalize(input, Normalizer.Form.NFD)
        .replaceAll("[^\\p{ASCII}]", "")
);

Output:

This is a funky String
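
Applied to the word from the question, the same two steps can be wrapped in a small helper. This is only a sketch: the class name AsciiFoldDemo and the method name toAscii are made up for illustration, and the input is written with a Unicode escape so the result does not depend on the source file encoding.

```java
import java.text.Normalizer;

class AsciiFoldDemo {
    // Hypothetical helper name; it wraps the two steps from the answer above.
    static String toAscii(String input) {
        // NFD splits a letter such as U+00C4 into 'A' plus the combining diaeresis U+0308.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Dropping everything outside ASCII removes the combining marks.
        return decomposed.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // The word from the question, with U+00C4 written as an escape.
        System.out.println(toAscii("H\u00C4STDJUR")); // prints HASTDJUR
    }
}
```

Note that any character without a decomposition is removed entirely rather than replaced, so this only suits input where the special letters are accented forms of ASCII letters.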

Gamopetalous answered 14/9, 2010 at 10:37 Comment(3)
seanizer - I need to test it, but it seems to be the solution.Doronicum
This does not appear to deal with composite characters very well (Æ, Œ).Dowel
@WeckarE. for ligatures, an additional step is required, which is outlined here: lexsrv3.nlm.nih.gov/LexSysGroup/Projects/lvg/2013/docs/… (End of Page)Gamopetalous
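
One possible shape for that additional step, as a rough sketch: letters such as Æ, Œ and ß have no Unicode decomposition, so NFD leaves them intact and the regex then deletes them. Expanding them by hand before normalizing avoids that. The class name LigatureFoldDemo, the table EXPANSIONS and the chosen replacements are assumptions for illustration, not taken from the linked page, and the table is far from complete.

```java
import java.text.Normalizer;
import java.util.LinkedHashMap;
import java.util.Map;

class LigatureFoldDemo {
    // Illustrative, incomplete table; the replacements are an assumption, not a standard.
    static final Map<String, String> EXPANSIONS = new LinkedHashMap<>();
    static {
        EXPANSIONS.put("\u00C6", "AE"); // capital ligature AE
        EXPANSIONS.put("\u00E6", "ae"); // small ligature ae
        EXPANSIONS.put("\u0152", "OE"); // capital ligature OE
        EXPANSIONS.put("\u0153", "oe"); // small ligature oe
        EXPANSIONS.put("\u00DF", "ss"); // sharp s
    }

    static String toAscii(String input) {
        // Step 1: expand letters that NFD cannot decompose.
        String expanded = input;
        for (Map.Entry<String, String> e : EXPANSIONS.entrySet()) {
            expanded = expanded.replace(e.getKey(), e.getValue());
        }
        // Step 2: decompose accents and strip whatever is still non-ASCII.
        return Normalizer.normalize(expanded, Normalizer.Form.NFD)
                .replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        System.out.println(toAscii("\u00C6sop")); // prints AEsop
    }
}
```

Without step 1 the same input would come out as "sop", because the ligature would simply be dropped.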

I'd suggest a mapping from the special characters to the ones you want.

Ä --> A
é --> e
A --> A (exactly the same)
etc...

And then you can just call your mapping over your text (in pseudocode):

for letter in string:
   newString += map(letter)

Effectively, you need to create a set of rules stating which character maps to which ASCII equivalent.
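
In Java, the pseudocode above might look roughly like this. It is a minimal sketch: the class name CharMapDemo, the table contents and the method name toAscii are all assumptions for illustration, and a real table would need an entry for every letter you expect to see. Characters missing from the table are passed through unchanged, so the result is only pure ASCII if the table covers the input.

```java
import java.util.HashMap;
import java.util.Map;

class CharMapDemo {
    // Hand-written rules; only a few sample entries are shown here.
    static final Map<Character, String> TABLE = new HashMap<>();
    static {
        TABLE.put('\u00C4', "A"); // A with diaeresis
        TABLE.put('\u00C5', "A"); // A with ring above
        TABLE.put('\u00D6', "O"); // O with diaeresis
        TABLE.put('\u00E9', "e"); // e with acute
    }

    static String toAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String mapped = TABLE.get(c);
            if (mapped != null) {
                out.append(mapped); // a rule exists for this character
            } else {
                out.append(c);      // no rule: keep the character as it is
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAscii("H\u00C4STDJUR")); // prints HASTDJUR
    }
}
```

Mapping to String rather than char leaves room for rules that produce more than one letter, should you ever want them.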

Zomba answered 14/9, 2010 at 10:26 Comment(7)
Unfortunately, I don't know whether Ä sounds like A or something else. :)Krongold
Who said anything about sounds like? This question seems to be just about removing the decorations on the letters, to put it crudely.Zomba
Maybe not. I couldn't infer that from the question. Are you going by the example provided? See the comments on the question to see what I mean.Krongold
How would you create such a table, and how would you effectively use it?Ealing
@MSalters: That's another question. Can be done with some predefined rules, I suppose.Krongold
@Ealing This is just one way. There are probably much better ways. (1) create Map<Character, Character> table = new HashMap<Character, Character>(); table.put('Ä', 'A'); ... (2) use Character unicode; ... Character ascii = table.get(unicode);Venery
It is not really about how it sounds. The scenario is as follows: you want to use the application but don't have a Swedish keyboard. So instead of hunting through the character map, you type the word with the special letters replaced by the plain letters of the Latin alphabet.Doronicum
