Possible Duplicates:
Remove diacritical marks (ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ ̈ ɲ ƞ ᶇ ɳ ȵ) from Unicode chars
Is there a way to get rid of accents and convert a whole string to regular letters?
How can I do this? Thanks for the help.
I think your question is the same as the two listed above, and hence the answer is also the same:
String convertedString =
    Normalizer
        .normalize(input, Normalizer.Form.NFD)
        .replaceAll("[^\\p{ASCII}]", "");
Example Code:
final String input = "Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ";
System.out.println(
    Normalizer
        .normalize(input, Normalizer.Form.NFD)
        .replaceAll("[^\\p{ASCII}]", "")
);
Output:
This is a funky String
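To see why this works, it can help to look at what NFD actually produces. The following is a minimal sketch, not part of the original answer; the class and method names (NfdInspector, nfdCodePoints) are hypothetical, and the string literals use Unicode escapes so the result does not depend on the source file's encoding.

```java
import java.text.Normalizer;

// Hypothetical helper class, for illustration only.
final class NfdInspector {

    // Returns the code points of the NFD form of s, separated by spaces.
    static String nfdCodePoints(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder();
        decomposed.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(nfdCodePoints("\u00E9")); // e with acute:  U+0065 U+0301
        System.out.println(nfdCodePoints("\u00E5")); // a with ring:   U+0061 U+030A
        System.out.println(nfdCodePoints("\u00F8")); // o with stroke: U+00F8 (no decomposition)
    }
}
```

Letters that decompose leave an ASCII base letter behind once the non-ASCII marks are removed; a letter that stays a single non-ASCII code point is dropped completely.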
NFD normalization decomposes accented letters, e.g. ä -> a + ¨, and the regex then removes all non-ASCII characters. But Danish (as well as other languages) has letters that are not a base letter plus a diacritic: ø, for example, is its own letter with its own code point and no canonical decomposition, so the entire letter is non-ASCII and gets removed. (å, by contrast, does decompose into a + ° under NFD and is reduced to a.) – Hornet
You can use java.text.Normalizer to separate base letters and diacritics, then remove the latter via a regexp:
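// Requires: import java.text.Normalizer; import java.text.Normalizer.Form;
public static String stripDiacriticas(String s) {
    // NFD splits each accented letter into a base letter plus combining marks
    return Normalizer.normalize(s, Form.NFD)
            .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}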
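Unlike the ASCII filter, this regexp leaves letters such as ø or ß in the string, because they have no canonical decomposition and are not combining marks. If those should be folded too, one option is a small hand-written fallback table applied after the mark removal. The sketch below assumes such a table; the class name DiacriticStripper, the FALLBACK map, and the chosen replacements are illustrative assumptions, not a complete or standard mapping.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class, for illustration only.
final class DiacriticStripper {

    // Hand-picked replacements for letters with no canonical decomposition.
    // This table is an assumption of the sketch and far from complete.
    private static final Map<Character, String> FALLBACK = new HashMap<>();
    static {
        FALLBACK.put('\u00F8', "o");  // o with stroke
        FALLBACK.put('\u00D8', "O");
        FALLBACK.put('\u00E6', "ae"); // ae ligature
        FALLBACK.put('\u00C6', "AE");
        FALLBACK.put('\u00DF', "ss"); // sharp s
        FALLBACK.put('\u0142', "l");  // l with stroke
        FALLBACK.put('\u0141', "L");
    }

    static String strip(String s) {
        // Step 1: decompose, then drop the combining diacritical marks.
        String base = Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        // Step 2: replace remaining special letters from the fallback table.
        StringBuilder out = new StringBuilder(base.length());
        for (int i = 0; i < base.length(); i++) {
            char c = base.charAt(i);
            String replacement = FALLBACK.get(c);
            out.append(replacement != null ? replacement : String.valueOf(c));
        }
        return out.toString();
    }
}
```

With this table, DiacriticStripper.strip("sm\u00F8rrebr\u00F8d") yields "smorrebrod" rather than losing both ø letters. Whether "o" is an acceptable stand-in for ø depends on the language and the use case.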
First - you shouldn't. These symbols carry special phonetic properties which should not be ignored.
The way to convert them is to create a Map that holds each pair:
Map<Character, Character> map = new HashMap<Character, Character>();
map.put('á', 'a');
map.put('é', 'e');
//etc..
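and then loop over the chars in the string, building a new string from map.get(currentChar). Note that map.get returns null for any character without an entry, so fall back to the original character in that case.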
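A minimal sketch of that loop might look like the following. The class name MapTransliterator and the method convert are hypothetical, and only the two pairs from the answer are filled in. It uses getOrDefault so that unmapped characters pass through unchanged instead of turning into null.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class, for illustration only.
final class MapTransliterator {

    private static final Map<Character, Character> MAP = new HashMap<>();
    static {
        MAP.put('\u00E1', 'a'); // a with acute
        MAP.put('\u00E9', 'e'); // e with acute
        // ... add further pairs as needed
    }

    static String convert(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // Characters without an entry are copied as they are.
            char mapped = MAP.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }
}
```

This approach only handles characters that appear in the map as single precomposed code points; input that is already in decomposed form would need normalizing to NFC first.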