Change foreign characters to their roman equivalent

I am using PHP and was wondering whether there is a predefined way to convert foreign characters to their non-foreign alternatives.

Characters such as ê, ë, é should all result in 'e'.
I'm looking for a function that would take a string and return it without the special characters.
Any ideas would be greatly appreciated!

Houseline answered 26/7, 2011 at 22:1 Comment(9)
Why would you want to? ê != e; you would render it meaningless. Why not support all languages?Balata
+1 "why would you want to?" Moreover, these characters are not "foreign", they're just accented.Aerostatics
I understand your concern but it is for machine readable only text. Not all "foreign" characters have accents (Ç, ç Œ,œ ß).Houseline
If this text is in a database and you want to "convert" the characters for the purpose of comparison, you don't need to. Just use a collation where they are considered equal.Screamer
what are you defining as machine readable?Balata
Why are you drilling me, @dagon? I asked a question; what does it matter what it's for?Houseline
@ThomasReggi, I'm not psychic. The answer depends on the usage; I can think of several definitions of "machine readable" in this context, each having its own solution.Balata
Well, in order to give you a proper answer, it's often helpful to know not only what you are trying to achieve but also why you want to do it.Aerostatics
I have one use case for this currently, but it's not about machine readability. We have an OCR system that often makes mistakes with language-specific characters (like ń, ó, ł etc.). It often removes the accent, but sometimes adds a wrong one. We need to search and compare strings from OCR, so it's more convenient to map those to ASCII to normalize them first.Harlamert

After failing to find suitable converters, I created my own collection that suits my needs, including my favorite Cyrillic conversion, which by default has numerous variations.

function transliterateString($txt) {
    $transliterationTable = array('á' => 'a', 'Á' => 'A', 'à' => 'a', 'À' => 'A', 'ă' => 'a', 'Ă' => 'A', 'â' => 'a', 'Â' => 'A', 'å' => 'a', 'Å' => 'A', 'ã' => 'a', 'Ã' => 'A', 'ą' => 'a', 'Ą' => 'A', 'ā' => 'a', 'Ā' => 'A', 'ä' => 'ae', 'Ä' => 'AE', 'æ' => 'ae', 'Æ' => 'AE', 'ḃ' => 'b', 'Ḃ' => 'B', 'ć' => 'c', 'Ć' => 'C', 'ĉ' => 'c', 'Ĉ' => 'C', 'č' => 'c', 'Č' => 'C', 'ċ' => 'c', 'Ċ' => 'C', 'ç' => 'c', 'Ç' => 'C', 'ď' => 'd', 'Ď' => 'D', 'ḋ' => 'd', 'Ḋ' => 'D', 'đ' => 'd', 'Đ' => 'D', 'ð' => 'dh', 'Ð' => 'Dh', 'é' => 'e', 'É' => 'E', 'è' => 'e', 'È' => 'E', 'ĕ' => 'e', 'Ĕ' => 'E', 'ê' => 'e', 'Ê' => 'E', 'ě' => 'e', 'Ě' => 'E', 'ë' => 'e', 'Ë' => 'E', 'ė' => 'e', 'Ė' => 'E', 'ę' => 'e', 'Ę' => 'E', 'ē' => 'e', 'Ē' => 'E', 'ḟ' => 'f', 'Ḟ' => 'F', 'ƒ' => 'f', 'Ƒ' => 'F', 'ğ' => 'g', 'Ğ' => 'G', 'ĝ' => 'g', 'Ĝ' => 'G', 'ġ' => 'g', 'Ġ' => 'G', 'ģ' => 'g', 'Ģ' => 'G', 'ĥ' => 'h', 'Ĥ' => 'H', 'ħ' => 'h', 'Ħ' => 'H', 'í' => 'i', 'Í' => 'I', 'ì' => 'i', 'Ì' => 'I', 'î' => 'i', 'Î' => 'I', 'ï' => 'i', 'Ï' => 'I', 'ĩ' => 'i', 'Ĩ' => 'I', 'į' => 'i', 'Į' => 'I', 'ī' => 'i', 'Ī' => 'I', 'ĵ' => 'j', 'Ĵ' => 'J', 'ķ' => 'k', 'Ķ' => 'K', 'ĺ' => 'l', 'Ĺ' => 'L', 'ľ' => 'l', 'Ľ' => 'L', 'ļ' => 'l', 'Ļ' => 'L', 'ł' => 'l', 'Ł' => 'L', 'ṁ' => 'm', 'Ṁ' => 'M', 'ń' => 'n', 'Ń' => 'N', 'ň' => 'n', 'Ň' => 'N', 'ñ' => 'n', 'Ñ' => 'N', 'ņ' => 'n', 'Ņ' => 'N', 'ó' => 'o', 'Ó' => 'O', 'ò' => 'o', 'Ò' => 'O', 'ô' => 'o', 'Ô' => 'O', 'ő' => 'o', 'Ő' => 'O', 'õ' => 'o', 'Õ' => 'O', 'ø' => 'oe', 'Ø' => 'OE', 'ō' => 'o', 'Ō' => 'O', 'ơ' => 'o', 'Ơ' => 'O', 'ö' => 'oe', 'Ö' => 'OE', 'ṗ' => 'p', 'Ṗ' => 'P', 'ŕ' => 'r', 'Ŕ' => 'R', 'ř' => 'r', 'Ř' => 'R', 'ŗ' => 'r', 'Ŗ' => 'R', 'ś' => 's', 'Ś' => 'S', 'ŝ' => 's', 'Ŝ' => 'S', 'š' => 's', 'Š' => 'S', 'ṡ' => 's', 'Ṡ' => 'S', 'ş' => 's', 'Ş' => 'S', 'ș' => 's', 'Ș' => 'S', 'ß' => 'SS', 'ť' => 't', 'Ť' => 'T', 'ṫ' => 't', 'Ṫ' => 'T', 'ţ' => 't', 'Ţ' => 'T', 'ț' => 't', 'Ț' => 'T', 'ŧ' => 't', 'Ŧ' => 'T', 'ú' => 'u', 'Ú' => 'U', 'ù' => 'u', 'Ù' => 
'U', 'ŭ' => 'u', 'Ŭ' => 'U', 'û' => 'u', 'Û' => 'U', 'ů' => 'u', 'Ů' => 'U', 'ű' => 'u', 'Ű' => 'U', 'ũ' => 'u', 'Ũ' => 'U', 'ų' => 'u', 'Ų' => 'U', 'ū' => 'u', 'Ū' => 'U', 'ư' => 'u', 'Ư' => 'U', 'ü' => 'ue', 'Ü' => 'UE', 'ẃ' => 'w', 'Ẃ' => 'W', 'ẁ' => 'w', 'Ẁ' => 'W', 'ŵ' => 'w', 'Ŵ' => 'W', 'ẅ' => 'w', 'Ẅ' => 'W', 'ý' => 'y', 'Ý' => 'Y', 'ỳ' => 'y', 'Ỳ' => 'Y', 'ŷ' => 'y', 'Ŷ' => 'Y', 'ÿ' => 'y', 'Ÿ' => 'Y', 'ź' => 'z', 'Ź' => 'Z', 'ž' => 'z', 'Ž' => 'Z', 'ż' => 'z', 'Ż' => 'Z', 'þ' => 'th', 'Þ' => 'Th', 'µ' => 'u', 'а' => 'a', 'А' => 'a', 'б' => 'b', 'Б' => 'b', 'в' => 'v', 'В' => 'v', 'г' => 'g', 'Г' => 'g', 'д' => 'd', 'Д' => 'd', 'е' => 'e', 'Е' => 'E', 'ё' => 'e', 'Ё' => 'E', 'ж' => 'zh', 'Ж' => 'zh', 'з' => 'z', 'З' => 'z', 'и' => 'i', 'И' => 'i', 'й' => 'j', 'Й' => 'j', 'к' => 'k', 'К' => 'k', 'л' => 'l', 'Л' => 'l', 'м' => 'm', 'М' => 'm', 'н' => 'n', 'Н' => 'n', 'о' => 'o', 'О' => 'o', 'п' => 'p', 'П' => 'p', 'р' => 'r', 'Р' => 'r', 'с' => 's', 'С' => 's', 'т' => 't', 'Т' => 't', 'у' => 'u', 'У' => 'u', 'ф' => 'f', 'Ф' => 'f', 'х' => 'h', 'Х' => 'h', 'ц' => 'c', 'Ц' => 'c', 'ч' => 'ch', 'Ч' => 'ch', 'ш' => 'sh', 'Ш' => 'sh', 'щ' => 'sch', 'Щ' => 'sch', 'ъ' => '', 'Ъ' => '', 'ы' => 'y', 'Ы' => 'y', 'ь' => '', 'Ь' => '', 'э' => 'e', 'Э' => 'e', 'ю' => 'ju', 'Ю' => 'ju', 'я' => 'ja', 'Я' => 'ja');
    return str_replace(array_keys($transliterationTable), array_values($transliterationTable), $txt);
}
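
One of the comments on this answer suggests strtr() in place of str_replace(). A minimal sketch of that variant follows, using only a small excerpt of the table. The function name transliterateWithStrtr is hypothetical, and mapping 'ß' to lowercase 'ss' is an assumption of this sketch rather than the table's own choice. The source file has to be saved as UTF-8 for the keys to match.

```php
<?php
// Hypothetical helper: same idea as transliterateString(), but with strtr().
function transliterateWithStrtr(string $txt): string
{
    // Small excerpt for illustration; a real table would be far longer.
    static $table = [
        'é' => 'e', 'ê' => 'e', 'ë' => 'e',
        'ç' => 'c', 'Ç' => 'C',
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
        'ß' => 'ss', // assumption: lowercase replacement
        'ж' => 'zh', 'я' => 'ja',
    ];

    // With an array argument, strtr() tries the longest keys first and
    // never re-scans text it has already replaced.
    return strtr($txt, $table);
}

echo transliterateWithStrtr('fête über Straße'), PHP_EOL; // fete ueber Strasse
```

Because the table is passed directly, there is no need to build separate key and value arrays on every call.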
Renee answered 26/7, 2011 at 22:19 Comment(6)
Now it should work, hope it's okay that I edited your answer.Aerostatics
This array is a life saver! I use also the regex replace below, but it doesn't always work. Now I'm using both your $transliterationTable and the regex and no special char bug anymore! $string = preg_replace('~&([a-z]{1,2})(acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig|quot|rsquo|orn|th);~i', '$1', $string );Gauntry
Instead of creating 2 very long arrays each time this function is called with array_keys and array_values (the latter actually not being needed since it works fine by passing the array directly), why don't you use strtr? return strtr($txt, $transliterationTable);Across
This will work: preg_replace("/&([a-z])[a-z]+;/i", "$1", htmlentities('é à è ê â î ç û â '));Waterloo
hats off to you sirFeticide
this is better than iconv solution. I was getting some " in result from iconv function, this one took care of those.Allanallana

My first recommendation is the iconv function, mainly because it's built into PHP and so doesn't require any external or third-party libraries. In addition, it's designed to do precisely what you are trying to accomplish: accept one character set as input and output another, in this case going from UTF-8 to ASCII. Below is an example of how to call this function:

$clean_ascii_output = iconv('UTF-8', 'ASCII//TRANSLIT', $utf8_input);

More information about the specifics of this function can be found in the PHP manual entry for iconv().

Note: The iconv function accepts string inputs, so you'll want to iterate over data, and parse it such that you are passing in a string input.
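
Several commenters in this thread report '?' characters or no output at all from iconv(). A hedged sketch of a more defensive call follows. It assumes a glibc-based system, where the transliteration result commonly depends on the LC_CTYPE locale, and it assumes the locale name 'en_US.UTF-8' is installed. The helper name toAsciiWithIconv and the fallback of simply dropping non-ASCII bytes are choices of this sketch, not part of iconv itself.

```php
<?php
// Hypothetical wrapper around iconv(); output differs between iconv
// implementations, so no particular transliteration is guaranteed.
function toAsciiWithIconv(string $utf8): string
{
    // Assumption: this locale exists. setlocale() returns false if it
    // does not, in which case the previous locale stays in effect.
    setlocale(LC_CTYPE, 'en_US.UTF-8');

    // iconv() returns false (with a notice) when the conversion fails
    // or the target charset string is not supported.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $utf8);

    if ($ascii === false) {
        // Fallback chosen for this sketch: drop every non-ASCII byte.
        $ascii = preg_replace('/[^\x00-\x7F]/', '', $utf8);
    }

    return $ascii;
}

var_dump(toAsciiWithIconv('Café à la carte'));
```

If the result still contains '?', checking which locales are installed on the server is a reasonable first step.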

Vitebsk answered 13/10, 2015 at 17:42 Comment(2)
My favorite part about this solution is that iconv() is a basic part of PHP since PHP v. 4.0.5 (circa 2001).Patrinapatriot
In my case, this method replaces them with '?' characters.Schlosser

Try iconv() with the //TRANSLIT option, or

recode_string(), or


Isleen answered 26/7, 2011 at 22:5 Comment(4)
You may want to use both //TRANSLIT and //IGNORE, actually.Woollen
@Dagon well obviously, but I like to actually answer the question along with advice.Isleen
I can't get iconv to work; no output, even with //TRANSLIT//IGNORE.Houseline
I commented earlier about "ISO-8991.../TRANSLIT" not working, but the option I clearly needed was "ASCII//TRANSLIT" (per PuReWebDev's answer). Thanks, works great.Patrinapatriot

I coded this function, which uses the HTML entities translation table built into PHP to romanize chars:

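function Unaccent($string)
{
    // Encode accented characters as named entities, e.g. é becomes &eacute;
    if (strpos($string = htmlentities($string, ENT_QUOTES, 'UTF-8'), '&') !== false)
    {
        // Keep only the base letter(s) of each entity, then decode whatever entities remain
        $string = html_entity_decode(preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|tilde|uml);~i', '$1', $string), ENT_QUOTES, 'UTF-8');
    }

    return $string;
}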

It works by applying htmlentities() and then removing common entities suffixes, a simple example:

 - ã = &atilde; -> a
 - Ã = &Atilde; -> A
 - õ = &otilde; -> o
 - Õ = &Otilde; -> O
 - æ = &aelig; -> ae
 - Æ = &AElig; -> AE

Beware that for this to work properly your files need to be encoded in UTF-8 (no BOM obviously).

See also my other answer for another example.

Differential answered 26/7, 2011 at 22:34 Comment(1)
If your string comes from $_POST / $_GET and this doesn't seem to work, use $string = utf8_encode($string); Unaccent($string); Thanks for the function, Alix, it works fine :) Still hoping for PHP to come with a proper function for a general issue such as this one :)Ungovernable

Saw this old question and still don't know what the best answer is. In case it can help others, here is an array I made up automatically from

array ("À" => "A",
"Á" => "A",
"Â" => "A",
"Ã" => "A",
"Ä" => "A",
"Å" => "A",
"Æ" => "AE",
"Ç" => "C",
"È" => "E",
"É" => "E",
"Ê" => "E",
"Ë" => "E",
"Ì" => "I",
"Í" => "I",
"Î" => "I",
"Ï" => "I",
"Ð" => "ETH",
"Ñ" => "N",
"Ò" => "O",
"Ó" => "O",
"Ô" => "O",
"Õ" => "O",
"Ö" => "O",
"Ø" => "O",
"Ù" => "U",
"Ú" => "U",
"Û" => "U",
"Ü" => "U",
"Ý" => "Y",
"Þ" => "THORN",
"ß" => "s",
"à" => "a",
"á" => "a",
"â" => "a",
"ã" => "a",
"ä" => "a",
"å" => "a",
"æ" => "ae",
"ç" => "c",
"è" => "e",
"é" => "e",
"ê" => "e",
"ë" => "e",
"ì" => "i",
"í" => "i",
"î" => "i",
"ï" => "i",
"ð" => "eth",
"ñ" => "n",
"ò" => "o",
"ó" => "o",
"ô" => "o",
"õ" => "o",
"ö" => "o",
"ø" => "o",
"ù" => "u",
"ú" => "u",
"û" => "u",
"ü" => "u",
"ý" => "y",
"þ" => "thorn",
"ÿ" => "y",
"Ā" => "A",
"ā" => "a",
"Ă" => "A",
"ă" => "a",
"Ą" => "A",
"ą" => "a",
"Ć" => "C",
"ć" => "c",
"Ĉ" => "C",
"ĉ" => "c",
"Ċ" => "C",
"ċ" => "c",
"Č" => "C",
"č" => "c",
"Ď" => "D",
"ď" => "d",
"Đ" => "D",
"đ" => "d",
"Ē" => "E",
"ē" => "e",
"Ĕ" => "E",
"ĕ" => "e",
"Ė" => "E",
"ė" => "e",
"Ę" => "E",
"ę" => "e",
"Ě" => "E",
"ě" => "e",
"Ĝ" => "G",
"ĝ" => "g",
"Ğ" => "G",
"ğ" => "g",
"Ġ" => "G",
"ġ" => "g",
"Ģ" => "G",
"ģ" => "g",
"Ĥ" => "H",
"ĥ" => "h",
"Ħ" => "H",
"ħ" => "h",
"Ĩ" => "I",
"ĩ" => "i",
"Ī" => "I",
"ī" => "i",
"Ĭ" => "I",
"ĭ" => "i",
"Į" => "I",
"į" => "i",
"İ" => "I",
"ı" => "i",
"Ĵ" => "J",
"ĵ" => "j",
"Ķ" => "K",
"ķ" => "k",
"ĸ" => "kra",
"Ĺ" => "L",
"ĺ" => "l",
"Ļ" => "L",
"ļ" => "l",
"Ľ" => "L",
"ľ" => "l",
"Ŀ" => "L",
"ŀ" => "l",
"Ł" => "L",
"ł" => "l",
"Ń" => "N",
"ń" => "n",
"Ņ" => "N",
"ņ" => "n",
"Ň" => "N",
"ň" => "n",
"ʼn" => "n",
"Ŋ" => "ENG",
"ŋ" => "eng",
"Ō" => "O",
"ō" => "o",
"Ŏ" => "O",
"ŏ" => "o",
"Ő" => "O",
"ő" => "o",
"Ŕ" => "R",
"ŕ" => "r",
"Ŗ" => "R",
"ŗ" => "r",
"Ř" => "R",
"ř" => "r",
"Ś" => "S",
"ś" => "s",
"Ŝ" => "S",
"ŝ" => "s",
"Ş" => "S",
"ş" => "s",
"Š" => "S",
"š" => "s",
"Ţ" => "T",
"ţ" => "t",
"Ť" => "T",
"ť" => "t",
"Ŧ" => "T",
"ŧ" => "t",
"Ũ" => "U",
"ũ" => "u",
"Ū" => "U",
"ū" => "u",
"Ŭ" => "U",
"ŭ" => "u",
"Ů" => "U",
"ů" => "u",
"Ű" => "U",
"ű" => "u",
"Ų" => "U",
"ų" => "u",
"Ŵ" => "W",
"ŵ" => "w",
"Ŷ" => "Y",
"ŷ" => "y",
"Ÿ" => "Y",
"Ź" => "Z",
"ź" => "z",
"Ż" => "Z",
"ż" => "z",
"Ž" => "Z",
"ž" => "z",
"ſ" => "s",
"ƀ" => "b",
"Ɓ" => "B",
"Ƃ" => "B",
"ƃ" => "b",
"Ƅ" => "SIX",
"ƅ" => "six",
"Ɔ" => "O",
"Ƈ" => "C",
"ƈ" => "c",
"Ɖ" => "D",
"Ɗ" => "D",
"Ƌ" => "D",
"ƌ" => "d",
"ƍ" => "delta",
"Ǝ" => "E",
"Ə" => "SCHWA",
"Ɛ" => "E",
"Ƒ" => "F",
"ƒ" => "f",
"Ɠ" => "G",
"Ɣ" => "GAMMA",
"ƕ" => "hv",
"Ɩ" => "IOTA",
"Ɨ" => "I",
"Ƙ" => "K",
"ƙ" => "k",
"ƚ" => "l",
"ƛ" => "lambda",
"Ɯ" => "M",
"Ɲ" => "N",
"ƞ" => "n",
"Ɵ" => "O",
"Ơ" => "O",
"ơ" => "o",
"Ƣ" => "OI",
"ƣ" => "oi",
"Ƥ" => "P",
"ƥ" => "p",
"Ƨ" => "TWO",
"ƨ" => "two",
"Ʃ" => "ESH",
"ƫ" => "t",
"Ƭ" => "T",
"ƭ" => "t",
"Ʈ" => "T",
"Ư" => "U",
"ư" => "u",
"Ʊ" => "UPSILON",
"Ʋ" => "V",
"Ƴ" => "Y",
"ƴ" => "y",
"Ƶ" => "Z",
"ƶ" => "z",
"Ʒ" => "EZH",
"Ƹ" => "EZH",
"ƹ" => "ezh",
"ƺ" => "ezh",
"Ƽ" => "FIVE",
"ƽ" => "five",
"DŽ" => "DZ",
"Dž" => "D",
"dž" => "dz",
"LJ" => "LJ",
"Lj" => "L",
"lj" => "lj",
"NJ" => "NJ",
"Nj" => "N",
"nj" => "nj",
"Ǎ" => "A",
"ǎ" => "a",
"Ǐ" => "I",
"ǐ" => "i",
"Ǒ" => "O",
"ǒ" => "o",
"Ǔ" => "U",
"ǔ" => "u",
"Ǖ" => "U",
"ǖ" => "u",
"Ǘ" => "U",
"ǘ" => "u",
"Ǚ" => "U",
"ǚ" => "u",
"Ǜ" => "U",
"ǜ" => "u",
"ǝ" => "e",
"Ǟ" => "A",
"ǟ" => "a",
"Ǡ" => "A",
"ǡ" => "a",
"Ǣ" => "AE",
"ǣ" => "ae",
"Ǥ" => "G",
"ǥ" => "g",
"Ǧ" => "G",
"ǧ" => "g",
"Ǩ" => "K",
"ǩ" => "k",
"Ǫ" => "O",
"ǫ" => "o",
"Ǭ" => "O",
"ǭ" => "o",
"Ǯ" => "EZH",
"ǯ" => "ezh",
"ǰ" => "j",
"DZ" => "DZ",
"Dz" => "D",
"dz" => "dz",
"Ǵ" => "G",
"ǵ" => "g",
"Ƕ" => "HWAIR",
"Ƿ" => "WYNN",
"Ǹ" => "N",
"ǹ" => "n",
"Ǻ" => "A",
"ǻ" => "a",
"Ǽ" => "AE",
"ǽ" => "ae",
"Ǿ" => "O",
"ǿ" => "o",
"Ȁ" => "A",
"ȁ" => "a",
"Ȃ" => "A",
"ȃ" => "a",
"Ȅ" => "E",
"ȅ" => "e",
"Ȇ" => "E",
"ȇ" => "e",
"Ȉ" => "I",
"ȉ" => "i",
"Ȋ" => "I",
"ȋ" => "i",
"Ȍ" => "O",
"ȍ" => "o",
"Ȏ" => "O",
"ȏ" => "o",
"Ȑ" => "R",
"ȑ" => "r",
"Ȓ" => "R",
"ȓ" => "r",
"Ȕ" => "U",
"ȕ" => "u",
"Ȗ" => "U",
"ȗ" => "u",
"Ș" => "S",
"ș" => "s",
"Ț" => "T",
"ț" => "t",
"Ȝ" => "YOGH",
"ȝ" => "yogh",
"Ȟ" => "H",
"ȟ" => "h",
"Ƞ" => "N",
"ȡ" => "d",
"Ȣ" => "OU",
"ȣ" => "ou",
"Ȥ" => "Z",
"ȥ" => "z",
"Ȧ" => "A",
"ȧ" => "a",
"Ȩ" => "E",
"ȩ" => "e",
"Ȫ" => "O",
"ȫ" => "o",
"Ȭ" => "O",
"ȭ" => "o",
"Ȯ" => "O",
"ȯ" => "o",
"Ȱ" => "O",
"ȱ" => "o",
"Ȳ" => "Y",
"ȳ" => "y",
"ȴ" => "l",
"ȵ" => "n",
"ȶ" => "t",
"ȷ" => "j",
"ȸ" => "db",
"ȹ" => "qp",
"Ⱥ" => "A",
"Ȼ" => "C",
"ȼ" => "c",
"Ƚ" => "L",
"Ⱦ" => "T",
"ȿ" => "s",
"ɀ" => "z",
"Ɂ" => "STOP",
"ɂ" => "stop",
"Ƀ" => "B",
"Ʉ" => "U",
"Ʌ" => "V",
"Ɇ" => "E",
"ɇ" => "e",
"Ɉ" => "J",
"ɉ" => "j",
"Ɋ" => "Q",
"ɋ" => "q",
"Ɍ" => "R",
"ɍ" => "r",
"Ɏ" => "Y",
"ɏ" => "y",
"ɐ" => "a",
"ɑ" => "alpha",
"ɒ" => "alpha",
"ɓ" => "b",
"ɔ" => "o",
"ɕ" => "c",
"ɖ" => "d",
"ɗ" => "d",
"ɘ" => "e",
"ə" => "schwa",
"ɚ" => "schwa",
"ɛ" => "e",
"ɜ" => "e",
"ɝ" => "e",
"ɞ" => "e",
"ɟ" => "j",
"ɠ" => "g",
"ɡ" => "script",
"ɣ" => "gamma",
"ɤ" => "rams",
"ɥ" => "h",
"ɦ" => "h",
"ɧ" => "heng",
"ɨ" => "i",
"ɩ" => "iota",
"ɫ" => "l",
"ɬ" => "l",
"ɭ" => "l",
"ɮ" => "lezh",
"ɯ" => "m",
"ɰ" => "m",
"ɱ" => "m",
"ɲ" => "n",
"ɳ" => "n",
"ɵ" => "barred",
"ɷ" => "omega",
"ɸ" => "phi",
"ɹ" => "r",
"ɺ" => "r",
"ɻ" => "r",
"ɼ" => "r",
"ɽ" => "r",
"ɾ" => "r",
"ɿ" => "r",
"ʂ" => "s",
"ʃ" => "esh",
"ʄ" => "j",
"ʅ" => "squat",
"ʆ" => "esh",
"ʇ" => "t",
"ʈ" => "t",
"ʉ" => "u",
"ʊ" => "upsilon",
"ʋ" => "v",
"ʌ" => "v",
"ʍ" => "w",
"ʎ" => "y",
"ʐ" => "z",
"ʑ" => "z",
"ʒ" => "ezh",
"ʓ" => "ezh",
"ʚ" => "e",
"ʞ" => "k",
"ʠ" => "q",
"ʣ" => "dz",
"ʤ" => "dezh",
"ʥ" => "dz",
"ʦ" => "ts",
"ʧ" => "tesh",
"ʨ" => "tc",
"ʩ" => "feng",
"ʪ" => "ls",
"ʫ" => "lz",
"ʮ" => "h",
"ʯ" => "h")
Cristoforo answered 21/9, 2019 at 15:17 Comment(0)

I hope this will be useful for anybody:

This class removes diacritics from strings containing Latin-1 Supplement, Latin Extended-A and Latin Extended-B special characters.


$specialCharacters = "";
$specialCharacters .= "Latin-1 Supplement".PHP_EOL;
$specialCharacters .= "ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ".PHP_EOL;
$specialCharacters .= "Latin Extended-A".PHP_EOL;
$specialCharacters .= "ĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıIJijĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅņŇňʼnŊŋŌōŎŏŐőŒœŔŕŖŗŘřŚśŜŝŞşŠšŢţŤťŦŧŨũŪūŬŭŮůŰűŲųŴŵŶŷŸŹźŻżŽžſ".PHP_EOL;
$specialCharacters .= "Latin Extended-B".PHP_EOL;
$specialCharacters .= "ƒǺǻǼǽǾǿ".PHP_EOL;
$specialCharacters .= "Latin Extended Additional".PHP_EOL;
$specialCharacters .= "ẀẁẂẃẄẅỲỳ".PHP_EOL;

print "<pre>";
print removeDiacritics($specialCharacters).PHP_EOL;
print "</pre>";



Baker answered 31/10, 2014 at 11:14 Comment(0)
function fn_normalize($s) { // Replaces all diacritics/accents
    return transliterator_transliterate('Any-Latin; Latin-ASCII; Lower()', $s);
}

$a = [" Válue1  ", "válue2 ", "válue3", "Café à la carte", "A æ Übérmensch på høyeste nivå! И я люблю PHP! fi"];

$result = array_map('fn_normalize', $a);


This worked for me. You might have to edit line 934 of your php.ini where it says


Remove the semicolon.
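
transliterator_transliterate() is provided by PHP's intl extension, so the call fails with an undefined-function error when that extension is not loaded. The following is a small sketch, with a hypothetical wrapper name, that checks for the function first and returns null instead of failing.

```php
<?php
// Hypothetical guard around the call used in fn_normalize() above.
function normalizeIfIntlAvailable(string $s): ?string
{
    if (!function_exists('transliterator_transliterate')) {
        return null; // intl extension is not loaded
    }

    $out = transliterator_transliterate('Any-Latin; Latin-ASCII; Lower()', $s);

    // The function returns false on failure, e.g. for an invalid rule string.
    return $out === false ? null : $out;
}

var_dump(extension_loaded('intl'));
var_dump(normalizeIfIntlAvailable('Café à la carte'));
```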

Carman answered 8/6, 2023 at 18:42 Comment(0)

The most generic way to solve this is to use Unicode normalization, as it works automatically on all accents and you don't have to prepare a list up front. I don't know if it's easily available in PHP; I have used it in C and Java. Essentially, you first transform the string so that every accented character is represented by a regular character plus a so-called combining diacritical mark (a built-in or external library should provide this function), and then remove the combining marks (using a specialized library, character properties the language provides, or regular expression extensions).
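
In PHP, the two steps described above can be sketched with the Normalizer class from the intl extension and a Unicode-aware regular expression. This is a minimal sketch assuming intl is installed; the function name stripCombiningMarks is hypothetical. Letters that have no canonical decomposition, such as ł, ø or ß, pass through unchanged, so this approach does not replace a mapping table for those.

```php
<?php
// Hypothetical helper: decompose, then drop combining marks.
function stripCombiningMarks(string $s): string
{
    if (!class_exists('Normalizer')) {
        throw new RuntimeException('The intl extension (Normalizer class) is required.');
    }

    // Step 1: canonical decomposition (NFD), e.g. é becomes e + U+0301.
    $decomposed = Normalizer::normalize($s, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $s; // input was not valid UTF-8; leave it untouched
    }

    // Step 2: remove non-spacing marks (Unicode category Mn).
    return preg_replace('/\p{Mn}+/u', '', $decomposed) ?? $s;
}

if (class_exists('Normalizer')) {
    echo stripCombiningMarks('Café ńó'), PHP_EOL; // Cafe no
}
```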

Dobson answered 25/2, 2012 at 16:48 Comment(0)
