Following @tchrist's suggestion, with PHP's intl extension:
https://www.php.net/manual/en/book.intl.php
preg_replace('/\pM+/u', '', normalizer_normalize($mystring, Normalizer::FORM_D));
eéèêëiîïoöôuùûüaâäÅ Ἥ ŐǟǠ ǺƶƈƉųŪŧȬƀ␢ĦŁȽŦ ƀǖ becomes
eeeeeiiiooouuuuaaaA Η OaA AƶƈƉuUŧOƀ␢ĦŁȽŦ ƀu
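For anyone who wants this as a reusable function, here is a minimal sketch of the same one-liner with basic error handling. The function name strip_combining_marks is my own invention, and it assumes the intl extension is loaded and the source file is saved as UTF-8.

```php
<?php
// Hypothetical helper name; requires the intl extension.
function strip_combining_marks(string $text): string
{
    // Canonical decomposition (NFD) splits "é" into "e" + U+0301.
    $decomposed = normalizer_normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        // Normalization fails on invalid UTF-8; hand the input back untouched.
        return $text;
    }
    // Drop every run of combining marks (Unicode general category M).
    $stripped = preg_replace('/\p{M}+/u', '', $decomposed);
    return $stripped ?? $text;
}

// "Ï" has a canonical decomposition, "Đ" does not, so only the first is changed.
echo strip_combining_marks('eéèêë Ï Đ'), PHP_EOL; // prints "eeeee I Đ"
```

Characters without a canonical decomposition pass through unchanged, which is the limitation discussed next.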
As tchrist emphasises, not all Unicode characters are considered decomposable.
Extracts from the Unicode charts:
U0080.pdf
00CF Ï LATIN CAPITAL LETTER I WITH DIAERESIS
≡ 0049 I 0308 ¨
NB: the symbol « ≡ » indicates an available decomposition.
00D0 Ð LATIN CAPITAL LETTER ETH
→ 00F0 ð latin small letter eth
→ 0110 Đ latin capital letter d with stroke
→ 0189 Ɖ latin capital letter african d
No decomposition is available here (the « → » entries are only cross-references), which IMHO is strange: we could consider the ASCII letter D an acceptable equivalent.
U0100.pdf
0110 Đ LATIN CAPITAL LETTER D WITH STROKE
→ 00D0 Ð latin capital letter eth
→ 0111 đ latin small letter d with stroke
→ 0189 Ɖ latin capital letter african d
Even stranger: this one is named LATIN CAPITAL LETTER D WITH STROKE, yet it is not decomposable as such! Perhaps a cooler solution would be to get the Unicode name of each character, compare it with the name of each ASCII character, and replace accordingly. Anyone? ;-]
cf http://unicode.org/Public/UNIDATA/UnicodeData.txt
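A rough sketch of that name-based idea, using the IntlChar class from the intl extension (PHP 7+) rather than parsing UnicodeData.txt by hand. The helper names base_letter_by_name and fold_by_name are hypothetical, and the "LATIN ... LETTER X WITH ..." pattern is only my assumption about which names are safe to fold; it is not an official Unicode mapping.

```php
<?php
// Hypothetical helper: map one character to its base Latin letter using its Unicode name.
function base_letter_by_name(string $char): string
{
    // e.g. "LATIN CAPITAL LETTER D WITH STROKE"
    $name = IntlChar::charName($char);
    if (!is_string($name)
        || !preg_match('/^(LATIN (?:CAPITAL|SMALL) LETTER [A-Z]) WITH /', $name, $m)) {
        // No name, or not of the form "<base letter> WITH <something>": leave as is.
        return $char;
    }
    // Look up the code point whose name is the part before " WITH ".
    $codepoint = IntlChar::charFromName($m[1]);
    if ($codepoint === null) {
        return $char;
    }
    return IntlChar::chr($codepoint) ?? $char;
}

// Hypothetical helper: apply the mapping to every character of a UTF-8 string.
function fold_by_name(string $text): string
{
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return $text; // invalid UTF-8
    }
    return implode('', array_map('base_letter_by_name', $chars));
}

echo fold_by_name('ĐđŁƀ Ð'), PHP_EOL; // prints "DdLb Ð"
```

Note that Ð (LATIN CAPITAL LETTER ETH) and Ɖ (LATIN CAPITAL LETTER AFRICAN D) still slip through, because their names carry no "WITH" clause; those would need an explicit lookup table.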
setlocale(LC_CTYPE, 'en_US.UTF-8');
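The setlocale() call matters presumably because it is paired with iconv's //TRANSLIT mode, whose substitutions depend on LC_CTYPE in glibc. The iconv call below is my assumption, since the answer shows only the setlocale line, and translit_to_ascii is a hypothetical name. The output varies between iconv implementations (glibc, libiconv, musl), so no expected result is shown for non-ASCII input.

```php
<?php
/**
 * Hypothetical helper: transliterate UTF-8 text to ASCII via iconv.
 * Returns the transliterated string, or false if iconv fails.
 */
function translit_to_ascii(string $text, string $locale = 'en_US.UTF-8')
{
    // Passing "0" only queries the current LC_CTYPE setting without changing it.
    $previous = setlocale(LC_CTYPE, '0');

    // If the locale is not installed, setlocale() returns false and the old
    // setting stays in effect; in the "C" locale, results are usually poorer.
    setlocale(LC_CTYPE, $locale);

    $result = iconv('UTF-8', 'ASCII//TRANSLIT', $text);

    // Restore the previous setting so the rest of the script is unaffected.
    if ($previous !== false) {
        setlocale(LC_CTYPE, $previous);
    }
    return $result;
}

var_dump(translit_to_ascii('ŠŒŽšœžŸ¥µÀÁÂ'));
```

Checking which locales are installed (locale -a on most Unix systems) is worth doing first, since a missing locale fails silently here.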
-> LC_CTYPE, not _COLLATE. Tschüss. – Excitor
CŠŒŽšœžŸ¥µÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýÿ
and getting
CSOEZsoez"YyenuA'A^A~A"AAAECE'E^E"EI'I^I"ID~NO'O^O~O"OOU'U^U"U'Yssa'a^a~a"aaaece'e^e"ei'i^i"id~no'o^o~o"oou'u^u"u'y"y
– Evaporimeter