Let's say I have the characters Ú, Ù, Ü. All of them are visually similar to the English U.
Is there some list or algorithm to do this:
- Given a Ú or Ù or Ü return the English U
- Given an English U, return the list of all U-similar characters
I'm not sure whether the code point of a Unicode character is the same across all fonts. If it is, I suppose there could be an easy and efficient way to do this?
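For illustration, here is a minimal sketch of one possible approach, assuming Python and its standard `unicodedata` module. Code points are assigned by the Unicode standard rather than by fonts, so a lookup keyed on code points behaves the same whatever font is used. The helper names `base_letter` and `accented_variants` are hypothetical, chosen only for this example. The sketch relies on canonical decomposition (NFD), so it covers accented forms only, not lookalikes from other scripts or styled alphabets.

```python
import sys
import unicodedata


def base_letter(ch):
    """Strip combining marks after canonical decomposition, e.g. 'Ú' -> 'U'."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def accented_variants(base):
    """Collect single code points that decompose to `base` plus combining marks.

    This scans the whole code space, so cache the result if it is called often.
    """
    variants = []
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # skip surrogates, which are not characters
        ch = chr(cp)
        decomposed = unicodedata.normalize("NFD", ch)
        if (
            len(decomposed) > 1
            and decomposed[0] == base
            and all(unicodedata.combining(c) for c in decomposed[1:])
        ):
            variants.append(ch)
    return variants


if __name__ == "__main__":
    print(base_letter("Ü"))
    print("".join(accented_variants("U")))
```

Note that decomposition is language-neutral: it also maps Å to A, which may not be what every language expects.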
UPDATE
If you're using Ruby, there is a gem available, unicode-confusable, that may help in some cases.
confusables are almost exact lookalikes and not the accented versions. (like: 𝐔𝑈𝑼𝒰𝖴𝚄) – Enounce
Å is not an A with a ring modification, it is a separate character. A is as different from Å, as A is different from B. In contrast to e and ê, where the latter is an e with a circumflex modification. Sorting in Swedish is like this: [A], [B], [C], [EÉÈÊË], [X], [Y], [Z], [Å], [Ä], [Ö], where all chars within the same brackets should be treated as the same. – Oates