Similar looking UTF8 characters for ASCII

I'm looking for a table that maps ASCII characters to similar-looking UTF8 characters. I know that whether they look the same also depends on the font, but something generic to start with is enough.

>>> # PY3 code:
>>> a='H'  # ASCII: LATIN CAPITAL LETTER H (U+0048)
>>> b='Н'  # not ASCII: CYRILLIC CAPITAL LETTER EN (U+041D)
>>> a==b
False
>>> ' '.join(format(ord(x), 'b') for x in a)
'1001000'
>>> ' '.join(format(ord(x), 'b') for x in b)
'10000011101'
>>> a='P'  # ASCII: LATIN CAPITAL LETTER P (U+0050)
>>> b='Ρ'  # not ASCII: GREEK CAPITAL LETTER RHO (U+03A1)
>>> a==b
False
>>> ' '.join(format(ord(x), 'b') for x in a)
'1010000'
>>> ' '.join(format(ord(x), 'b') for x in b)
'1110100001'
Triviality answered 22/10, 2017 at 7:50 Comment(5)
"UTF-8 characters" are simply Unicode characters (codepoints, to be precise), as UTF-8 is just an encoding for Unicode. So you are looking for a way to find Unicode codepoints that look similar to ASCII characters? - Fluid
Yes, that is what I mean. Sorry if that was not obvious. - Triviality
Similar question on security.SE: List of visually similar characters, for detecting spoofing and social engineering attacks. - Fluid
Yes, but I'm not looking for "lol" ~ "lo1". - Triviality
You've got some terminology issues. Your question is equally valid to any system that uses Unicode, not just those that store strings using its UTF-8 encoding. All the characters you are using are Unicode. By ASCII, you seem to mean C0 Controls and Basic Latin. UTF-8 is not considered extended ASCII. Also, your for x in a iterates over Unicode codepoints, not UTF-8 code units. (== does compare the sequences of UTF-8 code units.) - Gabi
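
The distinction the comments draw between code points and UTF-8 code units can be checked directly. This is a minimal sketch using only the standard library; the helper name describe is made up for illustration and is not part of the original question.

```python
import unicodedata


def describe(ch):
    """Return (code point, Unicode name, UTF-8 bytes as hex) for one character."""
    code_point = f"U+{ord(ch):04X}"
    utf8_hex = ch.encode("utf-8").hex(" ")  # bytes.hex(sep) needs Python 3.8+
    return (code_point, unicodedata.name(ch), utf8_hex)


# The two pairs from the question: Latin H / Cyrillic EN, Latin P / Greek RHO.
for ch in ("H", "\u041d", "P", "\u03a1"):
    print(ch, *describe(ch), sep="  ")
```

The ASCII letters encode to a single UTF-8 byte, while the Cyrillic and Greek look-alikes each encode to two bytes, even though every one of them is a single code point.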

This is a very useful tool: it shows you all the characters that look similar, and you can decide whether each one is REALLY similar enough for you :)

https://unicode.org/cldr/utility/confusables.jsp?a=test&r=None
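
Once you have picked the characters you care about from that tool, a plain dictionary is enough to start with. The following is only a rough sketch: the table is a small hand-picked sample (not the Unicode confusables data), and the names LOOKALIKES, to_ascii_lookalike and find_lookalikes are hypothetical.

```python
# Hand-picked sample for illustration only; NOT the full Unicode confusables data.
LOOKALIKES = {
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
    "\u0415": "E",  # CYRILLIC CAPITAL LETTER IE
    "\u041d": "H",  # CYRILLIC CAPITAL LETTER EN
    "\u041e": "O",  # CYRILLIC CAPITAL LETTER O
    "\u0420": "P",  # CYRILLIC CAPITAL LETTER ER
    "\u0421": "C",  # CYRILLIC CAPITAL LETTER ES
    "\u0391": "A",  # GREEK CAPITAL LETTER ALPHA
    "\u0397": "H",  # GREEK CAPITAL LETTER ETA
    "\u03a1": "P",  # GREEK CAPITAL LETTER RHO
}


def to_ascii_lookalike(text):
    """Replace each mapped character with its ASCII look-alike; leave others unchanged."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in text)


def find_lookalikes(text):
    """Return (index, character, ASCII look-alike) for every mapped character in text."""
    return [(i, ch, LOOKALIKES[ch]) for i, ch in enumerate(text) if ch in LOOKALIKES]


sample = "\u041dELLO"  # starts with Cyrillic EN, not Latin H
print(sample == "HELLO")                      # False
print(to_ascii_lookalike(sample) == "HELLO")  # True
print(find_lookalikes(sample))
```

A one-way map from look-alike to ASCII keeps the lookup trivial; if you need the reverse direction (all look-alikes for a given ASCII letter), invert the dictionary into lists.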

Some other resources:

Triviality answered 22/10, 2017 at 8:33 Comment(0)
