What characters are NOT present in Unicode?
D

4

17

I have heard that some characters are missing from the Unicode standard despite being written in everyday life by the populations of some areas. In particular, I have heard about recent Chinese given names created by assembling parts of existing characters, but I can't find any reference for this.

For instance, the character below is very common for 50 million people, yet it was not in Unicode until October 2009:

[image of the character]

Is there a list of such characters? (images, or website listing such characters as images)

Deputy answered 8/6, 2011 at 9:27 Comment(1)
This: en.wikipedia.org/wiki/File:Prince_logo.svg, although that is more of a publicity stunt than an actual character. Rebak
B
9

Also: Here's unicode.org's list of unsupported scripts

Bilateral answered 8/6, 2011 at 22:48 Comment(0)
L
8

Well, there's loads of stuff not present in Unicode (though new characters are still being added).

Some examples:

  • Due to Han Unification, Unicode uses one codepoint for several similar characters from different languages. People disagree whether these characters are really "the same"; if you believe they should be represented separately, then these separate representations could be said to be "missing" (though this is something of a philosophical question).
  • In a similar vein, many languages (especially Asian languages) sometimes have several variants of one character/glyph. The distinction between "one character with several representations" (=one codepoint) and "distinct characters" (=different codepoints) is somewhat arbitrary, so there are cases (e.g. with Kanji characters) where some people feel alternative variants are "missing".
  • Many historic and rarely used characters are missing.
  • Many old/historic scripts are not covered, e.g. Demotic. Actually, there is an initiative specifically for including more scripts in Unicode, the Script Encoding Initiative (SEI).

There is also a page by the W3C on this topic, Missing characters and glyphs, with more explanations.

Liechtenstein answered 8/6, 2011 at 9:45 Comment(1)
I believe the ~260 variation selectors are meant to address the first two bullets. Their code points are 180B–180D (abbreviated FVS1–3), FE00–FE0F (VS1–VS16), and E0100–E01EF (VS17–VS256). These are all \p{Nonspacing_Mark}, \p{Grapheme_Extend}, \p{Default_Ignorable_Code_Point}, and \p{Variation_Selector}. Note that 303E (ɪᴅᴇᴏɢʀᴀᴘʜɪᴄ ᴠᴀʀɪᴀᴛɪᴏɴ ɪɴᴅɪᴄᴀᴛᴏʀ) is different despite the similar name: it is a visible symbol that counts as \p{Other_Symbol} and \p{Grapheme_Base}, not a variation selector. Scriptwise, FVS1–3 are \p{Mongolian}, 303E is \p{Common}, and VS1–256 are \p{Inherited}. Hope this helps. Aruabea
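The names and general categories of the code points listed in this comment can be checked against whatever Unicode data ships with your Python. The following is only a rough sketch using the standard `unicodedata` module; the `RANGES` table and the `describe` helper are names made up for this illustration, and the results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

# Code point ranges taken from the comment above (inclusive on both ends).
# The labels are just for display; they are not official names.
RANGES = {
    "FVS1-3": (0x180B, 0x180D),
    "U+303E": (0x303E, 0x303E),
    "VS1-VS16": (0xFE00, 0xFE0F),
    "VS17-VS256": (0xE0100, 0xE01EF),
}


def describe(cp):
    """Return (character name, general category) for a code point."""
    ch = chr(cp)
    return unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch)


for label, (first, last) in RANGES.items():
    name, category = describe(first)
    count = last - first + 1
    print(f"{label}: {count} code point(s), "
          f"first is U+{first:04X} {name} [{category}]")
```

Note that `unicodedata` only exposes a handful of properties (name, general category, and a few others), so properties such as Variation_Selector or Script would need a third-party package or the Unicode data files themselves.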
M
3

There are tons of symbols that you would expect to find in the symbol blocks of the standard but that are, annoyingly, not included.

See the "Missing symmetric versions" section of https://web.archive.org/web/20210830121541/http://xahlee.info/comp/unicode_arrows.html for a bunch of arrow symbols that exist, but only in certain directions. Some are just silly. For example, there are ⥂, ⥃, and ⥄, but there is no mirrored counterpart of the last one (a short leftwards arrow above a rightwards arrow).
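One rough way to look for such gaps yourself is to take an arrow's official character name, swap the words LEFTWARDS and RIGHTWARDS, and see whether a character with the swapped name exists. This is only a sketch: `mirrored_name` and `find_mirror` are hypothetical helpers, the name-swapping heuristic is an assumption that will not work for every arrow, and the answer depends on the Unicode version bundled with your Python.

```python
import unicodedata

# Words to exchange when guessing the name of the mirrored arrow.
SWAP = {"LEFTWARDS": "RIGHTWARDS", "RIGHTWARDS": "LEFTWARDS"}


def mirrored_name(name):
    """Swap LEFTWARDS and RIGHTWARDS in a Unicode character name."""
    return " ".join(SWAP.get(word, word) for word in name.split(" "))


def find_mirror(ch):
    """Return the character with the mirrored name, or None if none exists."""
    try:
        return unicodedata.lookup(mirrored_name(unicodedata.name(ch)))
    except (KeyError, ValueError):
        # KeyError: no character has the mirrored name.
        # ValueError: the input character itself has no name.
        return None


for arrow in "⥂⥃⥄":
    mirror = find_mirror(arrow)
    status = mirror if mirror is not None else "no mirrored character found"
    print(f"{arrow} {unicodedata.name(arrow)} -> {status}")
```

If the observation above holds for your Unicode data, the first two arrows should report each other and the third should report no match.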

And you can see from http://en.wikipedia.org/wiki/Unicode_subscripts_and_superscripts that they picked, apparently at random, which letters to support in superscript and subscript form. For example, the Superscripts and Subscripts block includes the subscript vowels a, e, o, and even schwa (ə), but not i, which would be very useful, as it's a common subscript in mathematical typesetting (a subscript i does exist as ᵢ, U+1D62, but it sits in the Phonetic Extensions block). Take a look at the Wikipedia article for more details (you'll need a Unicode font installed, because at least at the time of this writing the regular ASCII equivalents are not explicitly listed). Basically, they picked about half of the Latin alphabet, seemingly at random, for each of the upper- and lower-case superscript and subscript characters.
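If you would rather not rely on a font, the coverage of lowercase subscript letters can be listed from character names alone. The sketch below assumes that every such character is named `LATIN SUBSCRIPT SMALL LETTER ...`, which is a naming pattern I am inferring rather than a guarantee, and `subscript_letters` is a hypothetical helper; the result reflects the Unicode version bundled with your Python.

```python
import string
import unicodedata

# Assumed naming pattern for lowercase Latin subscript letters.
PREFIX = "LATIN SUBSCRIPT SMALL LETTER "


def subscript_letters():
    """Map the letter part of each matching name to its code point."""
    found = {}
    for cp in range(0x110000):
        # Unnamed code points (unassigned, surrogates, ...) yield "".
        name = unicodedata.name(chr(cp), "")
        if name.startswith(PREFIX):
            found[name[len(PREFIX):]] = cp
    return found


found = subscript_letters()
missing = [c for c in string.ascii_uppercase if c not in found]

for letter, cp in sorted(found.items()):
    print(f"{letter:<6} U+{cp:04X} {chr(cp)}")
print("No subscript form found for:", " ".join(missing).lower())
```

The same scan works for superscripts by changing the prefix, though superscript letters are named less uniformly (many are `MODIFIER LETTER ...`), so a single prefix will not catch them all.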

Also, a lot of symbols that would be convenient for building shapes with Unicode do not exist.

Monkhmer answered 17/8, 2011 at 0:42 Comment(0)
O
1

Unicode does not support the bilabial trill letter, the turned beta, or the reversed k.

Officialese answered 23/2, 2020 at 22:52 Comment(2)
Thanks! Does the bilabial trill letter have a character that some people write? Are the turned beta and reversed k also written often by some people? Deputy
At first I wasn't sure whether this was a serious answer or a joke mentioning made-up letters. Adding a link or a little explanation would really be an improvement; I found the Wikipedia page about the Voiceless bilabial trill (the Voiced bilabial trill apparently has a Unicode symbol), but couldn't find any information about the other cited letters. Wrecker
