Do I need the supplementary planes?

Asked 21/6, 2009 at 11:06 Answered 1/3, 2012 at 7:54

unicode utf astral-plane supplementary

I think the question is pretty simple: do I need all the rest of Unicode beyond the Basic Multilingual Plane? What kind of characters are included there, are they really needed, and for what purposes?

Thanks.

Faradize asked 21/6, 2009 at 11:06

If you intend to sell anything in China, then the GB 18030 standard is mandatory, and it requires characters beyond the BMP (Basic Multilingual Plane). The standard is enforced, and in order to sell there you need to pass GB 18030 certification.
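To see what "beyond the BMP" means for GB 18030 in practice, here is a minimal sketch that assumes Python 3's built-in `gb18030` codec. It only measures how many bytes GB 18030 spends on characters from different planes; it is not a certification test. `gb18030_length` is a hypothetical helper name.

```python
# Minimal sketch: how many bytes GB 18030 (Python's "gb18030" codec) uses
# for characters from the BMP and from a supplementary plane.

def gb18030_length(ch: str) -> int:
    """Return the number of bytes GB 18030 uses for a single character."""
    return len(ch.encode("gb18030"))

samples = {
    "U+0041 (ASCII)": "A",
    "U+4E2D (CJK, BMP)": "\u4e2d",
    "U+20000 (CJK Ext. B, plane 2)": "\U00020000",
}

for label, ch in samples.items():
    encoded = ch.encode("gb18030")
    # Round-trip check: decoding must give back the original character.
    assert encoded.decode("gb18030") == ch
    print(f"{label}: {len(encoded)} byte(s) -> {encoded.hex()}")
```

GB 18030 encodes every supplementary code point as a 4-byte sequence, so software that claims GB 18030 support cannot stop at the BMP.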

There are also national standards in Japan and Hong Kong that require characters beyond the BMP. Even if these standards are not enforced like the Chinese one, supporting them might give you an edge.

So the simple answer would be: you need some of the stuff there.

=== 2016 ===

That was 7 years ago. Now everybody talks about emojis. Well, most emojis are beyond the BMP :-)
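A quick way to see why emoji expose BMP-only assumptions, sketched in Python 3 (the helper names `utf16_units` and `astral_chars` are made up for illustration). A Python `str` counts code points, while UTF-16-based strings such as Java's `String.length()` or JavaScript's `.length` count code units, so one emoji counts as two.

```python
def utf16_units(s: str) -> int:
    """Number of UTF-16 code units, i.e. what Java's String.length() or
    JavaScript's .length would report for the same text."""
    return len(s.encode("utf-16-le")) // 2

def astral_chars(s: str) -> list[str]:
    """Return the characters in s that lie outside the BMP (> U+FFFF)."""
    return [c for c in s if ord(c) > 0xFFFF]

text = "ok \U0001F600"  # "ok " followed by U+1F600 GRINNING FACE

print(len(text))          # 4 code points
print(utf16_units(text))  # 5 UTF-16 code units (the emoji takes two)
print([f"U+{ord(c):04X}" for c in astral_chars(text)])  # ['U+1F600']
```

Code that truncates or indexes by UTF-16 units can cut an emoji in half, which is the kind of bug that only shows up once supplementary characters arrive.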

Besprinkle answered 11/11, 2009 at 8:11

Beyond the certification problem, the Unihan IICore character set, which contains all the Han (Chinese, Japanese, Korean) characters in current modern use, includes 62 characters in the Supplementary Ideographic Plane (SIP). – Kerch 29/7, 2010 at 14:31

It depends on whether you control your data or not. If you are using Unicode data from anyone other than yourself, you generally must assume that it may include supplementary characters, which in turn means you need to deal with 4-byte UTF-8 sequences, UTF-16 surrogate pairs, and so on.
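For reference, this is what "4-byte UTF-8" and "surrogate pairs" mean at the bit level. It is a small Python 3 sketch written out by hand for illustration (`to_surrogate_pair` and `utf8_four_bytes` are hypothetical names); in real code you would use your platform's codecs, and the asserts at the end cross-check against Python's own.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16
    high/low surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> low surrogate
    return high, low

def utf8_four_bytes(cp: int) -> bytes:
    """Encode a supplementary code point as a 4-byte UTF-8 sequence."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    return bytes([
        0xF0 | (cp >> 18),            # 11110xxx
        0x80 | ((cp >> 12) & 0x3F),   # 10xxxxxx
        0x80 | ((cp >> 6) & 0x3F),    # 10xxxxxx
        0x80 | (cp & 0x3F),           # 10xxxxxx
    ])

cp = 0x1F600
high, low = to_surrogate_pair(cp)
print(f"U+{cp:X} -> UTF-16 {high:04X} {low:04X}")  # U+1F600 -> UTF-16 D83D DE00
print(utf8_four_bytes(cp).hex(" "))                 # f0 9f 98 80

# Cross-check against Python's built-in codecs.
assert utf8_four_bytes(cp) == chr(cp).encode("utf-8")
assert high.to_bytes(2, "big") + low.to_bytes(2, "big") == chr(cp).encode("utf-16-be")
```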

Assimilation answered 21/6, 2009 at 11:41

Great answer! So many people simply do not grok this basic principle about data provenance. – Zischke 30/5, 2013 at 2:09

You should try, if at all possible, to support all of Unicode, including the supplementary planes. There are now living languages whose scripts sit in the supplementary planes, such as Miao. Other living languages will be added in the future, and some languages currently need the supplementary private use area. Then there is also what Mihai Nita said in his answer.
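If it helps to see where these characters live, here is a small Python 3 sketch (`plane_of` and `PLANE_NAMES` are hypothetical names) that maps a character to its plane. Each plane holds 0x10000 code points, so the plane number is just the code point shifted right by 16 bits. The sample characters are the first Miao letter (U+16F00), an emoji, a CJK Extension B ideograph, and the first code point of Supplementary Private Use Area-A.

```python
PLANE_NAMES = {
    0: "Basic Multilingual Plane (BMP)",
    1: "Supplementary Multilingual Plane (SMP)",
    2: "Supplementary Ideographic Plane (SIP)",
    3: "Tertiary Ideographic Plane (TIP)",
    14: "Supplementary Special-purpose Plane (SSP)",
    15: "Supplementary Private Use Area-A",
    16: "Supplementary Private Use Area-B",
}

def plane_of(ch: str) -> tuple[int, str]:
    """Return (plane number, plane name) for a single character."""
    plane = ord(ch) >> 16  # each plane spans 0x10000 code points
    return plane, PLANE_NAMES.get(plane, "unassigned plane")

for ch in ["\u4e2d", "\U00016F00", "\U0001F600", "\U00020000", "\U000F0000"]:
    plane, name = plane_of(ch)
    print(f"U+{ord(ch):04X}: plane {plane}, {name}")
```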

MySQL, starting with 5.5, also supports the supplementary planes (through the utf8mb4 character set).

It's better to take a little time now to fully support Unicode so that you won't have problems in the future if you actually do need it. You don't know who will be using your software or what scripts they will be using. Most rendering engines, GUI toolkits, browsers, operating systems, etc., now support this without trouble.

Although this question was asked several years ago, I ran across it in a search, and things have changed since then. I am currently dealing with problems where programmers either assumed there would be no need for supplementary plane support or never tested it.

Lugworm answered 1/3, 2012 at 7:54

See the complete list of character charts.

The supplementary planes currently contain ancient scripts. Unless you have an application that needs to handle ancient scripts such as Kharoshthi, Old Persian, and Cuneiform, you probably don't need them.
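If you want to check whether incoming text actually uses the scripts named above, a rough Python 3 sketch could test code points against their block ranges. The ranges below are the main Old Persian, Kharoshthi, and Cuneiform blocks from the Unicode code charts; related blocks (e.g. Cuneiform Numbers and Punctuation) are deliberately left out, and `ancient_scripts_in` is a hypothetical helper.

```python
# Main block ranges (inclusive) for the scripts mentioned above.
ANCIENT_BLOCKS = {
    "Old Persian": (0x103A0, 0x103DF),
    "Kharoshthi": (0x10A00, 0x10A5F),
    "Cuneiform": (0x12000, 0x123FF),
}

def ancient_scripts_in(s: str) -> set[str]:
    """Return the names of the listed ancient-script blocks used in s."""
    found = set()
    for c in s:
        cp = ord(c)
        for name, (first, last) in ANCIENT_BLOCKS.items():
            if first <= cp <= last:
                found.add(name)
    return found

sample = "abc \U00012000 \U000103A0"  # a Cuneiform sign and an Old Persian sign
print(sorted(ancient_scripts_in(sample)))  # ['Cuneiform', 'Old Persian']
```

All three blocks sit in plane 1, so a plain "is anything above U+FFFF?" check would flag them too; the block test just tells you which script you are dealing with.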

I guess you will only have to deal with this issue if you encounter a UTF-8 or UTF-16 implementation that is not complete. Some UTF-8 implementations do not support 4-byte sequences, which are exactly what the supplementary planes need: the characters from U+10000 upward. MySQL comes to mind.
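For the MySQL case, a minimal pre-insert check in Python 3 might look like the sketch below. It assumes the column uses MySQL's older 3-byte `utf8` character set (also called utf8mb3), which cannot store 4-byte UTF-8 sequences; text that fails the check needs a utf8mb4 column. The function names are hypothetical, and the sketch assumes the string contains no lone surrogates (those would raise `UnicodeEncodeError` when encoded).

```python
def max_utf8_bytes(s: str) -> int:
    """Longest UTF-8 sequence needed by any character in s (0 for "")."""
    return max((len(c.encode("utf-8")) for c in s), default=0)

def fits_3byte_utf8(s: str) -> bool:
    """True if every character needs at most 3 UTF-8 bytes, i.e. the text
    stays inside the BMP and fits MySQL's 3-byte "utf8" (utf8mb3)."""
    return max_utf8_bytes(s) <= 3

print(fits_3byte_utf8("h\u00e9llo \u4e2d"))  # True: Latin-1 and CJK BMP text
print(fits_3byte_utf8("ok \U0001F600"))      # False: the emoji needs 4 bytes
```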

Turbo answered 21/6, 2009 at 11:31

Actually I am using MySQL and that is the reason I am asking. I was wondering whether to use Binary or UTF-8 tables. Thanks. – Faradize 21/6, 2009 at 13:43
