I think the question is pretty simple: do I need all the rest of the stuff in Unicode beyond the Basic Multilingual Plane? What kind of stuff is included there, and is it really needed (and for what purposes)?
Thanks.
If you intend to sell anything in China, then the GB 18030 standard is mandatory, and it requires characters beyond the BMP (Basic Multilingual Plane). The standard is enforced, and in order to sell there you need to pass a GB 18030 certification.
There are also national standards in Japan and Hong Kong that require characters beyond the BMP. Even if these standards are not enforced like the Chinese one, supporting them might give you some edge.
So the simple answer would be: you need some of the stuff there.
=== 2016 ===
That was 7 years ago. Now everybody talks about emojis. Well, most emojis are beyond the BMP :-)
It depends on whether you control your data or not. If you are using Unicode data from anyone other than yourself, you generally must assume that it may include supplementary characters, which in turn means you need to deal with 4-byte UTF-8 sequences, UTF-16 surrogate pairs, and so on.
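To make that concrete, here is a minimal Python sketch of how a supplementary character differs from a BMP character once it is encoded. The helper names `describe` and `surrogate_pair` are made up for this illustration; only the arithmetic comes from the UTF-16 definition.

```python
def describe(ch: str) -> dict:
    """Report how a single code point is encoded (illustrative helper)."""
    cp = ord(ch)
    return {
        "code_point": f"U+{cp:04X}",
        "supplementary": cp > 0xFFFF,                      # outside the BMP?
        "utf8_bytes": len(ch.encode("utf-8")),             # 1 to 4 bytes
        "utf16_units": len(ch.encode("utf-16-be")) // 2,   # 1 or 2 code units
    }


def surrogate_pair(ch: str) -> tuple:
    """Compute the UTF-16 high/low surrogates for a supplementary character."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        raise ValueError("BMP characters are not encoded with surrogates")
    offset = cp - 0x10000                  # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


if __name__ == "__main__":
    for ch in ("A", "\u4e2d", "\U0001F600"):   # Latin, CJK (BMP), emoji
        print(describe(ch))
    high, low = surrogate_pair("\U0001F600")
    print(f"{high:04X} {low:04X}")
```

Code that counts UTF-16 code units, or that truncates a byte buffer at an arbitrary offset, is where the emoji case typically goes wrong, because it takes two code units (or four bytes) instead of one.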
You should try, if at all possible, to support all of Unicode, including the supplementary planes. Scripts for living languages, such as Miao, now sit in the supplementary planes. Other living languages will be added in the future, and some languages currently need the supplementary private use area. Then there is also what Mihai Nita said in his answer.
MySQL, starting with 5.5, also supports supplementary planes (through the utf8mb4 character set).
It's better to take the little bit of time now to fully support Unicode so that you won't have problems in the future if you actually do need it. You don't know who will be using your software or what scripts they will be using. Most rendering engines, GUI toolkits, browsers, operating systems, etc. now support this without trouble.
Although this question was asked several years ago, I ran across this on a search, and things have changed since then. I am currently dealing with problems where programmers either assumed there would be no need for supplementary plane support, or it remained untested.
See the complete list of character charts.
The supplementary characters currently contain ancient scripts. Unless your application has to handle ancient scripts such as Kharoshthi, Old Persian, and Cuneiform, probably not.
I guess you will only have to deal with this issue if you encounter a UTF-8 or UTF-16 implementation that is not complete. Some implementations of UTF-8 do not support 4-byte sequences, which are what encode the supplementary planes: the characters at U+10000 and above. MySQL comes to mind.
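If you suspect you are stuck with such an implementation, one way to find out whether your data would actually be affected is to scan it for code points at or above U+10000. This is only a sketch; the function names `supplementary_chars` and `fits_in_three_byte_utf8` are invented for the example and are not part of any library.

```python
def supplementary_chars(text: str) -> list:
    """Return (index, code point) for every character outside the BMP."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ord(ch) > 0xFFFF
    ]


def fits_in_three_byte_utf8(text: str) -> bool:
    """True if no character needs a 4-byte UTF-8 sequence.

    Code points up to U+FFFF need at most 3 bytes in UTF-8, so this is
    the same as checking that the text stays inside the BMP.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    sample = "caf\u00e9 \U0001F600"
    print(supplementary_chars(sample))
    print(fits_in_three_byte_utf8(sample))
```

Running a check like this over real user input will usually show quickly whether the "probably not" assumption holds for your data.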