Unicode characters beyond U+FFFF in Qt
QChar itself only supports Unicode characters up to U+FFFF.
QString supports Unicode characters beyond U+FFFF by storing each of them as two QChars, a high and a low surrogate (that is, by using UTF-16 encoding). However, the QString API doesn't help you much if you need to process such characters, because functions like size(), at() and indexOf() count QChars (UTF-16 code units), not Unicode characters. For example, a QString containing the single Unicode character U+131F6 returns a size() of 2, not 1.
I opened QTBUG-18868 about this problem back in 2011, but after more than three years (!) of discussion, it was closed as "out of scope" without any resolution.
Solution
You can, however, download and use the Unicode Qt string wrapper classes that have been attached to the Qt bug report. They are licensed under the LGPL.
This download contains the wrapper classes QUtfString, QUtfChar, QUtfRegExp and QUtfStringList, which supplement the existing Qt classes and allow you to do things like this:
QUtfString str;
str.append(0x1307C); // Some Unicode character beyond U+FFFF
Q_ASSERT(str.size() == 1);
Q_ASSERT(str[0] == 0x1307C);
str += 'a';
Q_ASSERT(str.size() == 2);
Q_ASSERT(str[1] == 'a');
Q_ASSERT(str.indexOf('a') == 1);
For further details about the implementation, usage and runtime complexity, please see the API documentation included in the download.