So, is there any prefix for C (not C++) that will allow me to declare a UTF-16 string literal?
Almost, but not quite. C2011 offers you these options:
- character string literals (elements of type char) - no prefix. Example: "Test"
- UTF-8 string literals (elements of type char) - 'u8' prefix. Example: u8"Test"
- wide string literals of three flavors:
  - wchar_t elements - 'L' prefix. Example: L"Test"
  - char16_t elements - 'u' prefix. Example: u"Test"
  - char32_t elements - 'U' prefix. Example: U"Test"
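To see the five prefixes side by side, here is a minimal sketch assuming a C2011 compiler. The variable names and the u16_length helper are made up for illustration; they are not part of any standard API.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* size_t, wchar_t */
#include <uchar.h>   /* char16_t, char32_t */

/* One literal per prefix; all names here are illustrative only. */
const char     plain_str[]  = "Test";    /* char elements, no prefix      */
const char     utf8_str[]   = u8"Test";  /* char elements, UTF-8 encoded  */
const wchar_t  wide_str[]   = L"Test";   /* wchar_t elements              */
const char16_t wide16_str[] = u"Test";   /* char16_t elements             */
const char32_t wide32_str[] = U"Test";   /* char32_t elements             */

/* Each array holds the four characters plus a terminating null element. */
static_assert(sizeof wide16_str / sizeof wide16_str[0] == 5,
              "four elements plus terminator");

/* Hypothetical helper: counts char16_t elements before the terminator. */
size_t u16_length(const char16_t *s)
{
    size_t n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}
```

Note that u16_length counts elements, not characters; the two can differ when the encoding uses more than one element per character.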
Note well, however, that although you can declare a wide string literal having elements of type char16_t, the standard does not guarantee that the UTF-16 encoding will be used for them, nor does it make any particular requirements on which characters outside the language's basic character set must be included in the execution character set. You can test the former at compile time, however: if char16_t represents UTF-16-encoded characters in a given conforming implementation, then that implementation will define the macro __STDC_UTF_16__ to 1.
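The compile-time test could look roughly like the following sketch. UTF16_LITERALS_GUARANTEED and check_utf16_literal are hypothetical names chosen for this example; only __STDC_UTF_16__ comes from the standard.

```c
#include <assert.h>  /* static_assert (C11) */
#include <uchar.h>   /* char16_t */

#if defined(__STDC_UTF_16__)
/* The implementation promises UTF-16 for char16_t, so U+1F600 must occupy
   a surrogate pair: two code units plus the terminating null element. */
#define UTF16_LITERALS_GUARANTEED 1
static_assert(sizeof u"\U0001F600" / sizeof(char16_t) == 3,
              "expected a surrogate pair plus terminator");
#else
#define UTF16_LITERALS_GUARANTEED 0
#endif

/* Returns 1 if the literal holds the expected UTF-16 surrogate pair,
   0 if it does not, and -1 if the implementation makes no UTF-16 promise. */
int check_utf16_literal(void)
{
#if UTF16_LITERALS_GUARANTEED
    static const char16_t grin[] = u"\U0001F600";
    return grin[0] == 0xD83D && grin[1] == 0xDE00 && grin[2] == 0;
#else
    return -1;
#endif
}
```

If the macro is absent, the code still compiles, but nothing can be assumed about how the elements of a u"..." literal are encoded.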
Note also that you need to include (C's) uchar.h header to use the char16_t type name, but the u"..." syntax for literals does not depend on that. Take care, as this header name collides with one used by the C interface of the International Components for Unicode, a relatively widely-used package for Unicode support.
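As a sketch of that independence: C2011 specifies char16_t as the same type as uint_least16_t, so a u"..." literal can be stored without ever including uchar.h, which is one way to sidestep the header name collision. The names greeting and greeting_length are made up for this example.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uint_least16_t; <uchar.h> is deliberately not included */

/* The u"..." syntax is part of the language itself; only the char16_t
   type name comes from <uchar.h>. */
static const uint_least16_t greeting[] = u"Test";

static_assert(sizeof greeting / sizeof greeting[0] == 5,
              "four elements plus terminator");

/* Hypothetical helper: number of elements, excluding the terminator. */
size_t greeting_length(void)
{
    return sizeof greeting / sizeof greeting[0] - 1;
}
```

This relies on a C compiler; the same code is not valid C++, where char16_t is a distinct built-in type.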
Finally, be aware that much of this was new in C2011. To make use of it, you need a conforming C2011 implementation. Those are certainly available, but so are a lot of implementations that conform only to earlier standards, or even to none. Standard C99 and earlier do not provide a string literal syntax that guarantees 16-bit elements.