How do you safely declare a 16-bit string literal in C?
C

2

6

I'm aware that there is already a standard method by prefixing with L:

wchar_t *test_literal = L"Test";

The problem is that wchar_t is not guaranteed to be 16 bits wide, but for my project, I need a 16-bit wchar_t. I'd also like to avoid the requirement of passing -fshort-wchar.

So, is there any prefix for C (not C++) that will allow me to declare a UTF-16 string literal?

Cr answered 2/6, 2018 at 14:23 Comment(13)
"I need a 16-bit wchar_t" - why? - Coulson
@Coulson 1. I am on an embedded platform. 2. It is part of a Windows-like API. - Cr
What's wrong with -fshort-wchar? - Coulson
@Coulson The prefix will be part of a header file, included by my library and an application. I don't want to force the application to use -fshort-wchar. - Cr
This feels like some sort of XY problem. - Coulson
You'd be better off initialising as they are, and provide a conversion function to convert the literal to an array of whatever type you use to specifically represent UTF-16 characters (short, int16_t), or whatever. That will make it easier on systems where wchar_t and UTF-16 are not the same. - Autorotation
@Coulson Yeah... I want to have a WCHAR type, and a TEXT macro, like Windows. - Cr
But why? What is the overall problem you're trying to solve here? - Coulson
@Coulson I want to be able to switch between ASCII and Unicode. So, I would make a TEXT macro that took a literal as a parameter, and depending on whether the library was built for ASCII or Unicode, optionally prefix the literal to turn it into a wchar_t. - Cr
Yes, but why? - Coulson
Otherwise I have to use an ugly array. wchar_t str[4] = { 'T', 'e', 's', 't' } - Cr
No, you could just provide a single UTF-8 interface. Why force applications to recompile if they want to use Unicode? - Coulson
Let us continue this discussion in chat. - Cr
7

So, is there any prefix for C (not C++) that will allow me to declare a UTF-16 string literal?

Almost, but not quite. C2011 offers you these options:

  • character string literals (elements of type char) - no prefix. Example: "Test"
  • UTF-8 string literals (elements of type char) - 'u8' prefix. Example: u8"Test"
  • wide string literals of three flavors:
    • wchar_t elements - 'L' prefix. Example: L"Test"
    • char16_t elements - 'u' prefix. Example: u"Test"
    • char32_t elements - 'U' prefix. Example: U"Test"
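For illustration, here is a minimal sketch of the five literal forms side by side, assuming a C11 compiler that ships uchar.h. The variable names and the ARRAY_LEN helper are made up for this example.

```c
#include <stddef.h>  /* wchar_t */
#include <uchar.h>   /* char16_t, char32_t (C11) */

/* Hypothetical helper: element count of an array, terminator included. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

static const char     narrow_text[] = "Test";    /* no prefix       */
static const char     utf8_text[]   = u8"Test";  /* u8 prefix       */
static const wchar_t  wide_text[]   = L"Test";   /* L prefix        */
static const char16_t text16[]      = u"Test";   /* u prefix (C11)  */
static const char32_t text32[]      = U"Test";   /* U prefix (C11)  */
```

Each array holds the four characters plus a terminating zero element; only the element type differs.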

Note well, however, that although you can declare a wide string literal having elements of type char16_t, the standard does not guarantee that the UTF-16 encoding will be used for them, nor does it make any particular requirements on which characters outside the language's basic character set must be included in the execution character set. You can test the former at compile time, however: if char16_t represents UTF-16-encoded characters in a given conforming implementation, then that implementation will define the macro __STDC_UTF_16__ to 1.
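A rough sketch of that compile-time test: the header below refuses to build unless the implementation promises UTF-16 for char16_t. The array names are illustrative only, and the choice to fail with #error (rather than fall back to something else) is an assumption about what the project wants.

```c
#include <uchar.h>  /* char16_t (C11) */

/* Refuse to build if u"..." literals are not guaranteed to be UTF-16. */
#ifndef __STDC_UTF_16__
#error "char16_t is not guaranteed to hold UTF-16 on this implementation"
#endif

/* U+20AC (euro sign) fits in a single UTF-16 code unit. */
static const char16_t euro[] = u"\u20AC";

/* U+1F600 lies outside the BMP, so UTF-16 needs a surrogate pair. */
static const char16_t smiley[] = u"\U0001F600";
```

With UTF-16 guaranteed, the second literal occupies two code units (0xD83D, 0xDE00) plus the terminator, so code that counts elements is counting code units, not characters.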

Note also that you need to include (C's) uchar.h header to use the char16_t type name, but the u"..." syntax for literals does not depend on that. Take care, as this header name collides with one used by the C interface of the International Components for Unicode, a relatively widely-used package for Unicode support.

Finally, be aware that much of this was new in C2011. To make use of it, you need a conforming C2011 implementation. Those are certainly available, but so are a lot of implementations that conform only to earlier standards, or even to none. Standard C99 and earlier do not provide a string literal syntax that guarantees 16-bit elements.
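To tie this back to the TEXT-style macro discussed in the question's comments, something along the following lines might work on a C11 toolchain. This is only a sketch: MYLIB_UNICODE, MYLIB_TCHAR, MYLIB_TEXT and mylib_tcslen are hypothetical names, not part of any real API.

```c
/* Bail out early on pre-C11 compilers, which lack the u"..." syntax. */
#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 201112L
#error "C11 or later is required for char16_t string literals"
#endif

#include <stddef.h>  /* size_t */
#include <uchar.h>   /* char16_t */

#ifdef MYLIB_UNICODE                 /* hypothetical build switch */
typedef char16_t MYLIB_TCHAR;
#define MYLIB_TEXT_(s) u##s          /* paste the u prefix onto the literal */
#else
typedef char MYLIB_TCHAR;
#define MYLIB_TEXT_(s) s
#endif
/* Extra level so that macro arguments are expanded before pasting. */
#define MYLIB_TEXT(s) MYLIB_TEXT_(s)

/* Length in elements (code units), not counting the terminator. */
static size_t mylib_tcslen(const MYLIB_TCHAR *s)
{
    size_t n = 0;
    while (s[n] != 0) {
        ++n;
    }
    return n;
}
```

Usage would look like `const MYLIB_TCHAR *msg = MYLIB_TEXT("Test");`, which yields a char16_t string when MYLIB_UNICODE is defined and a plain char string otherwise, without requiring -fshort-wchar.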

Guanabana answered 2/6, 2018 at 14:39 Comment(0)
-2

You need a 16-bit wchar_t, but that's out of your control. If the compiler says it's 32-bit, then it's 32-bit, and it doesn't matter what you want or need.

The string classes are templated. You can always use a template to create a template class with 16-bit characters. I personally would try to remove any Unicode handling that is not UTF-8.

An alternative method is a clever #ifdef that produces a compile-time error if wchar_t is not 16-bit, so that you solve the problem only when you actually need to solve it.
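One possible shape for such a check, sketched with the standard WCHAR_MAX macro from stdint.h (C99 and later). WCHAR_IS_16BIT and MYLIB_REQUIRE_16BIT_WCHAR are hypothetical names, and making the hard error opt-in is an assumption of this sketch.

```c
#include <stdint.h>  /* WCHAR_MAX (C99 and later) */

/* 0xFFFF matches an unsigned 16-bit wchar_t, 0x7FFF a signed one. */
#if (WCHAR_MAX == 0xFFFF) || (WCHAR_MAX == 0x7FFF)
#define WCHAR_IS_16BIT 1
#else
#define WCHAR_IS_16BIT 0
#endif

/* Hypothetical opt-in switch: fail the build only where 16 bits are required. */
#if defined(MYLIB_REQUIRE_16BIT_WCHAR) && !WCHAR_IS_16BIT
#error "wchar_t is not 16 bits wide here (try -fshort-wchar or use char16_t)"
#endif
```

Because the check runs in the preprocessor, it also works on pre-C11 compilers that lack _Static_assert.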

Shiah answered 2/6, 2018 at 14:42 Comment(5)
Templated string classes? In C? - Coulson
I think I will have to use the #ifdef and -fshort-wchar. It's the only method that is guaranteed to work. - Cr
Indeed wchar_t is not guaranteed to be 16-bit -- it could be either more or less -- but C2011 does have char16_t, which is at least 16 bits wide (it is the same type as uint_least16_t, so exactly 16 bits on most implementations), and a syntax for wide string literals having elements of that type. - Guanabana
@JohnBollinger Problem is that not all compilers support C2011 yet (and I think especially embedded toolchains). - Cr
That's quite true, @MarkYisri, but C2011 is the current C standard, and it's not even that new any more. Whereas we can and should recognize that some relevant implementations do not conform to that version, questions that are not otherwise qualified should be interpreted first in light of the current version of the language. - Guanabana
