In C (quoting the N1570 draft, section 7.1.1):
A wide string is a contiguous sequence of wide characters terminated
by and including the first null wide character.
where a "wide character" is a value of type wchar_t, which is defined in <stddef.h> as an integer type.
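A minimal sketch of that definition, assuming a hosted C implementation. The helper `wide_length` and the array `embedded` are hypothetical names for illustration. The helper walks the array only up to the first null wide character, which is what the standard `wcslen` from <wchar.h> does. Any wide characters stored after that null are not part of the wide string.

```c
#include <assert.h>
#include <stddef.h>  /* wchar_t, size_t */
#include <wchar.h>   /* wcslen */

/* Count the wide characters before the first null wide character,
   mirroring the C definition of a wide string. */
size_t wide_length(const wchar_t *ws)
{
    size_t n = 0;
    assert(ws != NULL);
    while (ws[n] != L'\0')
        ++n;
    return n;
}

/* Six wchar_t elements: 'a', 'b', an embedded null, 'c', 'd', and the
   implicit terminator. As a wide string it is just L"ab", because it
   ends at the FIRST null wide character. */
static const wchar_t embedded[] = L"ab\0cd";
```

Here `wide_length(embedded)` and `wcslen(embedded)` both return 2, even though the array holds six elements.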
I can't find a definition of "wide string" in the N3337 draft of the C++ standard, but it should be similar. One minor difference is that wchar_t is a typedef in C and a built-in type (whose name is a keyword) in C++. But since C++ shares most of the C library, including functions that act on wide strings, it's safe to assume that the C and C++ definitions are compatible. (If someone can find something more concrete in the C++ standard, please comment or edit this paragraph.)
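To illustrate the shared library, here is a sketch of a source file that should compile unchanged as either C or C++. The function names are hypothetical. In C the wchar_t name comes from a header such as <stddef.h> or <wchar.h>. In C++ it is a keyword, and the included headers are still valid. The wide-string functions `wcscpy` and `wcslen` are the same C library calls in both languages.

```c
#include <assert.h>
#include <stddef.h>  /* in C, declares the wchar_t typedef */
#include <wchar.h>   /* wcscpy, wcslen, wcscmp: shared by C and C++ */

/* Returns 1 when compiled as C++ (wchar_t is a keyword there),
   0 when compiled as C (wchar_t is a typedef from a header). */
int wchar_is_builtin(void)
{
#ifdef __cplusplus
    return 1;
#else
    return 0;
#endif
}

/* Copy a wide string with the C library's wcscpy and return its length
   via wcslen; dst must have room for src including its terminator. */
size_t copy_and_measure(wchar_t *dst, const wchar_t *src)
{
    assert(dst != NULL && src != NULL);
    wcscpy(dst, src);
    return wcslen(dst);
}
```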
In both C and C++, the size of a wchar_t is implementation-defined. It's typically either 2 or 4 bytes (16 or 32 bits, unless you're on a very exotic system with bytes bigger than 8 bits). A wide string is a sequence of wide characters (wchar_t values), terminated by a null wide character. The terminating wide character will have the same size as any other wide character, typically either 2 or 4 bytes.
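A small sketch to check this on a given implementation. The names `abc`, `abc_bytes` and `wchar_bits` are hypothetical. The literal L"abc" is an array of four wchar_t, so the terminator takes exactly sizeof(wchar_t) bytes, just like each of the three visible characters.

```c
#include <assert.h>
#include <stddef.h>
#include <limits.h>  /* CHAR_BIT */

/* 'a', 'b', 'c' and the null terminator: four wchar_t elements. */
static const wchar_t abc[] = L"abc";

/* Bytes occupied by the wide string literal, terminator included;
   always 4 * sizeof(wchar_t). */
size_t abc_bytes(void)
{
    return sizeof abc;
}

/* Storage size of one wchar_t in bits (typically 16 or 32). */
size_t wchar_bits(void)
{
    return sizeof(wchar_t) * CHAR_BIT;
}
```

On a system with 4-byte wchar_t, `abc_bytes()` returns 16. On one with 2-byte wchar_t (such as Windows), it returns 8.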
In particular, since wchar_t is typically bigger than char, a single null byte does not terminate a wide string; the terminator is an entire wchar_t whose value is zero.
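A sketch demonstrating that zero bytes can appear inside ordinary wide characters. It assumes L'A' has the value 0x41 (true where wchar_t holds Unicode) and sizeof(wchar_t) > 1. The names `ab` and `first_zero_byte` are hypothetical. The first zero byte falls inside the very first character, yet `wcslen` correctly reports two characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>  /* memchr */
#include <wchar.h>   /* wcslen */

/* Assumes L'A' == 0x41 and sizeof(wchar_t) > 1, so the object
   representation of L'A' already contains zero bytes. */
static const wchar_t ab[] = L"AB";

/* Offset of the first zero byte in the object representation of ab,
   or (size_t)-1 if there is none. */
size_t first_zero_byte(void)
{
    const unsigned char *p = memchr(ab, 0, sizeof ab);
    return p ? (size_t)(p - (const unsigned char *)ab) : (size_t)-1;
}
```

Code that scanned this array byte by byte for a single zero, as strlen would, would stop inside the first character.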
It's also worth noting that byte order is implementation-defined. A wide character with the value 0x1234, when viewed as a sequence of 8-bit bytes, might appear as any of:
- 0x12, 0x34
- 0x34, 0x12
- 0x00, 0x00, 0x12, 0x34
- 0x34, 0x12, 0x00, 0x00
And those aren't the only possibilities.
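To see which layout a particular implementation uses, one can inspect the object representation through an unsigned char pointer. The following is a minimal sketch; `dump_wchar_bytes` is a hypothetical helper, and the caller must supply an output buffer of at least sizeof(wchar_t) bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Print the bytes of a wchar_t in memory order and copy them into out.
   The order (and any padding) is implementation-defined, so the output
   differs between platforms. Returns the number of bytes written. */
size_t dump_wchar_bytes(wchar_t wc, unsigned char out[])
{
    const unsigned char *p = (const unsigned char *)&wc;
    size_t i;
    assert(out != NULL);
    for (i = 0; i < sizeof wc; ++i) {
        out[i] = p[i];
        printf("0x%02x%s", (unsigned)p[i], i + 1 < sizeof wc ? ", " : "\n");
    }
    return sizeof wc;
}
```

Calling `dump_wchar_bytes((wchar_t)0x1234, buf)` on x86-64 Linux, where wchar_t is 4 bytes and little-endian, prints 0x34, 0x12, 0x00, 0x00.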
In C++, wchar_t (not wchar) is a predefined type. In C, wchar_t is a typedef defined in <stddef.h>. In both cases, the size is implementation-defined; on my system its size is 4 bytes (32 bits). – Symbology