Is casting strings from `wchar_t` to `char16_t` legal if encoding and width are the same?

On Windows, wchar_t holds a UTF-16LE code unit, which is, for the most part, equivalent to char16_t. However, these two character types are still distinct types in the C++ type system, which makes me uncertain whether converting between sequences of them is legal as per the C++ standard.

My question is this: In C++17, is it legal to perform the following casts, and to read from the converted pointers:

  • reinterpret_cast<const wchar_t*>(char16_ptr) where decltype(char16_ptr) is const char16_t*, and
  • reinterpret_cast<const char16_t*>(wchar_ptr) where decltype(wchar_ptr) is const wchar_t*

For the purposes of this question, assume the following:

  • sizeof(wchar_t) == sizeof(char16_t), and
  • wchar_t is formatted the same as char16_t (as is the case on Windows)

Basically, is this a strict-aliasing violation?

My understanding is that the cast itself is valid thanks to [expr.reinterpret.cast]/7, but that the result of the cast cannot safely be used, since the objects would then be accessed through a type that isn't char, unsigned char, or std::byte. Is this interpretation correct?
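To make that distinction concrete, here is a minimal sketch (the helper name `object_bytes` is hypothetical). It reads a `char16_t` string's object representation through `const unsigned char*`, which the C++17 aliasing rule permits, and shows the `wchar_t` cast only in a comment, because dereferencing that pointer is exactly what is in question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copies the object representation of `count` char16_t units into a byte vector.
// Reading any object through an unsigned char glvalue is one of the exceptions
// in the C++17 aliasing rule, so this access is well-defined.
//
// By contrast:
//   const wchar_t* w = reinterpret_cast<const wchar_t*>(s);  // conversion: fine
//   wchar_t c = *w;  // reads char16_t objects through wchar_t: not allowed
std::vector<unsigned char> object_bytes(const char16_t* s, std::size_t count) {
    assert(s != nullptr);
    const auto* p = reinterpret_cast<const unsigned char*>(s);
    return std::vector<unsigned char>(p, p + count * sizeof(char16_t));
}
```

The byte order of the result depends on the platform's endianness; only the set of byte values is portable.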


Note: Other questions have been asked regarding wchar_t and char16_t being the same, but this question is not a duplicate of those as far as I can tell. Notably, the question "Are wchar_t and char16_t the same on Windows?" actually performs a reinterpret_cast between pointers, but none of the answers actually address whether this cast was ever legal in the first place.

Collaborative, 2/2/2022 at 19:46. Comments (3):
It's perfectly legal to convert from a pointer to one type to a pointer to another type. The only problem you might run into would be compiling on Linux, where wchar_t is 32 bits wide (like char32_t), and you'll end up thinking you've hit a null terminator before the end of the string. (Coeducation)
The conversion is perfectly legal, but that doesn't mean using the pointed-to object (i.e. reading from or writing to it) is legal; that could be a strict-aliasing violation, which I think it is. The question is whether it is a violation or is legal. (Collaborative)
Beware! wchar_t on Windows is unsigned. (Chromogen)
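The Linux pitfall from the first comment can be reproduced without any aliasing violation by copying the bytes instead of reinterpreting the pointer. This sketch (hypothetical helper `u16_units_before_null`, assuming CHAR_BIT == 8) counts how many 16-bit units precede the first zero unit. With a 16-bit wchar_t that is the full string length; with a 32-bit wchar_t the zero half of the first character shows up almost immediately (after one unit on little-endian targets, before any on big-endian ones).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <vector>

// Copies the bytes of a null-terminated wide string (terminator included) into
// a char16_t buffer, then counts the 16-bit units before the first zero unit.
// memcpy creates genuine char16_t objects, so nothing is read through the
// wrong type.
std::size_t u16_units_before_null(const wchar_t* ws) {
    assert(ws != nullptr);
    const std::size_t bytes = (std::wcslen(ws) + 1) * sizeof(wchar_t);
    std::vector<char16_t> units(bytes / sizeof(char16_t));
    std::memcpy(units.data(), ws, bytes);

    std::size_t n = 0;
    while (n < units.size() && units[n] != u'\0') ++n;
    return n;
}
```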

You already know the answer to this: strictly speaking, no.

wchar_t is not char16_t. Neither derives from the other. Neither is similar to the other. Neither is a signed/unsigned version of the other. Neither is an aggregate containing the other. And neither of them is a bytewise type (char, unsigned char, or std::byte).

So you cannot access a wchar_t through a pointer/reference to a char16_t.
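The type relationships above can be checked at compile time. This is a sketch using `<type_traits>`; `may_alias_anything` is a hypothetical helper encoding the C++17 list of byte types that may access any object. The checks are written to hold whether wchar_t is 16 or 32 bits on the platform.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Distinct fundamental types, and not signed/unsigned counterparts of each other.
static_assert(!std::is_same_v<wchar_t, char16_t>);
static_assert(!std::is_same_v<std::make_unsigned_t<wchar_t>, char16_t>);
static_assert(!std::is_same_v<std::make_signed_t<char16_t>, wchar_t>);

// char16_t is always unsigned; the signedness of wchar_t is implementation-defined
// (unsigned with MSVC on Windows, signed 32-bit on x86-64 Linux).
static_assert(std::is_unsigned_v<char16_t>);

// The only types through which any object may be accessed (C++17 [basic.lval]).
template <typename T>
constexpr bool may_alias_anything() {
    using U = std::remove_cv_t<T>;
    return std::is_same_v<U, char> || std::is_same_v<U, unsigned char> ||
           std::is_same_v<U, std::byte>;
}
```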

If strict conformance with the aliasing rule is your goal, you will have to copy the data into a different object. That is valid, assuming both types have the same representation.
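A minimal sketch of such a copy, assuming C++17; `copy_representation` and `same_width_char_t` are hypothetical names. It `memcpy`s the bytes into freshly created objects of the target type, so every object is afterwards read through its own type. The static assertions enforce equal element size; equal representation (UTF-16 in both, as on Windows) remains an assumption the code cannot check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>
#include <type_traits>

// Creates new To objects whose bytes match the From elements of `src`.
template <typename To, typename From>
std::basic_string<To> copy_representation(std::basic_string_view<From> src) {
    static_assert(sizeof(To) == sizeof(From), "element sizes must match");
    static_assert(std::is_trivially_copyable_v<To> &&
                  std::is_trivially_copyable_v<From>);
    std::basic_string<To> out(src.size(), To{});
    if (!src.empty()) {
        std::memcpy(out.data(), src.data(), src.size() * sizeof(From));
    }
    return out;
}

// char16_t where wchar_t is 16 bits (Windows), char32_t where it is 32 bits.
using same_width_char_t =
    std::conditional_t<sizeof(wchar_t) == sizeof(char16_t), char16_t, char32_t>;
```

On Windows, `copy_representation<char16_t>(std::wstring_view(L"hi"))` yields a `std::u16string`; the `same_width_char_t` alias only exists so the sketch also compiles where wchar_t is 32 bits.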

Sha answered 2/2/2022 at 20:05. Comments (1):
That's what I feared; too bad. I wasn't sure if there was any flexibility through layout compatibility, given that the values are trivially copyable. I guess I'll have to std::copy. (Collaborative)
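The std::copy route converts each element by value rather than copying bytes. A sketch, with the hypothetical function name `to_u16string`, assuming a UTF-16 wchar_t as on Windows:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <string_view>

// Converts each wchar_t to char16_t as an integer value. With Windows' 16-bit
// UTF-16 wchar_t this is an exact copy. Where wchar_t holds UTF-32 (e.g. Linux),
// it is only correct for code points below U+10000; higher code points need
// real UTF-32 to UTF-16 transcoding (surrogate pairs).
std::u16string to_u16string(std::wstring_view ws) {
    std::u16string out(ws.size(), u'\0');
    std::copy(ws.begin(), ws.end(), out.begin());
    return out;
}
```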
