What is the fate of wchar_t in c++0x?

What is the fate of wchar_t in C++0x considering the new character types char8_t, char16_t, and char32_t?

More importantly, what about std::wstring, std::wcout, etc.?

Are the w* family classes deprecated?
Are there new std::ustring and std::Ustring classes for new character types?

Consolatory answered 13/5, 2011 at 20:33 Comment(13)
See #872991. It doesn't answer all your questions (i.e. deprecation), but I guess wchar_t isn't going to be deprecated. There's too much existing code already using it.Transubstantiation
@Boaz Yaniv: Not to mention that deprecation usually doesn't mean anything. Implementors implement deprecated things because they need to compile old software, and nobody's going to rewrite old software just because of a deprecation warning.Springfield
No one is going to rewrite bad software over a deprecation warning, but honestly, find-and-replace isn't that big of a deal. We've already done away with NULL in favor of nullptr in all our code.Winepress
@AJG: The main problem with replacing wchar_t with char16_t (or whatever is applicable to your platform) as I see it is that many existing libraries are dependent on it. And although you can rather easily change your own code, you usually don't want to touch 3rd-party libraries, and at least some of the library writers would be wary of changing their libraries and breaking existing code.Transubstantiation
@David: especially in C++. In 03, at any rate, deprecation is defined to mean "the feature may be removed in a future version of the standard". So conforming compilers must implement it. And it turns out that even non-deprecated features may be removed in future versions of the standard, since C++0x has some backward incompatibilities unrelated to things deprecated in C++03. So all deprecation really means is, "we're not sure we really wanted to put this in, but we did. kthxbye, the authors".Asberry
Why would you ever want to replace wchar_t with char16_t? With wchar_t you might be able to hold a Unicode character (it can on my machines, since sizeof(wchar_t) for me is always 4), whereas with char16_t, you are guaranteed to be unable to hold a Unicode character. Why in the world would you want to do such a daft thing???Cleanly
Because the Windows API uses UTF-16.Photoelectric
@tchrist: same reason you might use int32_t instead of long - because you prefer to code without the existential doubt and uncertainty of not knowing what range of values your type holds. Depending on what the code does, removing possibilities might make it easier to reason about, since all platforms will behave (closer to) the same. Also, Unicode string literals have type char16_t[] (for u) or char32_t[] (for U), not type wchar_t[] (which is L). I don't see the fascination with UTF-16, but some people (MS) seem to like it.Asberry
Microsoft was an earlier adopter of Unicode (UCS-2), back when it was assumed that 65,536 characters would be enough for everyone. When Unicode was expanded beyond the BMP, using UTF-16 instead of UTF-32 allowed more backwards compatibility.Photoelectric
@dan04: sorry, I was being flippant. I do see the fascination: for almost all text it's half the size of UTF-32, and Windows is locked into it for legacy reasons. Furthermore, a lot of the difficulties of handling UTF-16 (variable length characters) are actually still present in UTF-32 due to combining marks. In fact the fundamental Unicode difficulties are harder, because canonical equivalencies are harder. So using 4 bytes per code point doesn't make it easier to e.g. reverse a string properly, just easier to claim you've done enough and that you won't support combining characters.Asberry
@Steve Jessop: Sure, but removing deprecated features usually doesn't affect the compilers much, since implementers usually keep them in for backward compatibility. About the only way to remove old features is to overwrite them: no C++0x-compliant compiler can have the old meaning for auto, for example.Springfield
@David, deprecated features are normative (see annex D of either standard). That means compiler writers are required to implement them. Here is a real-world example that illustrates this. The EDG compiler is the only one to support exporting templates. The committee wanted to deprecate them. EDG asked that they just be removed instead so that they wouldn't have to continue supporting them. As EDG was the only compiler with a working implementation, that is what the committee did: exported templates are not deprecated in C++0x, they are simply not there.Consolatory
@deft_code: Sure, but if a few compilers had supported exported templates, and they had customers that used the feature, would they remove it because it wasn't in C++0x? Deprecation should be a signal (not completely accurate) not to use that feature, because it might not be in future standards. (As you say, a conforming implementation must implement them.) Programmers will normally still use deprecated features, software written with them will hang around unchanged, and a compiler vendor will feel compelled to support them even when removed from the standard.Springfield

Nothing happens to wchar_t; it is still implementation-specific (and compatible with C).

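The new types char16_t and char32_t have defined semantics in the new standard. The old wchar_t remains a distinct type. It might have the same size and representation as one of the new types, but which one differs between implementations (typically 16 bits on Windows and 32 bits on most Unix-like systems), and on some systems it matches neither.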
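To make the point about wchar_t concrete, here is a minimal sketch. The `WideMatch` enum and the `wchar_matches()` function are hypothetical names invented for this illustration; only the `static_assert`s reflect guarantees from the standard.

```cpp
#include <climits>
#include <type_traits>

// wchar_t, char16_t and char32_t are three distinct types, even on
// platforms where two of them happen to have the same size.
static_assert(!std::is_same<wchar_t, char16_t>::value, "distinct types");
static_assert(!std::is_same<wchar_t, char32_t>::value, "distinct types");

// The new types have guaranteed minimum widths; wchar_t does not.
static_assert(sizeof(char16_t) * CHAR_BIT >= 16, "at least 16 bits");
static_assert(sizeof(char32_t) * CHAR_BIT >= 32, "at least 32 bits");

// Hypothetical helper: reports which of the new types wchar_t matches
// in size on the current implementation, if any.
enum class WideMatch { Char16, Char32, Neither };

constexpr WideMatch wchar_matches() {
    return sizeof(wchar_t) == sizeof(char16_t) ? WideMatch::Char16
         : sizeof(wchar_t) == sizeof(char32_t) ? WideMatch::Char32
         : WideMatch::Neither;
}
```

The result of `wchar_matches()` depends on the platform, which is exactly why code that needs a known width is probably better off with char16_t or char32_t.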

You will have typedefs u16string and u32string for strings of the new character types, but no new standard streams.
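As a rough sketch of how the new string typedefs behave, the snippet below counts code points in a std::u16string. The function name `count_code_points` is made up for this example, and it assumes the input is well-formed UTF-16. Since there are no matching streams, it returns a number rather than printing the string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a UTF-16 string by skipping
// low surrogates (the second unit of each surrogate pair).
// Assumption: the input is well-formed UTF-16.
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (char16_t c : s) {
        if (c < 0xDC00 || c > 0xDFFF) {
            ++n;  // not a low surrogate, so it starts a new code point
        }
    }
    return n;
}
```

For a character outside the BMP such as U+1F600, `std::u16string(u"\U0001F600")` holds two code units while `std::u32string(U"\U0001F600")` holds one, so `size()` only equals the code point count for the u32string.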

Hankins answered 13/5, 2011 at 21:20 Comment(3)
Can you confirm that std::string should contain utf8 chars? Or is there another type for this? u8string?Prosecution
There is no u8string. char has an overloaded meaning of "UTF-8 code unit", "member of the basic execution character set", or "byte".Photoelectric
@Prosecution - Like Dan says, std::string could (not should) contain UTF-8. It is up to the application to decide the interpretation. The language already has three narrow character types, and the committee was hesitant to add a fourth!Hankins
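To illustrate the comments above: a std::string just stores bytes, and treating them as UTF-8 is the application's decision. This is only a sketch; `utf8_code_points` and `sample_utf8` are hypothetical names, and the counting assumes the bytes really are well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a std::string that the
// application has chosen to interpret as UTF-8.
// Assumption: the bytes are well-formed UTF-8.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;  // not a continuation byte (10xxxxxx)
        }
    }
    return n;
}

// "héllo" with é written as its two UTF-8 bytes, 0xC3 0xA9.
inline std::string sample_utf8() {
    return "h\xC3\xA9llo";
}
```

Here `sample_utf8().size()` is 6 because size() counts char elements (bytes), while `utf8_code_points(sample_utf8())` is 5. Nothing in std::string itself knows about the encoding.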
