I have been exploring C++11's new Unicode functionality, and while other C++11 encoding questions have been very helpful, I have a question about the following code snippet from cppreference. The code writes and then immediately reads a text file saved with UTF-8 encoding.
// Write
std::ofstream("text.txt") << u8"z\u6c34\U0001d10b";
// Read
std::wifstream file1("text.txt");
file1.imbue(std::locale("en_US.UTF8"));
std::cout << "Normal read from file (using default UTF-8/UTF-32 codecvt)\n";
for (wchar_t c; file1 >> c; ) // ?
    std::cout << std::hex << std::showbase << c << '\n';
My question is quite simply: why is a wchar_t needed in the for loop? A u8 string literal can be declared using a simple char *, and the bit layout of the UTF-8 encoding should tell the system the character's width. It appears there is some automatic conversion from UTF-8 to UTF-32 (hence the wchar_t), but if this is the case, why is the conversion necessary?
wchar_t is used because wifstream is used, and wifstream performs that "some automatic conversion" you mention. My point was to show the difference between that automatic conversion (as implemented for one particular platform) and the explicit, portable, locale-independent Unicode conversion provided by codecvt_utf8_utf16. – Gradient
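To make the distinction concrete, here is a minimal sketch of what a plain char read sees compared with what a decoding step produces. It is not the cppreference example and does not use wifstream or any codecvt facet. The helpers read_bytes and decode_utf8 are hypothetical names introduced for illustration, and the decoder assumes well-formed UTF-8 and does no validation.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read a stream one char at a time, as a plain
// std::ifstream would. Each char is one UTF-8 code unit (one byte),
// not a whole character. get() is used so whitespace is not skipped.
std::string read_bytes(std::istream& in) {
    std::string bytes;
    for (char c; in.get(c); )
        bytes.push_back(c);
    return bytes;
}

// Hypothetical helper: decode UTF-8 into code points.
// Assumes well-formed input; malformed sequences are not detected.
std::vector<char32_t> decode_utf8(const std::string& bytes) {
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < bytes.size(); ) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        // The lead byte's high bits give the sequence length.
        std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        // Keep only the payload bits of the lead byte.
        char32_t cp = (len == 1) ? lead : (lead & (0xFF >> (len + 1)));
        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len && i + k < bytes.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Passing a std::ifstream opened on text.txt to read_bytes yields 8 char values for the sample text (1 + 3 + 4 bytes), while decode_utf8 turns those into the three code points 0x7a, 0x6c34 and 0x1d10b. The bit layout does reveal each character's width, but a single char can only hold one byte of it, so something wider (wchar_t, char32_t) is needed once you want one value per character.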