C++: wide characters outputting incorrectly?

My code is basically this:

    wstring japan = L"日本";
    wstring message = L"Welcome! Japan is ";

    message += japan;

    wprintf(message.c_str());

I want to use wide strings, but I don't know how they are supposed to be output, so I used wprintf. When I run something such as:

./widestr | hexdump

The hexadecimal dump looks like this:

65 57 63 6c 6d 6f 21 65 4a 20 70 61 6e 61 69 20 20 73 3f 3f
e  W  c  l  m  o  !  e  J     p  a  n  a  i        s  ?  ?

Why are they all jumbled in order? Even if wprintf is the wrong call, I still don't get why it would output in such a specific jumbled order!

Edit: Is it endianness or something? Every pair of characters seems to be swapped. Huh.

Edit 2: I tried using wcout, but it outputs exactly the same hexadecimal values. Weird!

Oswell answered 28/6, 2010 at 6:13 Comment(4)
Maybe you should try cout << message << endl. - Martineau
@phimuemue, that does not work. It gives me roughly 30 errors, the first being widestr.cpp:18: error: no match for ‘operator<<’ in ‘std::cout << message’, along with many about ostream char traits. It won't output the wide string! - Oswell
What platform and compiler are you using? - Aspidistra
I'm on GCC / Linux 2.6 (x86). - Oswell

You need to set the locale:

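    #include <cwchar>    // wprintf
    #include <string>
    #include <locale>
    #include <iostream>

    using namespace std;

    int main()
    {
        // Use the locale from the environment (e.g. LANG=en_US.UTF-8)
        // instead of the default "C" locale.
        std::locale::global(std::locale(""));

        wstring japan = L"日本";
        wstring message = L"Welcome! Japan is ";

        message += japan;

        // Pass the text as an argument, not as the format string.
        wprintf(L"%ls", message.c_str());
        wcout << message << endl;
    }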

This works as expected: on a system with a UTF-8 locale, the wide string is converted to narrow UTF-8 and printed.

Setting the global locale to "" selects the system locale from the environment. If that locale uses UTF-8, the wstring is converted to UTF-8 on output.
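
To make that conversion step concrete, here is a minimal sketch of a wide-to-narrow conversion driven by the current C locale, which is the same kind of conversion the wide output functions have to perform. Note that to_narrow is a hypothetical helper written for illustration, not a library function and not part of the code above; it relies only on the standard std::wcsrtombs.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper (illustration only): converts `wide` to the narrow
// multibyte encoding of the current C locale and stores it in `narrow`.
// Returns false if some character cannot be represented in that locale.
bool to_narrow(const std::wstring& wide, std::string& narrow)
{
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* src = wide.c_str();

    // First pass: with a null destination, wcsrtombs only reports how many
    // bytes the converted string needs (excluding the terminator).
    std::size_t needed = std::wcsrtombs(nullptr, &src, 0, &state);
    if (needed == static_cast<std::size_t>(-1))
        return false;   // unconvertible character in the current locale

    // Second pass: convert for real into a buffer of exactly that size.
    narrow.assign(needed, '\0');
    state = std::mbstate_t();
    src = wide.c_str();
    std::wcsrtombs(&narrow[0], &src, needed, &state);
    return true;
}
```

Plain ASCII text converts under any locale. For text such as L"日本", the call is expected to fail under the default "C" locale on a typical glibc system, and to produce UTF-8 bytes once the global locale has been switched to a UTF-8 one as shown above.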

Edit: Forget what I said about sync_with_stdio. That was incorrect: the C and C++ streams are synchronized by default, so the call is not needed.
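
As for the jumbled order in the question: that part is most likely not caused by wprintf at all. When run without options, hexdump groups its input into two-byte words and prints each one as a 16-bit number, so on a little-endian x86 machine every pair of bytes appears swapped; hexdump -C prints the bytes one by one in stream order. The sketch below mimics both views so the effect can be checked by hand. The names bytes_view and words_view are made up for this example.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Byte-by-byte view in stream order (comparable to the hex columns of
// `hexdump -C`).
std::string bytes_view(const std::string& data)
{
    std::string out;
    char buf[8];
    for (std::size_t i = 0; i < data.size(); ++i) {
        unsigned value = static_cast<unsigned char>(data[i]);
        std::snprintf(buf, sizeof buf, "%02x", value);
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}

// Word view: each pair of bytes is read as a 16-bit little-endian integer,
// so the second byte of the pair ends up in the high (left-hand) digits.
// An odd trailing byte is padded with zero.
std::string words_view(const std::string& data)
{
    std::string out;
    char buf[8];
    for (std::size_t i = 0; i < data.size(); i += 2) {
        unsigned lo = static_cast<unsigned char>(data[i]);
        unsigned hi = (i + 1 < data.size())
                          ? static_cast<unsigned char>(data[i + 1])
                          : 0u;
        std::snprintf(buf, sizeof buf, "%04x", (hi << 8) | lo);
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

For the first two bytes of the message, "We", the byte view is 57 65 while the word view is 6557, which matches the leading "65 57" in the dump from the question. The trailing 3f 3f ("??") is the actual symptom of the missing locale.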

Santossantosdumont answered 28/6, 2010 at 7:26 Comment(9)
You make it sound like sync_with_stdio and wcout are alternatives; they do completely different things. sync_with_stdio is required if you want to interleave C stream functions (like wprintf) with C++ stream usage (wcout); imbue is needed if you want to change the locale used by wcout. - Grunion
I can't test it, but wcout should work without codepage settings on Windows because wchar_t is a UTF-16 code unit on Windows and UTF-16 is Windows's only native encoding. So std::wcout should use WriteConsoleW without any locale conversion. If it doesn't, it's a library bug. - Amimia
@Amimia That is not how the standard defines it. The standard says that wide characters should be converted to a narrow encoding according to the locale's codepage, and this is what is done. The issue with Windows is that it does not support UTF-8. So for Windows you probably need to use locale::global(locale("Japan")), and it would use the Shift-JIS encoding for output. Otherwise it would fail to convert the characters. - Santossantosdumont
Microsoft's standard library implementation of wcout uses the global C locale internally, so imbuing a locale is practically useless. You have to set the desired locale as the global locale... - Ac
@Artyom: Thanks for the comment. This means that std::wcout is essentially useless on Windows. I'd consider this to be a mistake in the C++ standard that is unnecessarily biased towards Unix. BTW, Windows consoles do support UTF-8 (via SetConsoleCodePage), but all code pages are obsolete and only kept for compatibility reasons. Shift-JIS is even more obsolete than UTF-8 because it's not a Unicode encoding. So it seems that one really has to call WriteConsoleW directly. - Amimia
Regarding my comment: this is only true for the ctype facet; imbuing a locale works for all other facets, AFAIK. - Ac
@Artyom, @others: thanks, it helped me learn an annoying part of the language. Works fine now. - Oswell
The output of "wprintf" is as expected, but "wcout" doesn't show any output (all blank). Why is that? - Dorser
What about Japanese characters like L"{\"type\":\"string\",\"value\":\"\\u9CE5\"},\n"? That doesn't seem to work with wprintf or wcout as you show above. - Cohin
