Output Unicode to console Using C++, in Windows

I'm still learning C++, so bear with me and my sloppy code. The compiler I use is Dev C++. I want to be able to output Unicode characters to the console using cout. Whenever I try things like:

#include <cstdlib>  // system()
#include <iostream>

int main()
{
    std::cout << "Hello World!\n";
    std::cout << "Blah blah blah some gibberish unicode: ĐĄßĞĝ\n";
    system("PAUSE");
    return 0;
}

It outputs strange characters to the console, like µA■Gg. Why does it do that, and how can I get it to display ĐĄßĞĝ? Or is this not possible with Windows?

Nondescript answered 17/5, 2010 at 12:33 Comment(10)
Just a comment: don't use system("pause"); it's very bad practice. You can use cin instead. gidnetwork.com/b-61.html - Chorea
Duplicate? #2492577 - Huesman
Oh god, how do people DO it? How come every newbie is magnetically attracted to Dev C++? That piece of junk was buggy 5 years ago, and guess what? It still is today, because it hasn't been maintained since then. There are so many good free compilers and IDEs. Why oh why do beginners insist on picking the only one that is absolute crap, lacks basic features, never worked, is buggy as hell, and comes with a prehistoric compiler by default? - Fernandafernande
@jalf: your rant would be more useful if you linked to one such good, free compiler and IDE. - Roentgen
@nmuntz: I agree about system("pause"), but the article you link to is just as bad. For one thing, a plain cin.get() usually does not suffice. Pausing does a whole lot more, most prominently clearing the input buffer. Doing that in a portable, reliable way in C++ is extremely hard. In fact, the two solutions I know (passing either cin.rdbuf()->in_avail() or numeric_limits<streamsize>::max() to cin.ignore) fail on different current compilers (they compile but don’t work). The rest of the linked page is a straw-man argument. Who cares that pausing is costly? It’s only called once! - Perspicacity
@Joachim: Fair enough. Microsoft has Visual C++ Express, which includes an excellent compiler and IDE for free. That is pretty much the de facto standard for Windows C++ development. GCC is a top-notch cross-platform compiler, and is often used with the Code::Blocks or Eclipse IDEs. - Fernandafernande
@JoachimSauer: Yes, in addition to everything jalf said: I actually use MinGW on Windows, and it works like a charm. - Rarefy
I answered a very similar question just a few days ago, in detail and with an example: Unicode on Console - Chinese Characters. My answer focuses on Windows 10 back to Vista, but it is already 2017 now. - Mcgruder
Possible duplicate of Output unicode strings in Windows console app - Putty
Check my answer on this post #2492577 - Bjorn

What about std::wcout?

#include <iostream>

int main() {
    std::wcout << L"Hello World!" << std::endl;
    return 0;
}

This is the standard wide-character output stream.

Still, as Adrian pointed out, this doesn't address the fact that cmd, by default, doesn't handle Unicode output. This can be addressed by manually configuring the console, as described in Adrian's answer:

  • Starting cmd with the /u argument;
  • Calling chcp 65001 to change the output format;
  • And setting a Unicode font in the console (like Lucida Console Unicode).

You can also try calling _setmode(_fileno(stdout), _O_U16TEXT), which requires fcntl.h and io.h (as described in this answer and documented in this blog post).

Avocado answered 17/5, 2010 at 12:38 Comment(4)
This doesn't address the fact that the console is typically in ANSI or OEM mode. - Infatuate
This is mostly right, but cmd does handle Unicode output to the console by default; it just doesn't when the output is redirected to a file. Use /u for it to also output Unicode to redirected files. In both cases "Unicode" means UTF-16, as usual on Windows. chcp 65001 sets the ANSI codepage to UTF-8, which is unrelated to wide characters, wcout, and cmd /u. You do not need to set the codepage to UTF-8 to output UTF-16! Furthermore, the WriteFile() API is broken under chcp 65001. The _setmode() call is important and required if you want to output characters beyond your ANSI codepage! - Ichnography
@Adrian: The console does not have an ANSI or OEM mode. It only has an ANSI codepage, which by default is an OEM codepage such as 437 or 850. But you do not have to print via this codepage. All Windows text APIs have an A version and a W version: A for ANSI, which goes through the codepage, and W for "wide", which does not go through the codepage but deals directly in UTF-16 Unicode. Both are always present, without a requirement or even a possibility of switching a "mode". - Ichnography
+1 for your suggestion to set a "unicode font in the console". That was the missing piece for me. I thought that doing chcp 65001 alone would enable a Unicode font. - Lisa

You can use the open-source {fmt} library to portably print Unicode text, including on Windows, for example:

#include <fmt/core.h>

int main() {
  fmt::print("Blah blah blah some gibberish unicode: ĐĄßĞĝ\n");
}

Output:

Blah blah blah some gibberish unicode: ĐĄßĞĝ

This requires compiling with the /utf-8 compiler option in MSVC.

I don't recommend using wcout because it is non-portable. For example,

std::wcout << L"Blah blah blah some gibberish unicode: ĐĄßĞĝ\n";

will print the ĐĄßĞĝ part incorrectly on macOS or Linux (https://godbolt.org/z/z81jbb):

Blah blah blah some gibberish unicode: ??ss??

and doesn't even work on Windows without changing the code page:

Blah blah blah some gibberish unicode:

Disclaimer: I'm the author of {fmt}.

Susurrous answered 28/12, 2020 at 4:13 Comment(5)
What does /utf-8 do? Isn't Windows UTF-16 internally? Isn't it inefficient to convert to UTF-16 at runtime? The whole Win32 API is UTF-16; how do you get around that? - Florri
From the documentation (learn.microsoft.com/en-us/cpp/build/reference/…) I read that if you don't specify /utf-8, it will use the user locale code page, which means your program might display differently depending on the locale settings of the user who compiled it, and maybe also on the locale during execution (yikes!). I don't use Windows any more, so take this with a grain of salt, but the link might help you. - Darned
/utf-8 sets source and execution encoding to UTF-8. Technically it's unnecessary, but {fmt} won't do transcoding otherwise, in case you are using a legacy encoding. Transcoding only happens when writing to the console and is negligible compared to the time it takes to render the text. When the output is redirected there is no transcoding, which is another advantage of {fmt} compared to wcout. - Susurrous
@Susurrous Thank you for the extensive comment. A bit of a provocative question: in that case, why not make UTF-8 the default in {fmt} and let people who deal with legacy encodings go the extra mile (when switching to {fmt}, presumably) instead of literally everyone else? - Florri
@AyxanHaqverdili, good question. I'm considering switching the default to UTF-8 in the next major version, with an opt-out to the old behavior. - Susurrous

I'm not sure Windows XP will fully support what you need. There are three things you have to do to enable Unicode in a command console:

  1. Start the command window with cmd /u. The /u says your programs will output Unicode.
  2. Use chcp 65001 to indicate you want to use UTF-8 instead of one of the code pages.
  3. Select a font with more glyph coverage. The command windows in newer versions of Windows offer Lucida Console Unicode. My XP box has a subset of that called Lucida Console. It doesn't have a very extensive repertoire, but it should be sufficient if you're just trying to display some accented characters.

Infatuate answered 17/5, 2010 at 13:28 Comment(2)
+1 for using chcp 65001; this does the trick. (From cmd /?: "/U Causes the output of internal commands to a pipe or file to be Unicode.") - Incontrovertible
1. /u only means that built-in commands will output UTF-16 rather than ANSI when redirected. It means nothing for your own code or for output that is not redirected. 2. chcp 65001 does not work properly with UTF-8 console output due to a bug in the WriteFile() API, which causes it to return the wrong value. This API is called by standard C library functions such as printf(), and any of them that check the return code may fail or behave unpredictably. 3. The font advice is correct, and the need for it is a silly failing of Windows IMHO. - Ichnography

You used the ANSI output stream. You need to use

std::wcout << L"Blah blah blah some gibberish unicode: ĐĄßĞĝ\n";

Also, use std::cin.get() instead of system("PAUSE").

Sighted answered 17/5, 2010 at 12:42 Comment(2)
Thanks for the tip about cin.get(). I know using system("PAUSE"); is a bad habit, but Dev C++ didn't support anything else I used. Also, wcout isn't recognized by Dev C++. I think I'll follow the advice in the other answers/comments and switch to Visual Studio; I experience fewer problems with that IDE. - Nondescript
It should also be noted that system("PAUSE") is not portable, whereas cin.get() is. - Hummocky

On Linux, I can naively do:

std::cout << "ΐ , Α, Β, Γ, Δ, ,Θ , Λ, Ξ, ... ±, ... etc";

and it worked for most of the characters I tried.

Sedda answered 9/1, 2017 at 10:56 Comment(0)
