How to display L"أَبْجَدِيَّة عَرَبِيَّة‎中文" using wcout?

I want to display an Arabic message mixed with Chinese using wcout.

The following code is OK:

#include <iostream>

using namespace std;

int main()
{
    wcout.imbue(locale("chs"));
    wcout << L"中文"; // OK
}

However, the following code doesn't work:

#include <iostream>

using namespace std;

int main()
{
    wcout.imbue(locale(/* What to place here ??? */));
    wcout << L"أَبْجَدِيَّة عَرَبِيَّة‎中文"; // Output nothing. VC++ 2012 on Win7 x64
    // Why does the main advantage of unicode not apply here?
}

I think the concept of code pages should be deprecated after the adoption of unicode.

Q1. What mechanism does wcout use to display such text?

Q2. Why does Windows, as a unicode-based OS, not support outputting unicode characters in its console window?

Fibroblast answered 4/2, 2013 at 16:34 Comment(17)
What problems do you have with the code above?Khiva
wcout << L"أَبْجَدِيَّة عَرَبِيَّة‎中文"; // The output is not as expected. VC++ 2012Fibroblast
Maybe take a look hereKhiva
It is implementation-defined (as everything that has to do with string literals that go beyond the basic execution character set).Solomonsolon
@Andy, Windows 7 64-bitFibroblast
@xmllmx: maybe this might help you then?Horizon
Please, could you explain your goals in more detail? Do you want to create a console application that produces Unicode output in a mix of languages, or do you want to display the application's output in the Windows console? Are C++ facilities important to you, or may the application use the Windows API, with no C++ classes, and be written in pure C? Is redirecting the application's output to a file also important to you?Teheran
@Oleg, I just wonder how to produce UNICODE output in the mix of languages in pure C++. The Windows API based solutions are not what I want. I want to know the mechanism of how such a unicode string is outputted to console.Fibroblast
You probably don't clearly understand the steps of the solution. First you need to create a console application. The implementation of a console application differs between target operating systems. If you want to create a console application that runs under Windows, you should first understand how to implement it without running into problems with the Windows configuration, the console configuration (such as chcp 65001), the choice of fonts, and issues specific to some C++ libraries.Teheran
Just try to execute the code _setmode(_fileno(stdout), _O_WTEXT); std::wcout << L"أَبْجَدِيَّة عَرَبِيَّة‎中文Русский" << std::endl; for example, but call chcp 65001 in the console before starting of your application. Then start it with option >%temp%\t.txt to redirect the results in file and open >%temp%\t.txt file in Notepad. You will see the text "أَبْجَدِيَّة عَرَبِيَّة‎中文Русский" correctly.Teheran
@Oleg, notepad.exe is no problem. The problem remains in console.Fibroblast
What is your real problem? Do you want to display the results on your own computer, or do you want to create a console application that displays the same information on every Windows computer? The latter is not possible. If you create an application for other people to use, you should consider not making it a console application. Console applications were regarded as legacy applications even in the days of Windows NT 3.1 (more than 20 years ago). Their main design goal was compatibility with old applications, which is why they still use code pages from that earlier era.Teheran
This is likely a mission impossible. Thanks, Oleg.Fibroblast
@xmllmx: You are welcome! Sorry for the bad news, but using _setmode with _O_U16TEXT, _O_U8TEXT or _O_WTEXT is really enough to enable Unicode mode in a console application. To see the results, one has to use a Unicode code page (execute chcp 65001 in cmd). The last requirement is a console font that can display the results. That requirement is the hardest to meet on an ordinary computer, so the only safe way is to pipe the results to a file, which is helpful only in rare scenarios.Teheran
Maybe the console font does not support your characters.Sharp
system("chcp 65001"); system("chcp 936");Friel
As of 2023, C++ still has no easy way to output Chinese with std::wcout.Tenia
L
5

By default, the CRT treats all output to files as ANSI. You can change that by putting this line at the start of your program:

_setmode(_fileno(stdout), _O_WTEXT);

A good reference @ http://www.siao2.com/2008/03/18/8306597.aspx

Just for reference, bidirectional language support is limited in most command prompts, and as far as I understand, that limitation is what causes the issue here. Why it is not supported is something I cannot answer.
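
The one-line call above needs a few headers that are easy to miss. A minimal sketch, assuming the Microsoft CRT on Windows: `enable_wide_stdout` is a hypothetical helper name (not part of the CRT), and the non-Windows branch is just a placeholder so the snippet compiles everywhere.

```cpp
#include <cstdio>   // stdout

#ifdef _WIN32
#include <fcntl.h>  // _O_U16TEXT
#include <io.h>     // _setmode, _fileno
#endif

// Hypothetical helper (not part of the CRT): switches stdout to UTF-16 text
// mode on Windows and does nothing elsewhere.
// Returns false only if the mode switch failed.
bool enable_wide_stdout() {
#ifdef _WIN32
    // _setmode returns the previous translation mode, or -1 on failure.
    return _setmode(_fileno(stdout), _O_U16TEXT) != -1;
#else
    // No equivalent mode switch here; the locale governs the conversion.
    return true;
#endif
}
```

Call it once at the start of main, before any output. After the switch, stick to wide output (std::wcout, wprintf) on stdout: the Microsoft CRT documents that narrow print functions on a Unicode-mode stream trigger an assertion.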

Lalise answered 4/2, 2013 at 16:51 Comment(10)
@Fibroblast Might be your font. Tried with Courier New and I see the Arabic but not the Chinese.Recrudescence
@xmllmx: can you type Arabic characters in the console?Solomonsolon
Try just copying and pasting the Arabic text into the shell. That'll tell you if the font supports it.Yasmeen
The first thing to check is whether you get the desired output when you redirect it to a file. If you do, then it is most probably a character encoding limitation of the cmd prompt itself. There are many references to this on Stack Overflow.Lalise
That must be the reason. Your console font probably does not support them.Solomonsolon
@allen, Why does Windows, as a unicode-based OS, not support outputting unicode characters in its console window?Fibroblast
@xmllmx: That is not a constructive question.Borek
@Fibroblast Am sure you can easily find blogs about this but honestly I dont know the exact reasoning to answer it anyways (above my pay grade too :) )Lalise
This is not so much about Unicode, but about Bidi.Gravimetric
As far as I know, the console doesn't use Uniscribe to output text. Arabic text will not show up correctly without proper contextual shaping. It's sort of connected to bidi, but not quite. Other complex scripts like Hindi (which is left-to-right) won't work either.Carola
L
4

You cannot portably print wide strings using standard C++ facilities.

Instead you can use the open-source {fmt} library to portably print Unicode text. For example (https://godbolt.org/z/nccb6j):

#include <fmt/core.h>

int main() {
  fmt::print("أَبْجَدِيَّة عَرَبِيَّة‎中文");
}

prints

أَبْجَدِيَّة عَرَبِيَّة‎中文

This requires compiling with the /utf-8 compiler option in MSVC.

For comparison, writing to wcout on Linux (https://godbolt.org/z/h9WKsY):

std::wcout << L"أَبْجَدِيَّة عَرَبِيَّة‎中文";

prints

???????????? ?????????????

unless you switch the global locale to e.g. en_US.utf8. A similar issue exists on Windows, with no standard way to fix it (you have to use non-standard CRT functions or the Windows API).

Disclaimer: I'm the author of {fmt}.

Lineament answered 29/12, 2020 at 23:28 Comment(1)
With C++20, std::wprint(L"中文") still outputs an empty string. I'm not sure whether <format> is equivalent to <fmt>.Tenia
B
2

I just read this article

"To the summary...

If you use Visual C++ you can't use UTF-8 to print text to std::cout.

If you still want to, please read this amazingly long article about how to make wcout and cout working, but it does not really give a simple solution - finally falling to redefinition of the stream buffers..." http://alfps.wordpress.com/2011/12/08/unicode-part-2-utf-8-stream-mode/

(from this blog http://blog.cppcms.com/post/105)

Basophil answered 10/2, 2013 at 0:55 Comment(0)
C
1

You can try this:

I assume that you were able to render Chinese-only text. That means you have Chinese font files.

Please try Arabic-only text. If it renders, you have an Arabic font on your system.

But when you mix Arabic and Chinese, you need to force the selection of a font file that contains both glyph sets. I think the default console font used for wcout's output doesn't have the Arabic glyphs.

I assume that you may be getting boxes for the Arabic code points.

Chiro answered 14/2, 2013 at 19:55 Comment(0)
V
1
#include <cstdio>   // stdout
#include <fcntl.h>  // _O_U16TEXT
#include <io.h>     // _setmode, _fileno
#include <iostream>

int main() {
    _setmode(_fileno(stdout), _O_U16TEXT); // or _O_WTEXT
    std::wcout << L"أَبْجَدِيَّة عَرَبِيَّة‎中文" << std::endl;
}

http://www.cplusplus.com/forum/beginner/126557/

Varicose answered 3/11, 2017 at 3:43 Comment(0)
O
0

On Windows

I recommend redirecting wcout's buffer to a file to make the results easier to view, because the Windows command prompt is unable to display some Unicode characters.
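#include <fstream>
#include <iostream>
#include <locale>

int main()
{
    // Assumes the runtime accepts this locale name; std::locale throws otherwise.
    std::locale myloc("en_US.UTF-8");
    std::locale::global(myloc);

    std::wfilebuf wfbuf;
    wfbuf.open("result.txt", std::ios::out);
    // Send wcout's output to the file, remembering the original buffer.
    std::wstreambuf* old_buf = std::wcout.rdbuf(&wfbuf);

    std::wcout << L"أَبْجَدِيَّة عَرَبِيَّة‎中文";

    // Restore the original buffer before wfbuf is destroyed.
    std::wcout.flush();
    std::wcout.rdbuf(old_buf);
    return 0;
}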

On Linux

Method1

#include <iostream>
#include <locale>

int main()
{
    std::ios::sync_with_stdio(false); // make wcout no longer depend on stdio
    std::locale myloc("en_US.UTF-8");
    std::wcout.imbue(myloc);
    std::wcout << L"أَبْجَدِيَّة عَرَبِيَّة‎中文";   
 
    return 0;
}
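
As far as I understand it, once wcout is decoupled from stdio, its buffer turns wchar_t into bytes through the codecvt facet of the imbued locale, which is why Method1 imbues a UTF-8 locale. A minimal sketch of that conversion step in isolation; `narrow_with_locale` is a hypothetical helper name, and the actual buffer implementation is library-specific.

```cpp
#include <cstddef>  // std::size_t
#include <cwchar>   // std::mbstate_t
#include <locale>
#include <string>

// Hypothetical helper: converts `wide` to bytes with the codecvt facet of
// `loc`, the same kind of facet a wide file buffer consults.
// Returns true only if every character could be converted.
bool narrow_with_locale(const std::locale& loc, const std::wstring& wide,
                        std::string& bytes) {
    using Facet = std::codecvt<wchar_t, char, std::mbstate_t>;
    const Facet& cvt = std::use_facet<Facet>(loc);

    std::mbstate_t state{};
    // Reserve the worst case: max_length() bytes per wide character.
    bytes.assign(wide.size() * static_cast<std::size_t>(cvt.max_length()) + 1,
                 '\0');
    const wchar_t* from_next = wide.data();
    char* to_next = &bytes[0];
    const std::codecvt_base::result res =
        cvt.out(state, wide.data(), wide.data() + wide.size(), from_next,
                &bytes[0], &bytes[0] + bytes.size(), to_next);
    // Keep only the bytes that were actually produced.
    bytes.resize(static_cast<std::size_t>(to_next - &bytes[0]));
    return res == std::codecvt_base::ok;
}
```

Calling it with std::locale::classic() and with a UTF-8 locale on the same mixed Arabic and Chinese string shows whether the chosen locale can represent the text at all, independently of the console and its font.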

Method2

#include <locale>  // std::locale
#include <wchar.h> // wprintf

int main()
{
    std::locale myloc("en_US.UTF-8");
    std::locale::global(myloc); // also sets the C locale used by stdio
    wprintf(L"أَبْجَدِيَّة عَرَبِيَّة‎中文");  
 
    return 0;
}

Method3

#include <iostream> // std::wcout
#include <locale>   // std::locale

int main()
{
    std::locale myloc("en_US.UTF-8");
    std::locale::global(myloc); // also sets the C locale used by stdio
    std::wcout << L"أَبْجَدِيَّة عَرَبِيَّة‎中文";// wcout depend on stdio, it doesn't matter if wcout's locale still C locale.

    return 0;
}
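
The methods above hard-code "en_US.UTF-8". On a system that does not provide that locale, the std::locale constructor throws std::runtime_error and the program terminates. A minimal sketch of a guarded switch; `try_set_global_locale` is a hypothetical helper name.

```cpp
#include <locale>
#include <stdexcept>

// Hypothetical helper: installs the named locale as the global C++ locale.
// Returns false, leaving the global locale unchanged, if the name is unknown.
bool try_set_global_locale(const char* name) {
    try {
        std::locale::global(std::locale(name));
        return true;
    } catch (const std::runtime_error&) {
        // std::locale's constructor throws for names the system does not provide.
        return false;
    }
}
```

A caller could try "en_US.UTF-8" first and fall back to another UTF-8 locale name, or report the failure, instead of crashing before any text is written.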
Ofilia answered 28/12, 2022 at 19:9 Comment(0)
