How can I print (cout / wcout / ...) char32_t to console in C++11?
The following code prints hex values:
u32string s2 = U"Добрый день";
for(auto x:s2){
    wcout<<(char32_t)x<<endl;
}
First, I don't think wcout is supposed to print as characters anything but char and wchar_t. char32_t is neither.
Here's a sample program that prints individual wchar_t's:
#include <iostream>
using namespace std;
int main()
{
    wcout << (wchar_t)0x41 << endl;
    return 0;
}
Output (ideone):
A
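If the goal is only to get the text onto wcout, one workaround in the spirit of this answer is to turn each char32_t into a wchar_t first. The sketch below assumes a platform where wchar_t is 32 bits wide and holds UTF-32 code points. That is typical of Linux/glibc but not of Windows, where wchar_t is 16-bit UTF-16. The helper name u32_to_wide is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: copies UTF-32 code units into a std::wstring.
// Only meaningful where wchar_t is 32 bits and stores UTF-32 (e.g. Linux/glibc);
// the static_assert rejects platforms such as Windows with a 16-bit wchar_t.
std::wstring u32_to_wide(const std::u32string& s)
{
    static_assert(sizeof(wchar_t) == sizeof(char32_t),
                  "this sketch assumes a 32-bit wchar_t");
    // Each char32_t is converted to wchar_t one-to-one; no re-encoding happens.
    return std::wstring(s.begin(), s.end());
}
```

Actually displaying the result still needs a locale that can encode non-ASCII characters. A common approach is calling std::setlocale(LC_ALL, "") before writing to std::wcout. Whether that works depends on the environment's locale settings.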
Currently, it's impossible to get consistent Unicode output in the console, even in major OSes. Simplistic Unicode text output via cout, wcout, printf(), wprintf() and the like won't work on Windows without major hacks. The problem of getting readable Unicode text in the Windows console lies in having, and being able to select, proper Unicode fonts. Windows' console is quite broken in this respect. See this answer of mine and follow the link(s) in it.
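One mitigation often used on Windows is to switch the console's output code page to UTF-8 and then write UTF-8 encoded bytes through cout. This does not address the font problem described above. The sketch below assumes the Win32 function SetConsoleOutputCP and the constant CP_UTF8 from <windows.h>. The name enable_utf8_console is hypothetical, and on other platforms the function does nothing.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: asks the Windows console to interpret output bytes as UTF-8.
// Returns true on success. On non-Windows platforms it is a no-op returning true,
// on the assumption that the terminal already expects UTF-8 (not guaranteed).
bool enable_utf8_console()
{
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;
#endif
}
```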
I know this is very old, but I had to solve it on my own, so here you go. The idea is to switch between the UTF-8 and UTF-32 encodings of Unicode: you can cout UTF-8 encoded char strings, so just translate each UTF-32 encoded char32_t to UTF-8 and you're done. These are the low-level functions I came up with (no Modern C++, although the binary literals with digit separators need C++14). They can probably be optimized, so any suggestion is appreciated.
#include <iostream>
#include <stdexcept>
#include <string>

// Thrown for values that are not valid Unicode scalar values
struct encoding_error : std::runtime_error {
    encoding_error() : std::runtime_error("invalid UTF-32 code point") {}
};

char* char_utf32_to_utf8(char32_t utf32, char* buffer)
// Encodes the UTF-32 encoded char into a UTF-8 string.
// Stores the NUL-terminated result in the buffer and returns a pointer
// to the terminating '\0'
// (unchecked access, be sure to provide a buffer of at least 5 chars)
{
    char* end = buffer;
    if(utf32 < 0x80) *(end++) = static_cast<char>(utf32);          // 1 byte
    else if(utf32 < 0x800) {                                        // 2 bytes
        *(end++) = static_cast<char>(0b1100'0000 + (utf32 >> 6));
        *(end++) = static_cast<char>(0b1000'0000 + (utf32 & 0b0011'1111));
    }
    else if(utf32 >= 0xD800 && utf32 <= 0xDFFF) throw encoding_error(); // surrogates
    else if(utf32 < 0x10000) {                                      // 3 bytes
        *(end++) = static_cast<char>(0b1110'0000 + (utf32 >> 12));
        *(end++) = static_cast<char>(0b1000'0000 + ((utf32 >> 6) & 0b0011'1111));
        *(end++) = static_cast<char>(0b1000'0000 + (utf32 & 0b0011'1111));
    }
    else if(utf32 < 0x110000) {                                     // 4 bytes
        *(end++) = static_cast<char>(0b1111'0000 + (utf32 >> 18));
        *(end++) = static_cast<char>(0b1000'0000 + ((utf32 >> 12) & 0b0011'1111));
        *(end++) = static_cast<char>(0b1000'0000 + ((utf32 >> 6) & 0b0011'1111));
        *(end++) = static_cast<char>(0b1000'0000 + (utf32 & 0b0011'1111));
    }
    else throw encoding_error();                                    // above U+10FFFF
    *end = '\0';
    return end;
}
You can implement this function in a class if you want, in a constructor, in a template, or whatever you prefer.
Here is the overloaded operator for a char32_t array:
std::ostream& operator<<(std::ostream& os, const char32_t* s)
{
    char buffer[5] {}; // That's the famous "big-enough buffer": up to 4 bytes plus '\0'
    while(s && *s)
    {
        char_utf32_to_utf8(*(s++), buffer);
        os << buffer;
    }
    return os;
}
and the one for std::u32string:
std::ostream& operator<<(std::ostream& os, const std::u32string& s)
{
    return (os << s.c_str());
}
Running the simplest possible test with the Unicode characters found on Wikipedia:
int main()
{
    std::cout << std::u32string(U"\x10437\x20AC") << std::endl;
}
leads to 𐐷€ printed on the (Linux) console. This should be tested with different Unicode characters, though...
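Note that this does not depend on endianness: the shifts and masks operate on the code point's numeric value, and UTF-8 is defined as a sequence of bytes, so the output is the same on any platform.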
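If the unchecked buffer is a concern, the same bit layout can append to a std::string instead. This also yields a named function rather than a global operator<< overload. The sketch below is minimal. append_utf8 and to_utf8 are hypothetical names. Rather than a custom exception type, it throws std::invalid_argument for surrogates (U+D800 to U+DFFF) and for values above U+10FFFF.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: encodes one code point as UTF-8 and appends it to out.
void append_utf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {                         // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                 // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {               // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        if (cp >= 0xD800 && cp <= 0xDFFF)
            throw std::invalid_argument("surrogate code point");
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x110000) {              // 4 bytes: 11110xxx + three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        throw std::invalid_argument("code point above U+10FFFF");
    }
}

// Hypothetical helper: encodes a whole UTF-32 string as UTF-8.
std::string to_utf8(const std::u32string& s)
{
    std::string out;
    out.reserve(s.size() * 4);               // worst case: 4 bytes per code point
    for (char32_t cp : s) append_utf8(out, cp);
    return out;
}
```

Usage would then be std::cout << to_utf8(s) << '\n';. This still assumes the terminal interprets the bytes as UTF-8.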
… char32_t to UTF8 conversion so you are reinventing the wheel there. Also, you should not try to overload operator<< this way, as it will not be found by ADL –
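For reference, C++11 itself ships a char32_t-to-UTF-8 converter in <codecvt> and <locale>: std::wstring_convert combined with std::codecvt_utf8<char32_t>. Both have been deprecated since C++17, so compilers may warn and newer standard library versions may drop them. This is a sketch, and utf32_to_utf8_std is a hypothetical wrapper name.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical wrapper around the C++11 standard converter.
// std::wstring_convert and std::codecvt_utf8 are deprecated since C++17.
std::string utf32_to_utf8_std(const std::u32string& s)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.to_bytes(s);   // throws std::range_error if conversion fails
}
```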
… wcout is supposed to print as characters anything but char and wchar_t. char32_t is neither. – Forego
*cout I've got to convert these characters to utf8, if it is possible? – Schafer