Print char32_t to console

How can I print (cout / wcout / ...) char32_t to console in C++11?

The following code prints hex values:

u32string s2 = U"Добрый день";
for(auto x:s2){
    wcout<<(char32_t)x<<endl;
}
Schafer answered 7/4, 2013 at 0:57 Comment(4)
I would like to have an OS-independent solution (if it is possible). I'm on Linux x86_64. - Schafer
Impossible. Won't work on Windows without major hacks. Also, I don't think wcout is supposed to print as characters anything but char and wchar_t. char32_t is neither. - Forego
Oh :( But is it possible to write UTF-32 encoded files on all platforms? So if I want to print anything "readable" with *cout, I've got to convert these characters to UTF-8, if that is possible? - Schafer
Encoding and converting isn't a problem, you can always do it. The problem of getting readable text is in having and being able to select proper Unicode fonts. Windows' console is quite broken in this respect. See this answer of mine and follow the link(s) in it. As for your example, see this. - Forego
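On the question of writing UTF-32 files portably: a minimal sketch, assuming you want UTF-32LE with a leading byte order mark. The function name `to_utf32le_bytes` is made up for illustration. It builds the bytes with shifts, so the result does not depend on the endianness of the machine running it.

```cpp
#include <string>

// Serialize a UTF-32 string as UTF-32LE bytes (hypothetical helper name).
// A BOM (U+FEFF) is written first so that readers can detect the byte order.
std::string to_utf32le_bytes(const std::u32string& s)
{
    std::string out;
    out.reserve((s.size() + 1) * 4);

    // Append one code unit, least significant byte first.
    auto put = [&out](char32_t c) {
        out.push_back(static_cast<char>(c & 0xFF));
        out.push_back(static_cast<char>((c >> 8) & 0xFF));
        out.push_back(static_cast<char>((c >> 16) & 0xFF));
        out.push_back(static_cast<char>((c >> 24) & 0xFF));
    };

    put(0xFEFF);
    for (char32_t c : s) put(c);
    return out;
}
```

The resulting string can be written to a file opened in binary mode, e.g. with `std::ofstream::write`.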

First, I don't think wcout is supposed to print as characters anything but char and wchar_t. char32_t is neither.

Here's a sample program that prints individual wchar_t's:

#include <iostream>

using namespace std;

int main()
{
  wcout << (wchar_t)0x41 << endl;
  return 0;
}

Output (ideone):

A

Currently, it's impossible to get consistent Unicode output in the console even in major OSes. Simplistic Unicode text output via cout, wcout, printf(), wprintf() and the like won't work on Windows without major hacks. The problem of getting readable Unicode text in the Windows console is in having and being able to select proper Unicode fonts. Windows' console is quite broken in this respect. See this answer of mine and follow the link(s) in it.

Forego answered 7/4, 2013 at 1:29 Comment(0)

I know this is very old, but I had to solve it on my own, so here you go. The idea is to switch between the UTF-8 and UTF-32 encodings of Unicode: you can cout UTF-8 strings, so just translate each UTF-32 encoded char32_t to UTF-8 and you're done. These are the low-level functions I came up with (no Modern C++ beyond binary literals). They can probably be optimized; any suggestion is appreciated.

char* char_utf32_to_utf8(char32_t utf32, char* buffer)
// Encodes the UTF-32 encoded char into a null-terminated UTF-8 string.
// Stores the result in the buffer and returns a pointer to the
// terminating '\0'
// (unchecked access, be sure to provide a buffer of at least 5 chars)
// Needs <stdexcept>; binary literals with digit separators need C++14.
{
    char* end = buffer;
    if(utf32 < 0x80) *(end++) = static_cast<char>(utf32);
    else if(utf32 < 0x800) {
        *(end++) = static_cast<char>(0b1100'0000 + (utf32 >> 6));
        *(end++) = static_cast<char>(0b1000'0000 + (utf32 & 0b0011'1111));
    }
    else if(utf32 < 0x10000){
        *(end++) = static_cast<char>(0b1110'0000 + (utf32 >> 12));
        *(end++) = static_cast<char>(0b1000'0000 + ((utf32 >> 6) & 0b0011'1111));
        *(end++) = static_cast<char>(0b1000'0000 + (utf32 & 0b0011'1111));
    } else if(utf32 < 0x110000) {
        *(end++) = static_cast<char>(0b1111'0000 + (utf32 >> 18));
        *(end++) = static_cast<char>(0b1000'0000 + ((utf32 >> 12) & 0b0011'1111));
        *(end++) = static_cast<char>(0b1000'0000 + ((utf32 >> 6) & 0b0011'1111));
        *(end++) = static_cast<char>(0b1000'0000 + (utf32 & 0b0011'1111));
    }
    else throw std::range_error("code point above U+10FFFF");
    *end = '\0';
    return end;
}

You can implement this function in a class if you want, in a constructor, in a template, or whatever you prefer.

Next comes the overloaded operator for a null-terminated char32_t array

std::ostream& operator<<(std::ostream& os, const char32_t* s)
{
    char buffer[5] = {0}; // That's the famous "big-enough buffer": 4 bytes plus '\0'
    while(s && *s)
    {
        char_utf32_to_utf8(*(s++), buffer);
        os << buffer;
    }
    return os;
}

and with the u32string

std::ostream& operator<<(std::ostream& os, const std::u32string& s)
{
    return (os << s.c_str());
}

Running the simplest stupidest test with the Unicode characters found on Wikipedia

int main()
{
    std::cout << std::u32string(U"\x10437\x20AC") << std::endl;
}

leads to 𐐷€ printed on the (Linux) console. This should be tested with different Unicode characters, though...

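The conversion itself works on code point values, so it does not depend on endianness. Byte order only matters when UTF-32 text is read or written as raw bytes, but I'm sure you can find the solution looking at this.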
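For testing with more characters, the range boundaries are where an encoder like this is most likely to go wrong. Below is a sketch of the same idea in plain C++11 (hex masks instead of binary literals) that returns a std::string. The name `encode_utf8` is hypothetical, and the surrogate check is an addition not present in the answer's code.

```cpp
#include <stdexcept>
#include <string>

// Encode one Unicode code point as UTF-8 (hypothetical helper, C++11).
// Throws std::range_error for surrogates and for values above U+10FFFF.
std::string encode_utf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {                       // 1 byte: U+0000..U+007F
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: U+0080..U+07FF
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: U+0800..U+FFFF
        if (cp >= 0xD800 && cp <= 0xDFFF)
            throw std::range_error("surrogate code point");
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x110000) {            // 4 bytes: U+10000..U+10FFFF
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        throw std::range_error("code point above U+10FFFF");
    }
    return out;
}
```

Useful values to check are 0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000 and 0x10FFFF, since each pair sits on either side of a length change.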

Buzzell answered 19/6, 2019 at 22:44 Comment(2)
"You can cout u8 strings" depends on whether the environment displaying the output supports UTF-8. The C++ standard library has char32_t to UTF-8 conversion, so you are reinventing the wheel there. Also, you should not try to overload operator<< this way, as it will not be found by ADL. - Reamy
1. I was answering a Linux-specific question, and a Linux environment usually supports UTF-8. 2. The answer wasn't supposed to explain codecvt or other std facilities, nor to invent something new, only to answer a question left unanswered for 6 years with non-std low-level code, as the answer itself says. 3. The answer just illustrates an idea that hundreds of people probably had before me and doesn't involve anything related to code management and namespacing. Nevertheless, there is an easy solution to the problem you highlighted, which I am pretty sure you're well aware of. - Buzzell
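For reference, the standard library conversion mentioned in the comment above can be sketched as follows, assuming a C++11 toolchain. `std::wstring_convert` and `std::codecvt_utf8` are deprecated since C++17, so newer compilers may warn. The wrapper name `to_utf8` is made up for illustration.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Convert a UTF-32 string to UTF-8 with the C++11 standard facilities.
// to_utf8 is a hypothetical wrapper name; to_bytes throws std::range_error
// if the input cannot be converted.
std::string to_utf8(const std::u32string& s)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.to_bytes(s);
}
```

With this, `std::cout << to_utf8(U"Добрый день") << '\n';` prints readable text on a terminal that expects UTF-8. Using a named function instead of a global operator<< overload also sidesteps the ADL concern.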
