iostreams - Print `wchar_t` or `charXX_t` value as a character

If you feed a wchar_t, char16_t, or char32_t value to a narrow ostream, it will print the numeric value of the code point.

#include <iostream>
using std::cout;
int main()
{
    cout << 'x' << L'x' << u'x' << U'x' << '\n';
}

prints x120120120. This is because there is an operator<< overload for the specific combination of basic_ostream with its own charT, but there are no analogous overloads for the other character types, so they are silently promoted to an integer type and printed that way. Similarly, non-narrow string literals (L"x", u"x", U"x") are silently converted to const void* and printed as the pointer value, and non-narrow string objects (wstring, u16string, u32string) won't even compile.

So, the question: What is the least awful way to print a wchar_t, char16_t, or char32_t value on a narrow ostream, as the character, rather than as the numeric value of the codepoint? It should correctly convert all codepoints that are representable in the encoding of the ostream, to that encoding, and should report an error when the codepoint is not representable. (For instance, given u'…' and a UTF-8 ostream, the three-byte sequence 0xE2 0x80 0xA6 should be written to the stream; but given u'â' and a KOI8-R ostream, an error should be reported.)

Similarly, how can one print a non-narrow C-string or string object on a narrow ostream, converting to the output encoding?

If this can't be done within ISO C++11, I'll take platform-specific answers.

(Inspired by this question.)

Defeat answered 12/12, 2016 at 18:58 Comment(1)
In short, you have to either 1) use a wide ostream, or 2) convert the wide character data to the narrow encoding yourself (which is a potentially lossy conversion). An ostream cannot do that conversion for you. Look at std::wstring_convert, or use a library like iconv or ICU. - Metabolize
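The std::wstring_convert route mentioned in the comment above could look roughly like the sketch below. It assumes the narrow encoding is UTF-8 and does not consult the stream's locale at all, so it only covers that one case. The helper name to_utf8 is hypothetical; std::wstring_convert and std::codecvt_utf8 are C++11 facilities that were deprecated in C++17.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: encode one UTF-32 code point as UTF-8.
// Assumption: the narrow output encoding is UTF-8.
// wstring_convert throws std::range_error if the conversion fails.
inline std::string to_utf8(char32_t c)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.to_bytes(c);
}

// Same idea for a whole UTF-32 string object.
inline std::string to_utf8(const std::u32string& s)
{
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    return conv.to_bytes(s);
}
```

With this in place, std::cout << to_utf8(U'\u2026') writes the three bytes 0xE2 0x80 0xA6 to the stream. For any other narrow encoding a different conversion (iconv, ICU, or the C locale functions) would be needed.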

As you noted, there is no operator<<(std::ostream&, wchar_t) for a narrow ostream. If you want to keep the << syntax, you can teach ostream how to deal with wchar_t values, so that your routine is picked as a better overload than the one that requires a conversion to an integer first.

If you're feeling adventurous:

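#include <ostream>
#include <stdexcept>

// Caution: adding overloads to namespace std is formally undefined behavior.
namespace std {
  ostream& operator<< (ostream& os, wchar_t wc) {
    // Pass the value through only if it fits in one byte (or pick another upper bound).
    if(unsigned(wc) < 256)
      return os << (unsigned char)wc;
    // Otherwise report the error; any exception (or other error mechanism) will do.
    throw range_error("wchar_t value does not fit in a single byte");
  }
}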

Otherwise, make a simple struct that transparently wraps a wchar_t and has a custom friend operator<<, and convert your wide characters to that type before outputting them.

Edit: To convert on the fly to the narrow encoding of the current locale, you can use the functions from <cwchar>, like:

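#include <cstdlib>   // MB_CUR_MAX
#include <cwchar>    // std::wcrtomb, std::mbstate_t
#include <ostream>
#include <string>

std::ostream& operator<< (std::ostream& os, wchar_t wc) {
    std::mbstate_t state{};
    std::string mb(MB_CUR_MAX, '\0');
    std::size_t ret = std::wcrtomb(&mb[0], wc, &state);
    if(ret == static_cast<std::size_t>(-1)) {
        // wc is not representable in the locale's encoding; handle the error as you see fit
        os.setstate(std::ios_base::failbit);
        return os;
    }
    mb.resize(ret);  // keep only the bytes that were actually written
    return os << mb;
}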

Don't forget to set your locale to the system default:

std::locale::global(std::locale(""));
std::cout << L'ŭ';
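The same std::wcrtomb approach could be extended to wide string objects, which the question also asks about. The sketch below is one possible way to do it; the helper name to_multibyte is hypothetical, and the choice to throw std::range_error on an unrepresentable character is an assumption rather than part of the answer above.

```cpp
#include <climits>    // MB_LEN_MAX
#include <cwchar>     // std::wcrtomb, std::mbstate_t
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a wide string to the multibyte encoding of
// the current C locale, throwing if a character is not representable.
// Note: for stateful encodings a final shift sequence may also be needed;
// this sketch does not emit one.
inline std::string to_multibyte(const std::wstring& ws)
{
    std::mbstate_t state{};
    std::string out;
    char buf[MB_LEN_MAX];
    for (wchar_t wc : ws) {
        std::size_t n = std::wcrtomb(buf, wc, &state);
        if (n == static_cast<std::size_t>(-1))
            throw std::range_error("character not representable in locale");
        out.append(buf, n);  // append only the bytes produced for this character
    }
    return out;
}
```

It would be used as std::cout << to_multibyte(L"some text");, again after setting the locale as shown above, since the conversion depends on the current locale's encoding.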
Hesione answered 12/12, 2016 at 20:40 Comment(7)
This does not convert the value to the narrow output encoding. That's essential, and it's also the piece that I don't already know how to do. - Defeat
@Defeat How else would you like to convert a wide character than accepting it if it's within ASCII and rejecting otherwise? You'd then need to be specific, e.g., removing accents or something. - Hesione
Your example uses an 'x' which passes this (for the L'x', you'd need to do the same for the other types) so I assumed that's what you're after. - Hesione
It should, for instance, convert L"…" to the three-byte sequence 0xE2 0x80 0xA6 when the narrow output encoding is UTF-8. - Defeat
I thought it was obvious that I wanted something that could handle all characters supported by the narrow output encoding, not just ASCII. - Defeat
I see! I thought the output encoding was ASCII. iconv is not hard to use, I'll try to work it in. - Hesione
@Defeat Please see the updated answer, there was a better way after all. - Hesione
