Print wchar to Linux console?
D

3

8

My C program is pasted below. In bash, the program prints "char is " but the Ω is not printed. My locale settings are all en_US.utf8.

#include <stdio.h>
#include <wchar.h>
#include <stdlib.h>

int main() {
   int r;
   wchar_t myChar1 = L'Ω';
   r = wprintf(L"char is %c\n", myChar1);
}
Defroster answered 8/10, 2011 at 8:10 Comment(0)
T
15

This was quite interesting. Apparently the compiler translates the omega from UTF-8 to Unicode correctly, but somehow libc messes it up on output.

First of all, the %c format specifier expects a char (passed as an int), even in wprintf, so for a wchar_t you have to specify %lc (and likewise %ls for wide strings).

Secondly, if you run your code like that, the locale is "C": every C program starts in the "C" locale, and the locale is not taken from the environment automatically. You have to call setlocale with an empty string to pick up the locale from the environment, so that libc can convert the wide character to the terminal's encoding.

#include <stdio.h>
#include <wchar.h>
#include <stdlib.h>
#include <locale.h>

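int main(void) {
    int r;
    wchar_t myChar1 = L'Ω';
    /* Adopt the locale from the environment (LANG, LC_*) instead of the default "C". */
    setlocale(LC_CTYPE, "");
    /* %lc takes a wint_t and %x takes an unsigned int, hence the casts. */
    r = wprintf(L"char is %lc (%x)\n", (wint_t)myChar1, (unsigned int)myChar1);
    return r < 0 ? 1 : 0;
}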
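
To see what "the locale is set to C" means in practice, here is a minimal sketch that is not part of the original answer. The helper names current_ctype_locale and encode_in_current_locale are hypothetical; they only wrap the standard setlocale and wcrtomb calls. Before setlocale(LC_CTYPE, "") is called, the program is in the "C" locale, where the conversion of Ω is expected to fail on glibc. After the call, assuming a UTF-8 locale in the environment, it should succeed and yield the multibyte form that wprintf would write.

```c
#include <limits.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: name of the current LC_CTYPE locale.
 * Passing NULL only queries the locale; it does not change it. */
const char *current_ctype_locale(void)
{
    return setlocale(LC_CTYPE, NULL);
}

/* Hypothetical helper: encode one wide character in the encoding of the
 * current LC_CTYPE locale. `out` must hold at least MB_LEN_MAX + 1 bytes.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if the character cannot be represented in the current locale. */
int encode_in_current_locale(wchar_t wc, char *out)
{
    mbstate_t state;
    memset(&state, 0, sizeof state);   /* start in the initial shift state */

    size_t n = wcrtomb(out, wc, &state);
    if (n == (size_t)-1) {             /* errno is EILSEQ in this case */
        out[0] = '\0';
        return -1;
    }
    out[n] = '\0';
    return (int)n;
}
```

Calling current_ctype_locale() at the very start of main should report "C". That is the state the question's program was in when wprintf could not output the Ω.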
Troublous answered 8/10, 2011 at 9:52 Comment(2)
Actually, this is as intended. libc doesn't mess up, it just follows the standard. - Kaleena
Thanks, but can you give some details? What do you mean by the locale being set to 'C'? - Allonym
I
6

As an alternative to the answer that fixes the libc locale, you can encode the character as UTF-8 yourself:

#include <stdio.h>
#include <wchar.h>
#include <stdlib.h>

// NOTE: *NOT* thread safe, not re-entrant
const char* unicode_to_utf8(wchar_t c)
{
    static unsigned char b_static[5];
    unsigned char* b = b_static; 

    if (c<(1<<7))// 7 bit Unicode encoded as plain ascii
    {
        *b++ = (unsigned char)(c);
    }
    else if (c<(1<<11))// 11 bit Unicode encoded in 2 UTF-8 bytes
    {
        *b++ = (unsigned char)((c>>6)|0xC0);
        *b++ = (unsigned char)((c&0x3F)|0x80);
    }
    else if (c<(1<<16))// 16 bit Unicode encoded in 3 UTF-8 bytes
    {
        *b++ = (unsigned char)(((c>>12))|0xE0);
        *b++ = (unsigned char)(((c>>6)&0x3F)|0x80);
        *b++ = (unsigned char)((c&0x3F)|0x80);
    }
    else if (c<(1<<21))// 21 bit Unicode encoded in 4 UTF-8 bytes
    {
        *b++ = (unsigned char)(((c>>18))|0xF0);
        *b++ = (unsigned char)(((c>>12)&0x3F)|0x80);
        *b++ = (unsigned char)(((c>>6)&0x3F)|0x80);
        *b++ = (unsigned char)((c&0x3F)|0x80);
    }
    *b = '\0';
    return b_static;
}


int main() {
    int r;
    wchar_t myChar1 = L'Ω';
    r = printf("char is %s\n", unicode_to_utf8(myChar1));
    return 0;
}
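
For readers who want to avoid the static buffer, here is a sketch of a re-entrant variant of the same bit-shifting technique. It is my own illustration, not the answerer's code. The name utf8_encode and the choice of a uint32_t code point parameter are assumptions, made so that the range checks do not depend on the size or signedness of wchar_t. It also rejects surrogates (U+D800 to U+DFFF) and values above U+10FFFF, which are not valid in UTF-8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical re-entrant encoder: writes the UTF-8 form of code point `cp`
 * into `out` (room for 4 bytes plus a terminating NUL) and returns the
 * number of bytes written, excluding the NUL. Returns 0 and an empty
 * string if `cp` is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[5])
{
    size_t n = 0;

    if (cp < 0x80) {                              /* 7 bits: plain ASCII */
        out[n++] = (unsigned char)cp;
    } else if (cp < 0x800) {                      /* 11 bits: 2 bytes */
        out[n++] = (unsigned char)(0xC0 | (cp >> 6));
        out[n++] = (unsigned char)(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {                    /* 16 bits: 3 bytes */
        if (cp < 0xD800 || cp > 0xDFFF) {         /* skip surrogates */
            out[n++] = (unsigned char)(0xE0 | (cp >> 12));
            out[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    } else if (cp <= 0x10FFFF) {                  /* 21 bits: 4 bytes */
        out[n++] = (unsigned char)(0xF0 | (cp >> 18));
        out[n++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[n++] = (unsigned char)(0x80 | (cp & 0x3F));
    }

    out[n] = '\0';
    return n;
}
```

As a worked case, Ω is U+03A9: the top five bits give 0xC0 | 0x0E = 0xCE and the low six bits give 0x80 | 0x29 = 0xA9, so the result is the two bytes CE A9. Like the original, this only helps when the terminal really expects UTF-8.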
Intestate answered 8/10, 2011 at 10:15 Comment(2)
This answer is silly; the only point of using wchar_t to begin with is that you could theoretically support different output encodings in different locales. If you want to hard-code UTF-8, just use char *myChar1 = "Ω"; and then printf with %s... - Seesaw
I treat my answer as a workaround or possibly a solution in some more limited use cases. I like the answer which was chosen as a solution so no argument here. Cheers. - Intestate
N
4

Use {glib,libiconv,ICU} to convert it to UTF-8 before outputting.
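
As a rough sketch of the iconv route (the answer names the libraries but gives no code, so everything here is my assumption): the helper name wide_to_utf8 is hypothetical, and the "WCHAR_T" encoding name is accepted by glibc's iconv and GNU libiconv but is not guaranteed by POSIX. On glibc no extra library is needed; with a separate libiconv you would link with -liconv.

```c
#include <iconv.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: converts the NUL-terminated wide string `in` to
 * UTF-8 in `out`, which has room for `out_size` bytes including the NUL.
 * Returns the number of bytes written (excluding the NUL), or (size_t)-1
 * if the converter is unavailable or the conversion fails. */
size_t wide_to_utf8(const wchar_t *in, char *out, size_t out_size)
{
    if (out_size == 0)
        return (size_t)-1;

    /* Assumption: the implementation knows the "WCHAR_T" encoding name. */
    iconv_t cd = iconv_open("UTF-8", "WCHAR_T");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *src = (char *)in;                       /* iconv works on bytes */
    size_t src_left = wcslen(in) * sizeof(wchar_t);
    char *dst = out;
    size_t dst_left = out_size - 1;               /* keep room for the NUL */

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    *dst = '\0';
    return (size_t)(dst - out);
}
```

The resulting buffer can then be printed with plain printf("%s"), independent of the current locale, as long as the terminal expects UTF-8.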

Novella answered 8/10, 2011 at 8:13 Comment(1)
Thank you. Can I do it without using these libs? - Defroster
