Why is © (the copyright symbol) replaced with (C) when using wprintf?

When I try to print the copyright symbol © with printf or write, it works just fine:

#include <stdio.h>

int main(void)
{
    printf("©\n");
}

#include <unistd.h>

int main(void)
{
    write(1, "©\n", 3);
}

Output:

©

But when I try to print it with wprintf, I get (C):

#include <stdio.h>
#include <wchar.h>

int main(void)
{
    wprintf(L"©\n");
}

Output:

(C)

It's fixed when I add a call to setlocale, though:

#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
    setlocale(LC_ALL, "");
    wprintf(L"©\n");
}

Output:

©

Why is the original behavior present and why is it fixed when I call setlocale? Additionally, where does this conversion take place? And how can I make the behavior after setlocale the default?

compilation command:

gcc test.c

locale:

LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

echo $LC_CTYPE:


uname -a:

Linux penguin 4.19.79-07511-ge32b3719f26b #1 SMP PREEMPT Mon Nov 18 17:41:41 PST 2019 x86_64 GNU/Linux

file test.c (same on all of the examples):

test.c: C source, UTF-8 Unicode text

gcc --version:

gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

/lib/x86_64-linux-gnu/libc-2.24.so (glibc version):

GNU C Library (Debian GLIBC 2.24-11+deb9u4) stable release version 2.24, by Roland McGrath et al.
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 6.3.0 20170516.
Available extensions:
        crypt add-on version 2.1 by Michael Glad and others
        GNU Libidn by Simon Josefsson
        Native POSIX Threads Library by Ulrich Drepper et al
        BIND-8.2.3-T5B
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
<http://www.debian.org/Bugs/>.

cat /etc/debian_version:

9.12
Gunther asked 28/2, 2020 at 20:29

Comments (2):
One ideone.com/LMOH1T Two ideone.com/D6D17k (Insessorial)
It's translated by glibc; the table is autogenerated from glibc/C-translit.h.in. (Stagehand)

The locale settings of the calling process are inherited only as environment variables; the C library does not apply them to the new process automatically.

When the program first starts up, it is in the C locale. The man page for setlocale(3) says the following:

On startup of the main program, the portable "C" locale is selected as default. A program may be made portable to all locales by calling:

setlocale(LC_ALL, "");

...

The locale "C" or "POSIX" is a portable locale; its LC_CTYPE part corresponds to the 7-bit ASCII character set.

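So in the C locale, © has no representation in the output character set. When wprintf converts the wide string to bytes, glibc transliterates the character to an ASCII approximation, which is why the output shows (C). As one of the comments on the question notes, the transliteration table is generated from glibc's C-translit.h.in.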

The locale can be set as follows:

setlocale(LC_ALL, "");

The LC_ALL category selects every part of the locale (LC_CTYPE, LC_NUMERIC, and so on). An empty string for the locale name means the locale is chosen according to the relevant environment variables. Once this is done, wprintf can encode © in the character set of your shell's locale (UTF-8 in your case).

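#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
    /* Copy the name: the string returned by setlocale may be
       overwritten by a later call to setlocale. */
    char before[64];
    snprintf(before, sizeof before, "%s", setlocale(LC_ALL, NULL));

    setlocale(LC_ALL, "");                 /* adopt the environment's locale */
    char *after = setlocale(LC_ALL, NULL); /* query the locale now in effect */

    wprintf(L"before locale: %s\n", before);
    wprintf(L"after locale: %s\n", after);
    wprintf(L"©\n");
    wprintf(L"\u00A9\n");
    return 0;
}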

Output:

before locale: C
after locale: en_US.utf8
©
©
Mammoth answered 28/2, 2020 at 20:57