I'm primarily interested in Unix-like systems (i.e., portable POSIX), as it seems Windows does strange things with wide characters.
Do the wide-character read and write functions (like getwchar() and putwchar()) always "do the right thing", for example read UTF-8 and write UTF-8 when that is the current locale, or do I have to convert manually with wcrtomb() and print the resulting multibyte string using e.g. fputs()? On my system (openSUSE 12.3), where $LANG is set to en_GB.UTF-8, they do seem to do the right thing: inspecting the output I see what looks like UTF-8, even though the strings were stored as wchar_t and written with the wide-character functions.
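For concreteness, this is roughly the kind of round trip I have been testing (a simplified sketch, not my actual program): a filter that copies stdin to stdout either through the wide-character functions or by converting manually with wcrtomb().

    #include <limits.h>   /* MB_LEN_MAX */
    #include <locale.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <wchar.h>

    /* Approach 1: let the wide-character functions encode/decode according
     * to the current locale. */
    static void copy_wide(void)
    {
        wint_t wc;
        while ((wc = getwchar()) != WEOF)
            putwchar((wchar_t)wc);
    }

    /* Approach 2: convert each wide character by hand with wcrtomb() and
     * write the resulting bytes with byte-oriented I/O. */
    static void copy_manual(void)
    {
        wint_t wc;
        mbstate_t state = {0};
        char buf[MB_LEN_MAX];

        while ((wc = getwchar()) != WEOF) {
            size_t n = wcrtomb(buf, (wchar_t)wc, &state);
            if (n != (size_t)-1)
                fwrite(buf, 1, n, stdout);
        }
    }

    int main(void)
    {
        /* Pick up the locale (and hence the encoding) from the environment. */
        setlocale(LC_ALL, "");

        /* The two functions are alternatives: a stream's byte/wide
         * orientation is fixed on first use, so only one is called here. */
        copy_wide();
        (void)copy_manual;   /* reference it to avoid an unused warning */
        return EXIT_SUCCESS;
    }

Both variants appear to produce identical UTF-8 output here, which is exactly why I am wondering whether the first one is guaranteed to work or just happens to.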
However, I am unsure whether this is guaranteed. For example, cprogramming.com states that:
[wide characters] should not be used for output, since spurious zero bytes and other low-ASCII characters with common meanings (such as '/' and '\n') will likely be sprinkled throughout the data.
This seems to indicate that outputting wide characters (presumably via the wide-character output functions) can wreak havoc.
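My guess, and it is only a guess, is that the warning is really about dumping the raw in-memory wchar_t representation rather than about the wide-character output functions. Something like the following (my own made-up example, not from that article) would indeed sprinkle zero bytes through the output, since with a 4-byte wchar_t every ASCII character carries three zero bytes:

    #include <stdio.h>
    #include <wchar.h>

    int main(void)
    {
        const wchar_t *ws = L"a/b\n";

        /* Writing the in-memory representation directly: with a 32-bit
         * wchar_t this emits three zero bytes per ASCII character, in
         * whatever byte order the platform uses, not any text encoding. */
        fwrite(ws, sizeof(wchar_t), wcslen(ws), stdout);
        return 0;
    }

But I cannot tell from the article whether that is what is meant, or whether putwchar() and friends are being warned against as well.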
Since the C standard does not seem to mention encoding at all, I really have no idea who, when, or how an encoding is applied when using wchar_t. So my question is basically whether reading, writing and using wide characters exclusively is a proper thing to do when my application has no need to know about the encoding used. I only need string lengths and console widths (wcswidth()), so using wchar_t everywhere when dealing with text seems ideal to me.
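To make that last point concrete, this is roughly how I intend to use it. The string literal is just an arbitrary example and assumes the source file is saved as UTF-8; on a UTF-8 terminal the CJK characters typically occupy two columns each, which is exactly the length-versus-width distinction I care about.

    #define _XOPEN_SOURCE 700   /* for wcswidth() on POSIX systems */
    #include <locale.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <wchar.h>

    int main(void)
    {
        setlocale(LC_ALL, "");

        /* Arbitrary example string containing non-ASCII characters. */
        const wchar_t *s = L"naïve 日本語";

        size_t code_points = wcslen(s);          /* number of wchar_t elements */
        int columns = wcswidth(s, code_points);  /* terminal columns, or -1    */

        printf("code points: %zu, columns: %d\n", code_points, columns);
        return EXIT_SUCCESS;
    }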