What's the difference between printf("%s"), printf("%ls"), wprintf("%s"), and wprintf("%ls")?
D

3

31

Consider this sample program:

#include <cstdio>
#include <cwchar>
#include <string>

int main()
{
    std::string narrowstr = "narrow";
    std::wstring widestr = L"wide";
    printf("1 %s \n", narrowstr.c_str());
    printf("2 %ls \n", widestr.c_str());
    wprintf(L"3 %s \n", narrowstr.c_str());
    wprintf(L"4 %ls \n", widestr.c_str());
   
    return 0;
}

The output of this is:

1 narrow 
2 wide

I'm wondering:

  1. Why didn't 3 & 4 print?
  2. What are the differences between 1 & 3 and between 2 & 4?
  3. Does it make any difference if narrowstr is in UTF-8 and widestr is in UTF-16?
Doily answered 8/11, 2014 at 11:21 Comment(3)
"widestr is in utf16" implies that you're using Windows (more Unicode-friendly systems use UTF-32 as the default for wide strings). There are many arcane hoops to jump through if you want to do anything beyond ASCII on a Windows system using standard C++ or C. You might as well give in and use WinAPI. - Interstice
DO NOT GIVE IN TO MICROSOFT STUPIDITY. Save yourself the pain and write your own string library. For the love of god, DO NOT use the Windows conversion macros and other insanity. Trust me, it's horrendous, and in the confusion all kinds of bugs creep in. - Fart
#4 likely didn't print because your program crashed on #3. %ls is the most portable way to print a wchar_t string and works from both printf and wprintf. You should avoid all use of %S because the Visual C++ interpretation of it is the exact opposite of the C99/C++11 standard. - Hyperbolism
S
23

With Visual C++, you need to do:

wprintf(L"3 %hs \n", narrowstr.c_str());
wprintf(L"4 %s \n", widestr.c_str());

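Why? Because for printf, %s means a narrow (char) string and %ls means a wide (wchar_t) string.

But for Visual C++'s wprintf, %s means a wide string, and %ls means wide as well. %hs means a narrow string in both functions, so for printf, %s is simply equivalent to %hs. Note that this differs from standard C, where %s takes a narrow string and %ls a wide string in both printf and wprintf, and where %hs is not defined.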

On VC++/Windows, %S (capital S) reverses the effect. Therefore printf("%S") expects a wide string, and wprintf("%S") expects a narrow one. This is useful for _tprintf.

Sparry answered 8/11, 2014 at 12:50 Comment(1)
What do l and h stand for, low and high? That doesn't seem to make sense. - Textbook
I
11

Note that you're using C streams. C streams have a very special quality called "orientation". A stream is either unoriented, wide, or narrow. Orientation is decided by the first I/O operation performed on a particular stream, or by an explicit call to fwide (see http://en.cppreference.com/w/cpp/io/c for a summary of C I/O streams).

In your case, stdout starts out unoriented, and executing the first printf sets it to narrow. Once narrow, it's stuck narrow, and wprintf fails (check its return code!). The only way to change a C stream's orientation is to freopen it, which doesn't quite work with stdout. That's why 3 and 4 didn't print.

The difference between 1 and 3 is that 1 is a narrow output function using the narrow string conversion specifier %s: it reads bytes from the char array and sends bytes into a byte stream. 3 is a wide output function with the narrow string conversion specifier %s: it first reads bytes from the char array and converts them to wchar_ts (as if by mbrtowc), then sends the wchar_ts into a wide stream, which converts them back (as if by wcrtomb) into bytes or multibyte sequences that are then pushed to standard output with a write. 2 and 4 differ in the same way, except that %ls reads from a wchar_t array: 2 converts each wchar_t to a multibyte sequence for the byte stream, while 4 passes the wchar_ts directly to the wide stream.

Finally, if widestr is in UTF-16, you must be using Windows, and all bets are off; the standard C and C++ libraries on that platform have very little support for anything beyond ASCII. You may as well give in and use WinAPI. (You can get by with standard C++11 for some Unicode things, and even do this C output, with the magic words _setmode(_fileno(stdout), _O_U16TEXT);. That's been discussed enough times.)

Interstice answered 13/11, 2014 at 3:26 Comment(0)
K
0

The answers to questions 1 and 2 are in the documentation. Any good set of documentation will do; cppreference is said to be very good.

As for 3, the language standard does not specify any particular encoding for strings, or any particular size of wchar_t. You need to consult the documentation for your implementation, rather than for the language proper (though writing implementation-dependent code is rarely advisable).

Kolomna answered 8/11, 2014 at 11:21 Comment(0)
