How to printf accented characters in ANSI C (like á é í ó ú)
Asked Answered
A

3

12

I tried to printf with some accented characters such as á é í ó ú:

printf("my name is Seán\n");

The text editor in the DEVC++ IDE displays them fine - i.e the source code looks fine. I guess I need some library other than stdio.h and maybe some variant of the normal printf.

I'm using IDE Bloodshed DEVC running on Windows XP.

Ablaut answered 29/9, 2012 at 11:29 Comment(0)
D
6

Perhaps the best is to use Unicode.

Here's how...

First, manually set your console font to "Consolas" or "Lucida Console" or whichever True-Type Unicode font you can choose ("Raster fonts" may not work, those aren't Unicode fonts, although they may include characters you're interested in).

Next, set the console code page to 65001 (UTF-8) with SetConsoleOutputCP(CP_UTF8).

Then convert your text to UTF-8 (if it's not yet in UTF-8) using WideCharToMultiByte(CP_UTF8, ...).

Finally, call WriteConsoleA() to output the UTF-8 text.

Here's a little function that does all these things for you, it's an "improved" variant of wprintf():

int _wprintf(const wchar_t* format, ...)
{
  int r;
  static int utf8ModeSet = 0;
  static wchar_t* bufWchar = NULL;
  static size_t bufWcharCount = 256;
  static char* bufMchar = NULL;
  static size_t bufMcharCount = 256;
  va_list vl;
  int mcharCount = 0;

  if (utf8ModeSet == 0)
  {
    if (!SetConsoleOutputCP(CP_UTF8))
    {
      DWORD err = GetLastError();
      fprintf(stderr, "SetConsoleOutputCP(CP_UTF8) failed with error 0x%X\n", err);
      utf8ModeSet = -1;
    }
    else
    {
      utf8ModeSet = 1;
    }
  }

  if (utf8ModeSet != 1)
  {
    va_start(vl, format);
    r = vwprintf(format, vl);
    va_end(vl);
    return r;
  }

  if (bufWchar == NULL)
  {
    if ((bufWchar = malloc(bufWcharCount * sizeof(wchar_t))) == NULL)
    {
      return -1;
    }
  }

  for (;;)
  {
    va_start(vl, format);
    r = vswprintf(bufWchar, bufWcharCount, format, vl);
    va_end(vl);

    if (r < 0)
    {
      break;
    }

    if (r + 2 <= bufWcharCount)
    {
      break;
    }

    free(bufWchar);
    if ((bufWchar = malloc(bufWcharCount * sizeof(wchar_t) * 2)) == NULL)
    {
      return -1;
    }
    bufWcharCount *= 2;
  }

  if (r > 0)
  {
    if (bufMchar == NULL)
    {
      if ((bufMchar = malloc(bufMcharCount)) == NULL)
      {
        return -1;
      }
    }

    for (;;)
    {
      mcharCount = WideCharToMultiByte(CP_UTF8,
                                       0,
                                       bufWchar,
                                       -1,
                                       bufMchar,
                                       bufMcharCount,
                                       NULL,
                                       NULL);
      if (mcharCount > 0)
      {
        break;
      }

      if (GetLastError() != ERROR_INSUFFICIENT_BUFFER)
      {
        return -1;
      }

      free(bufMchar);
      if ((bufMchar = malloc(bufMcharCount * 2)) == NULL)
      {
        return -1;
      }
      bufMcharCount *= 2;
    }
  }

  if (mcharCount > 1)
  {
    DWORD numberOfCharsWritten, consoleMode;

    if (GetConsoleMode(GetStdHandle(STD_OUTPUT_HANDLE), &consoleMode))
    {
      fflush(stdout);
      if (!WriteConsoleA(GetStdHandle(STD_OUTPUT_HANDLE),
                         bufMchar,
                         mcharCount - 1,
                         &numberOfCharsWritten,
                         NULL))
      {
        return -1;
      }
    }
    else
    {
      if (fputs(bufMchar, stdout) == EOF)
      {
        return -1;
      }
    }
  }

  return r;
}

Following tests this function:

_wprintf(L"\xA0\xA1\xA2\xA3\xA4\xA5\xA6\xA7"
         L"\xA8\xA9\xAA\xAB\xAC\xAD\xAE\xAF"
         L"\xB0\xB1\xB2\xB3\xB4\xB5\xB6\xB7"
         L"\xB8\xB9\xBA\xBB\xBC\xBD\xBE\xBF"
         L"\n"
         L"\xC0\xC1\xC2\xC3\xC4\xC5\xC6\xC7"
         L"\xC8\xC9\xCA\xCB\xCC\xCD\xCE\xCF"
         L"\xD0\xD1\xD2\xD3\xD4\xD5\xD6\xD7"
         L"\xD8\xD9\xDA\xDB\xDC\xDD\xDE\xDF"
         L"\n"
         L"\xE0\xE1\xE2\xE3\xE4\xE5\xE6\xE7"
         L"\xE8\xE9\xEA\xEB\xEC\xED\xEE\xEF"
         L"\xF0\xF1\xF2\xF3\xF4\xF5\xF6\xF7"
         L"\xF8\xF9\xFA\xFB\xFC\xFD\xFE\xFF"
         L"\n");

_wprintf(L"\x391\x392\x393\x394\x395\x396\x397"
         L"\x398\x399\x39A\x39B\x39C\x39D\x39E\x39F"
         L"\x3A0\x3A1\x3A2\x3A3\x3A4\x3A5\x3A6\x3A7"
         L"\x3A8\x3A9\x3AA\x3AB\x3AC\x3AD\x3AE\x3AF\x3B0"
         L"\n"
         L"\x3B1\x3B2\x3B3\x3B4\x3B5\x3B6\x3B7"
         L"\x3B8\x3B9\x3BA\x3BB\x3BC\x3BD\x3BE\x3BF"
         L"\x3C0\x3C1\x3C2\x3C3\x3C4\x3C5\x3C6\x3C7"
         L"\x3C8\x3C9\x3CA\x3CB\x3CC\x3CD\x3CE"
         L"\n");

_wprintf(L"\x410\x411\x412\x413\x414\x415\x401\x416\x417"
         L"\x418\x419\x41A\x41B\x41C\x41D\x41E\x41F"
         L"\x420\x421\x422\x423\x424\x425\x426\x427"
         L"\x428\x429\x42A\x42B\x42C\x42D\x42E\x42F"
         L"\n"
         L"\x430\x431\x432\x433\x434\x435\x451\x436\x437"
         L"\x438\x439\x43A\x43B\x43C\x43D\x43E\x43F"
         L"\x440\x441\x442\x443\x444\x445\x446\x447"
         L"\x448\x449\x44A\x44B\x44C\x44D\x44E\x44F"
         L"\n");

And should result in the following text in the console:

 ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿
ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß
àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡ΢ΣΤΥΦΧΨΩΪΫάέήίΰ
αβγδεζηθικλμνξοπρςστυφχψωϊϋόύώ
АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ
абвгдеёжзийклмнопрстуфхцчшщъыьэюя

I do not know the encoding in which your IDE stores non-ASCII characters in .c/.cpp files and I do not know what your compiler does when encounters non-ASCII characters. This part you should figure out yourself.

As long as you supply to _wprintf() properly encoded UTF-16 text or call WriteConsoleA() with properly encoded UTF-8 text, things should work.

P.S. Some gory details about console fonts can be found here.

Dorinda answered 29/9, 2012 at 13:8 Comment(6)
The problem with this solution is the "manually set your console font to [...] whichever True-Type Unicode font you can choose". Explaining that to an end-user makes it a difficult solution to deploy.Losel
@Losel And I thought it was a Windows problem.Dorinda
200 lines to print 'é' ? I guess it's possible to do simpler.Chyack
@King'sjester Feel free to share with us your findings.Dorinda
To cite somebody on cplusplus.com : "Too many things need to go right to do it properly, and not all of them can be controlled from your program. It's so unreliable that it's practically never worth the effort.Trust me, I speak from experience. Just forget it."Chyack
@King'sjester OTOH, if we aren't targeting the moving target, if we're dealing with perfect Windows (which both XP and 7 are since they aren't developed anymore), stable hacks can be employed. I don't know if Windows 10 still has this problem and needs to achieve perfection first. :)Dorinda
D
4

Windows console is generally considered badly broken regarding to character encodings. You can read about this problem here, for example.

The problem is that Windows generally uses the ANSI codepage (which is assuming you are in Western Europe or America Windows-1252), but the console uses the OEM codepage (CP850 under the same assumption).

You have several options:

  • Convert the text to CP850 before writing it (see CharToOem()). The drawback is that if the user redirects the output to a file (> file.txt) and opens the file with e.g. Notepad, he will see it wrong.
  • Change the codepage of the console: You need to select a TTF console font (Lucida Console, for example) and use the command chcp 1252.
  • Use UNICODE text and wprintf(): You need the TTF console font anyway.
Dawes answered 29/9, 2012 at 11:38 Comment(0)
L
0

The Windows-1252 (also known as "ANSI") character set used by Windows console mode is not the same as that used by GUI applications. Hence the IDE representation differs from the runtime representation.

A quick-and-dirty solution for your example is:

printf("my name is Se\xe9n\n");

Most solutions to this problem are flawed one way or another and the simplest solution for Windows applications that need extensive multi-language localisation is to write them as GUI apps using Unicode.

Losel answered 29/9, 2012 at 13:28 Comment(1)
Thanks to all who gave me answers --- Rodrego, Alexey and Clifford. I really appreciate the advice. It will take me a little while to try out the suggestions, and get head around the topic. Then I will return to this Question in the forum to tell waht I used -- or ask more help if necessary. SeánAblaut

© 2022 - 2024 — McMap. All rights reserved.