cout << setw doesn't align correctly with åäö

The following code reproduces my problem:

#include <iostream>
#include <iomanip>
#include <string>

void p(std::string s, int w)
{
   std::cout << std::left << std::setw(w) << s;
}

int main(int argc, char const *argv[])
{
   p("COL_A", 7);
   p("COL_B", 7);
   p("COL_C", 5);
   std::cout << std::endl;
   p("ABC", 7);
   p("ÅÄÖ", 7);
   p("ABC", 5);
   std::cout << std::endl;
   return 0;
}

This produces the following output:

COL_A  COL_B  COL_C
ABC    ÅÄÖ ABC

If I change "ÅÄÖ" in the code to, e.g., "ABC", it works:

COL_A  COL_B  COL_C
ABC    ABC    ABC  

Why is this happening?

Pemmican asked 21/3, 2015 at 22:47
Try printing the length of your strings. – Calculous

Along with imbuing std::wcout with the proper locale, you probably have to switch to wide strings as well. For example:

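#include <clocale>
#include <iomanip>
#include <iostream>
#include <locale>
#include <string>

void p(std::wstring s, int w)
{
   std::wcout << std::left << std::setw(w) << s;
}

int main(int argc, char const *argv[])
{
   // The en_US.UTF-8 locale must be installed: setlocale returns a null
   // pointer and std::locale's constructor throws std::runtime_error otherwise.
   std::setlocale(LC_ALL, "en_US.utf8");
   std::locale loc("en_US.UTF-8");
   std::wcout.imbue(loc);

   p(L"COL_A", 7);
   p(L"COL_B", 7);
   p(L"COL_C", 5);
   std::wcout << std::endl;
   p(L"ABC", 7);
   p(L"ÅÄÖ", 7);
   p(L"ABC", 5);
   std::wcout << std::endl;
   return 0;
}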

Knowles answered 21/3, 2015 at 22:58

It happens because those characters (Å, Ä, Ö) are non-ASCII Unicode characters that are most likely encoded in UTF-8 in your source file. In UTF-8 each of them takes two bytes (in general, a character takes one to four bytes). std::setw, on the other hand, knows nothing about UTF-8: it counts, and therefore pads, bytes.

Personnel answered 21/3, 2015 at 22:55

C++20 std::format handles this correctly:

#include <format>
#include <iostream>

int main() {
  std::cout << std::format("{:7}{:7}{:5}\n", "COL_A", "COL_B", "COL_C");
  std::cout << std::format("{:7}{:7}{:5}\n", "ABC", "ÅÄÖ", "ABC");
}

Output:

COL_A  COL_B  COL_C
ABC    ÅÄÖ    ABC  

In the meantime, you can use the {fmt} library, which std::format is based on. {fmt} also provides a print function that makes this even easier and more efficient:

#include <fmt/core.h>

int main() {
  fmt::print("{:7}{:7}{:5}\n", "COL_A", "COL_B", "COL_C");
  fmt::print("{:7}{:7}{:5}\n", "ABC", "ÅÄÖ", "ABC");
}

Disclaimer: I'm the author of {fmt} and C++20 std::format.

Wira answered 3/3, 2021 at 4:26

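The problem is that your source file is almost certainly saved as UTF-8, which takes 2 bytes for each of the letters Å, Ä and Ö, while std::setw counts bytes rather than characters.

Hence cout thinks it outputs 3x2=6 chars and adds only one space to reach the expected width of 7. Imbuing a UTF-8 locale with imbue() is not enough on its own: for a narrow std::string the padding is still computed from the byte count, so the locale change has to be combined with wide strings and std::wcout, as in the wcout answer above.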

Ninny answered 21/3, 2015 at 22:55

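This works for both accented Latin letters and CJK characters. It uses the POSIX function wcswidth() to get each string's display width in terminal columns and widens the setw field by the difference between that width and the string's length: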

#include <clocale>
#include <iomanip>
#include <iostream>
#include <locale>
#include <string>
#include <wchar.h>  // wcswidth() is POSIX (XSI), not standard C++

typedef decltype(std::setw(0)) setw_type;

setw_type
setww(int w, std::wstring s)
{
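  // wcswidth() returns the display width in columns (2 per CJK character);
  // compute in int so the subtraction cannot wrap around as an unsigned size_t
  int delta = wcswidth(s.c_str(), s.length()) - static_cast<int>(s.length());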
  return std::setw(w - delta);
}

void
print_wstring(std::wstring s, int w)
{
  std::wcout << setww(w, s) << s;
}

int
main(int argc, char * argv[])
{
  auto locale_string = "zh_CN.utf8";
  setlocale(LC_ALL, locale_string);
  std::locale loc(locale_string);
  std::wcout.imbue(loc);
  print_wstring(L"|一二三四", 9);
  print_wstring(L"|一二三四", 9);
  std::wcout << std::endl;
  print_wstring(L"公道", 9);
  print_wstring(L"自在人心", 9);
  std::wcout << std::endl;
}

Result:

g++ test01.cpp -o test01.exe  && ./test01.exe
|一二三四|一二三四
     公道 自在人心

Diagonal answered 15/11, 2018 at 17:13

© 2022 - 2024 — McMap. All rights reserved.