UTF-8 text in MFC application that uses Multibyte character set

I am working on an application which receives text encoded in UTF-8 and needs to display it on some MFC control. The application is built using the multibyte character set (MBCS), and let's assume this cannot change.

I was hoping that if I convert the text I receive from UTF-8 to wide char string, I would be able to display it correctly using the SetWindowTextW method. To try this, I used a toy app which reads the input from a file and sets the texts of my controls.

std::wstring utf8_decode(const std::string &str)
{
    if (str.empty()) return std::wstring();
    int size_needed = MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), NULL, 0);
    std::wstring wstrTo(size_needed, 0);
    MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), &wstrTo[0], size_needed);
    return wstrTo;
}

BOOL CAboutDlg::OnInitDialog()
{
    std::vector<std::string> texts;
    texts.resize(6);
    std::fstream f("D:\\code\\sample-utf8.txt", std::ios::in);
    for (size_t i=0;i<6;++i)
        std::getline(f, texts[i]);

    ::SetWindowTextW(GetDlgItem(IDC_BUTTON1)->m_hWnd, utf8_decode(texts[0]).c_str());
    ::SetWindowTextW(GetDlgItem(IDC_BUTTON2)->m_hWnd, utf8_decode(texts[1]).c_str());
    ::SetWindowTextW(GetDlgItem(IDC_BUTTON3)->m_hWnd, utf8_decode(texts[2]).c_str());
    ::SetWindowTextW(GetDlgItem(IDC_BUTTON4)->m_hWnd, utf8_decode(texts[3]).c_str());
    ::SetWindowTextW(GetDlgItem(IDC_BUTTON5)->m_hWnd, utf8_decode(texts[4]).c_str());
    ::SetWindowTextW(GetDlgItem(IDC_BUTTON6)->m_hWnd, utf8_decode(texts[5]).c_str());
    return TRUE;
}

Having built the toy app with MBCS, I am not getting the result I want.

As soon as I build the app using Unicode, it all works fine.

Does this mean there is no hope of using Unicode text for individual controls when I build with MBCS? If it is possible, can you give me any pointers? Thank you.

Incidental answered 1/2, 2019 at 17:57 Comment(5)
Since the MFC windows are MBCS, use SetWindowTextA() instead, and convert the UTF-8 data to UTF-16 as before, but then convert the UTF-16 to MBCS using WideCharToMultiByte(CP_ACP) before passing it to the windows. But you will likely end up with the same result, since converting Unicode to MBCS is lossy. This is why you shouldn't be using MBCS anymore. - Stiffnecked
You may need to set a font on the window that can handle Unicode characters. If you just have one dialog, you could recreate the child windows in code: delete the old MBCS windows and use Unicode windows for those child windows. - Abukir
Your code should work fine. Maybe your text file is not UTF-8. Use Notepad to check the encoding. Or it's a font issue, as noted by @JosephWillcoxson. Set the font to Segoe UI for Windows Vista and higher. - Hydrogenize
He said it works if he makes the application Unicode. That would tell me that the text file is probably OK. If the font in the controls can't handle Unicode, then he could get the problem he is seeing. - Abukir
Thanks for the tips guys. @JosephWillcoxson I will try your suggestion to create them programmatically, it seems like the best bet. - Incidental
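The comments above raise the possibility that the input file is not really UTF-8. One way to rule that out without relying on an editor is to check the bytes directly. The following is only a sketch in portable C++: the helper names strip_utf8_bom and is_valid_utf8 are hypothetical (they are not part of the original post or of any Windows API), and it assumes each line has been read as raw bytes into a std::string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: removes a leading UTF-8 byte order mark (EF BB BF), if any.
// Left in place, the BOM would decode to U+FEFF at the start of the first line.
std::string strip_utf8_bom(const std::string& s)
{
    if (s.size() >= 3 &&
        static_cast<unsigned char>(s[0]) == 0xEF &&
        static_cast<unsigned char>(s[1]) == 0xBB &&
        static_cast<unsigned char>(s[2]) == 0xBF)
        return s.substr(3);
    return s;
}

// Hypothetical helper: returns true if s is well-formed UTF-8.
// Rejects overlong forms, surrogates (U+D800..U+DFFF) and values above U+10FFFF.
bool is_valid_utf8(const std::string& s)
{
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
    static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};

    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;
        if (b < 0x80)                { len = 1; cp = b; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
        else return false;              // stray continuation or invalid lead byte
        if (i + len > n) return false;  // sequence cut off at end of input
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp[len]) return false;              // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate code point
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += len;
    }
    return true;
}
```

A line saved in an ANSI code page typically fails this check as soon as it contains a non-ASCII character, because a lone byte such as 0xE9 is not a complete UTF-8 sequence.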

An MBCS app creates MBCS windows, which, generally speaking, are only going to be able to display text from a single code page, even if you use the wide string interface.

For an MBCS app, SetWindowTextW essentially converts the wide string to MBCS using the default code page of the user's current locale, and then passes the result to the -A version of the function.

As you've seen with your "Unicode" experiment, you're doing the right thing in general, but you're limited by the fact that the app is MBCS.
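To make the loss concrete, here is a simplified, portable model of narrowing UTF-16 text to a single-byte code page. It is not the Windows implementation: the function name narrow_to_latin1_model is hypothetical, and ISO 8859-1 is used only as a stand-in for the ANSI code page because its bytes map one-to-one onto U+0000..U+00FF. The real code page is chosen by the system and has its own mapping table.

```cpp
#include <cassert>
#include <string>

// Simplified model (assumption: ISO 8859-1 stands in for the ANSI code page).
// Every UTF-16 code unit that has no single-byte mapping becomes '?'.
// If `lossy` is non-null, it reports whether any replacement happened.
std::string narrow_to_latin1_model(const std::u16string& wide, bool* lossy = nullptr)
{
    std::string out;
    out.reserve(wide.size());
    bool lost = false;
    for (char16_t unit : wide) {
        if (unit <= 0x00FF) {
            // Representable: the byte value equals the code point.
            out.push_back(static_cast<char>(static_cast<unsigned char>(unit)));
        } else {
            // Not representable in this code page: substitute a default character.
            out.push_back('?');
            lost = true;
        }
    }
    assert(out.size() == wide.size());  // one output byte per UTF-16 code unit
    if (lossy) *lossy = lost;
    return out;
}
```

In this model a character outside the Basic Multilingual Plane, which occupies two UTF-16 code units, comes out as two question marks. That would be consistent with the ?? reported for 😀 in the comments on this answer, although the exact substitution Windows performs depends on the active code page.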

Richardricharda answered 1/2, 2019 at 21:43 Comment(2)
I tried this with VS 2008, I set MBCS, and enabled Visual Style, and ::SetWindowTextW shows Unicode text properly. If I disable Visual Style (as the asker has done) then I get ?? in place of 😀. I never did figure out MBCS, I guess I never will. - Hydrogenize
I see, things make a lot more sense now, thanks for the explanation! - Incidental

I got this working, thanks to the suggestions of Adrian McCarthy and Joseph Willcoxson. I create the controls using CreateWindowExW and then set the text using SetWindowTextW. Below is sample code in case it's of any help:

std::wstring utf8_decode(const std::string &str)
{
    if (str.empty()) return std::wstring();
    int size_needed = MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), NULL, 0);
    std::wstring wstrTo(size_needed, 0);
    MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), &wstrTo[0], size_needed);
    return wstrTo;
}

HWND CreateButtonW(int x, int y, int width, int height, HWND parent)
{
    HWND hwndButton = ::CreateWindowExW(
        WS_EX_CLIENTEDGE,
        L"BUTTON",  // Predefined class; Unicode assumed 
        L"", WS_TABSTOP | WS_VISIBLE | WS_CHILD | BS_DEFPUSHBUTTON,
        x, y, width, height, parent,
        NULL, // No menu.
        (HINSTANCE)GetWindowLongPtr(parent, GWLP_HINSTANCE), // pointer-sized, so also valid in 64-bit builds
        NULL);      // Pointer not needed.
    return hwndButton;
}

BOOL CAboutDlg::OnInitDialog()
{
    std::vector<std::string> texts;
    texts.resize(6);
    std::fstream f("D:\\code\\sample-utf8.txt", std::ios::in);
    for (size_t i=0;i<6;++i)
        std::getline(f, texts[i]);


    ::SetWindowTextW(GetDlgItem(IDC_BUTTON1)->m_hWnd, utf8_decode(texts[0]).c_str());
    ::SetWindowTextW(GetDlgItem(IDC_BUTTON2)->m_hWnd, utf8_decode(texts[1]).c_str());
    ::SetWindowTextW(GetDlgItem(IDC_BUTTON3)->m_hWnd, utf8_decode(texts[2]).c_str());
    ::SetWindowTextW(GetDlgItem(IDC_BUTTON4)->m_hWnd, utf8_decode(texts[3]).c_str());
    ::SetWindowTextW(GetDlgItem(IDC_BUTTON5)->m_hWnd, utf8_decode(texts[4]).c_str());
    ::SetWindowTextW(GetDlgItem(IDC_BUTTON6)->m_hWnd, utf8_decode(texts[5]).c_str());

    auto width  = [](RECT& r) { return r.right - r.left; };
    auto height = [](RECT& r) { return r.bottom - r.top; };

    RECT r;
    GetDlgItem(IDC_BUTTON1)->GetWindowRect(&r); ScreenToClient(&r);
    HWND hBtnWnd = CreateButtonW(r.right+20, r.top, width(r), height(r), m_hWnd);
    ::SetWindowTextW(hBtnWnd, utf8_decode(texts[0]).c_str());

    GetDlgItem(IDC_BUTTON2)->GetWindowRect(&r); ScreenToClient(&r);
    hBtnWnd = CreateButtonW(r.right+20, r.top, width(r), height(r), m_hWnd);
    ::SetWindowTextW(hBtnWnd, utf8_decode(texts[1]).c_str());

    GetDlgItem(IDC_BUTTON3)->GetWindowRect(&r); ScreenToClient(&r);
    hBtnWnd = CreateButtonW(r.right+20, r.top, width(r), height(r), m_hWnd);
    ::SetWindowTextW(hBtnWnd, utf8_decode(texts[2]).c_str());

    GetDlgItem(IDC_BUTTON4)->GetWindowRect(&r); ScreenToClient(&r);
    hBtnWnd = CreateButtonW(r.right+20, r.top, width(r), height(r), m_hWnd);
    ::SetWindowTextW(hBtnWnd, utf8_decode(texts[3]).c_str());

    GetDlgItem(IDC_BUTTON5)->GetWindowRect(&r); ScreenToClient(&r);
    hBtnWnd = CreateButtonW(r.right+20, r.top, width(r), height(r), m_hWnd);
    ::SetWindowTextW(hBtnWnd, utf8_decode(texts[4]).c_str());

    GetDlgItem(IDC_BUTTON6)->GetWindowRect(&r); ScreenToClient(&r);
    hBtnWnd = CreateButtonW(r.right+20, r.top, width(r), height(r), m_hWnd);
    ::SetWindowTextW(hBtnWnd, utf8_decode(texts[5]).c_str());

    return TRUE;
}

The result: the buttons created by default are on the left, and the ones created using CreateWindowExW are on the right.
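The placement arithmetic in the sample (same size as the original button, shifted 20 pixels to the right) can be pulled out into a small helper that is easy to check on its own. This is just a sketch: Rect is a hypothetical stand-in for the Win32 RECT so that it compiles without <windows.h>, and Placement and place_right_of are names made up for the illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for the Win32 RECT (same four fields).
struct Rect { long left, top, right, bottom; };

// Position and size of a new control, in the order CreateWindowExW takes them.
struct Placement { long x, y, width, height; };

// Places a control of the same size to the right of `r`, separated by `gap`.
// Assumes `r` is normalized and already in the parent's client coordinates.
Placement place_right_of(const Rect& r, long gap)
{
    assert(r.right >= r.left && r.bottom >= r.top);  // normalized rectangle
    Placement p;
    p.x      = r.right + gap;       // start just past the existing control
    p.y      = r.top;               // keep the same vertical position
    p.width  = r.right - r.left;    // horizontal extent
    p.height = r.bottom - r.top;    // vertical extent: bottom minus top
    return p;
}
```

With a helper like this, the six near-identical blocks in OnInitDialog could become a loop over the control IDs, with the four fields passed straight to the button-creation function.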

Incidental answered 4/2, 2019 at 18:15 Comment(0)
