std::string to LPCWSTR in C++

I'm trying to convert from std::string to LPCWSTR (my project uses a multi-byte character set).

1) For example, a direct cast:

LPCWSTR ToLPCWSTR(string text)
{
    // The cast only reinterprets the char bytes as wchar_t data; no real conversion happens
    LPCWSTR sw = (LPCWSTR)text.c_str();
    return sw;
}

2) This returns Chinese characters:

LPCWSTR ToLPCWSTR(string text)
{
    std::wstring stemp = std::wstring(text.begin(), text.end());
    // stemp is a local, so the returned pointer dangles once the function exits
    LPCWSTR sw = (LPCWSTR)stemp.c_str();
    return sw;
}

However, they both always show squares.

EDIT: My code after applying the fix suggested by Barmak Shemirani:

std::wstring get_utf16(const std::string &str, int codepage)
{
    if (str.empty()) return std::wstring();
    int sz = MultiByteToWideChar(codepage, 0, &str[0], (int)str.size(), 0, 0);
    std::wstring res(sz, 0);
    MultiByteToWideChar(codepage, 0, &str[0], (int)str.size(), &res[0], sz);
    return res;
}

string HttpsWebRequest(string domain, string url)
{
    // Bug: each temporary wstring returned by get_utf16 is destroyed at the
    // end of its statement, leaving sdomain and surl as dangling pointers
    LPCWSTR sdomain = get_utf16(domain, CP_UTF8).c_str();
    LPCWSTR surl = get_utf16(url, CP_UTF8).c_str();
    //(Some stuff...)
}

Result: https://i.gyazo.com/ea4cd50765bfcbe12c763ea299e7b508.png

EDIT: Using different code that converts from UTF-8 to UTF-16, I still get the same result.

std::wstring utf8_to_utf16(const std::string& utf8)
{
    std::vector<unsigned long> unicode;
    size_t i = 0;
    while (i < utf8.size())
    {
        unsigned long uni;
        size_t todo;
        // The lead byte determines how many continuation bytes follow
        unsigned char ch = utf8[i++];
        if (ch <= 0x7F)
        {
            uni = ch;
            todo = 0;
        }
        else if (ch <= 0xBF)
        {
            throw std::logic_error("not a UTF-8 string");
        }
        else if (ch <= 0xDF)
        {
            uni = ch & 0x1F;
            todo = 1;
        }
        else if (ch <= 0xEF)
        {
            uni = ch & 0x0F;
            todo = 2;
        }
        else if (ch <= 0xF7)
        {
            uni = ch & 0x07;
            todo = 3;
        }
        else
        {
            throw std::logic_error("not a UTF-8 string");
        }
        for (size_t j = 0; j < todo; ++j)
        {
            if (i == utf8.size())
                throw std::logic_error("not a UTF-8 string");
            unsigned char ch = utf8[i++];
            if (ch < 0x80 || ch > 0xBF)
                throw std::logic_error("not a UTF-8 string");
            uni <<= 6;
            uni += ch & 0x3F;
        }
        // Reject UTF-16 surrogate values and code points beyond U+10FFFF
        if (uni >= 0xD800 && uni <= 0xDFFF)
            throw std::logic_error("not a UTF-8 string");
        if (uni > 0x10FFFF)
            throw std::logic_error("not a UTF-8 string");
        unicode.push_back(uni);
    }
    std::wstring utf16;
    for (size_t i = 0; i < unicode.size(); ++i)
    {
        unsigned long uni = unicode[i];
        if (uni <= 0xFFFF)
        {
            // Basic Multilingual Plane: a single UTF-16 code unit
            utf16 += (wchar_t)uni;
        }
        else
        {
            // Supplementary plane: encode as a UTF-16 surrogate pair
            uni -= 0x10000;
            utf16 += (wchar_t)((uni >> 10) + 0xD800);
            utf16 += (wchar_t)((uni & 0x3FF) + 0xDC00);
        }
    }
    return utf16;
}
Hairsplitter asked 1/7, 2016 at 23:26 Comment(0)

If the std::string source is English or some other Latin-based text, then the conversion to std::wstring can be done with a simple copy (as shown in Miles Budnek's answer). But in general you have to use MultiByteToWideChar:

std::wstring get_utf16(const std::string &str, int codepage)
{
    if (str.empty()) return std::wstring();
    // First call (null output buffer) returns the required length in wchar_ts
    int sz = MultiByteToWideChar(codepage, 0, &str[0], (int)str.size(), 0, 0);
    std::wstring res(sz, 0);
    // Second call performs the actual conversion into the wstring's buffer
    MultiByteToWideChar(codepage, 0, &str[0], (int)str.size(), &res[0], sz);
    return res;
}

You have to know the codepage that was used to create the source string. You can use GetACP() to find the ANSI codepage of the user's machine. If the source string is UTF-8, pass CP_UTF8 as the codepage.
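
For example, a minimal usage sketch (get_utf16 is the function above; the UTF-8 literal is taken from the asker's comment below, and SetConsoleTitleW merely stands in for any wide-char API taking LPCWSTR). The key detail is storing the result in a named std::wstring so the buffer outlives the pointer:

    #include <windows.h>
    #include <string>

    void demo()
    {
        std::string url = u8"/post/show/933477/";     // UTF-8 source
        std::wstring wurl = get_utf16(url, CP_UTF8);  // named variable keeps the buffer alive
        SetConsoleTitleW(wurl.c_str());               // pointer stays valid for this whole call
    }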

Rexford answered 2/7, 2016 at 1:35 Comment(3)
It's UTF-8, but it still returns Chinese characters and squares. – Hairsplitter
I usually set the string like this: string url = u8"/post/show/933477/"; but it always fails. – Hairsplitter
Fixed. I just needed to call c_str() inside the request. – Hairsplitter
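
That final fix presumably looks like this (a sketch; SomeRequestW is a hypothetical wide-char request function standing in for the asker's actual call). A temporary lives until the end of the full expression, so passing c_str() directly inside the call is safe:

    // temporaries stay alive for the duration of this full expression
    SomeRequestW(get_utf16(domain, CP_UTF8).c_str(), get_utf16(url, CP_UTF8).c_str());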

You have two problems.

  1. LPCWSTR is a pointer to const wchar_t, and std::string::c_str() returns a const char*. Those two types are different, so casting from const char* to LPCWSTR won't work.
  2. The memory pointed to by the pointer returned by std::basic_string::c_str is owned by the string object and is freed when the string goes out of scope (see the sketch after this list).
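
For example, a minimal sketch of problem 2, using the asker's second ToLPCWSTR from the question:

    std::string s = "hello";
    LPCWSTR p = ToLPCWSTR(s);  // the wstring inside ToLPCWSTR is destroyed on return,
                               // so p is already dangling; reading it is undefined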

You will need to allocate memory and make a copy of the string.

The easiest way to allocate memory for a new wide string would be to just return a std::wstring. You can then pass the pointer returned by c_str() to whatever API function takes LPCWSTR:

std::wstring string_to_wstring(const std::string& text) {
    return std::wstring(text.begin(), text.end());
}
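
For instance, a minimal usage sketch (MessageBoxW is just one example of a W API taking LPCWSTR; string_to_wstring is the function above, which is only safe for plain ASCII input):

    #include <windows.h>
    #include <string>

    int main()
    {
        // Named variable keeps the wide buffer alive for the duration of the call
        std::wstring wtext = string_to_wstring("Hello");
        MessageBoxW(nullptr, wtext.c_str(), L"Example", MB_OK);
        return 0;
    }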
Camfort answered 1/7, 2016 at 23:42 Comment(2)
It's supposed to return a domain name with a directory, but again Chinese characters and squares. I used your code and did the c_str() conversion: i.gyazo.com/ea4cd50765bfcbe12c763ea299e7b508.png – Hairsplitter
@Miles Budnek: Please correct std::string& str ==> text so it matches the return line (text.begin()...). – Dangelo