How to convert ANSI bytes to a Unicode string?

I have a vector<BYTE> that represents characters in a string. I want to interpret those characters as ASCII characters and store them in a Unicode (UTF-16) string. The current code assumes that the characters in the vector<BYTE> are already Unicode rather than ASCII. This works fine for standard 7-bit ASCII, but fails for extended characters (byte values above 127), which need to be interpreted using the current code page retrieved via GetACP(). How would I go about creating a Unicode (UTF-16) string from these characters?

EDIT: I believe the solution should have something to do with the macros discussed here: http://msdn.microsoft.com/en-us/library/87zae4a3(v=vs.80).aspx I'm just not exactly sure how the actual implementation would go.

int ExtractByteArray(CATLString* pszResult, const CByteVector* pabData)
{
    // Place the data into the output string.
    pszResult->Empty();
    // Note: casting each byte straight to TCHAR only works for 7-bit
    // ASCII; byte values above 127 need a code-page-aware conversion.
    for(int iIndex = 0; iIndex < pabData->GetSize(); iIndex++)
        *pszResult += (TCHAR)pabData->GetAt(iIndex);

    return RC_SUCCESS;
}
Ululant answered 20/2, 2013 at 15:52 Comment(8)
If you are using MFC, can't you get CString to do it automatically? - Kimberli
"I have a vector<BYTE> that represents characters in a string." - why not std::string? - Bathesda
There is no such thing as "extended ASCII". There are quite a few different 8-bit single-byte encodings that are identical to ASCII for their first 128 code points, but they are not ASCII. - Chronopher
Here are the CString constructors: msdn.microsoft.com/en-US/library/cws1zdt8(v=vs.110).aspx Just use the one that receives a pointer to char and a length, and your job is done. - Kimberli
@Bathesda The data is returned from a device in this generic way. It is then interpreted based on a set of given criteria. The data could be an integer, a bitmap, a string, etc. - Ululant
@bgh10788: Then I don't see any sense in converting the binary data of a bitmap into any kind of string (UTF-16 or not). - Bathesda
@DavidHeffernan This doesn't seem to work. I get the dreaded box/question mark when the byte value is > 128. Plus, CByteVector is implemented in such a way that I cannot get a pointer to the first element, just the value. - Ululant
Well, I don't think that CString is broken. But I'm confused. In the question you said you had vector<BYTE>. I've no enthusiasm to help if we can't even work out what the question is. - Kimberli

You should use MultiByteToWideChar to convert that string to Unicode.
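A minimal sketch of that conversion, assuming the bytes are ANSI text in the system code page (CP_ACP, the page GetACP() identifies); the helper name AnsiBytesToUtf16 is illustrative, not from the question:

#include <windows.h>
#include <string>
#include <vector>

// Convert ANSI bytes (current system code page) to a UTF-16 string.
std::wstring AnsiBytesToUtf16(const std::vector<BYTE>& bytes)
{
    if (bytes.empty())
        return std::wstring();
    const char* src = reinterpret_cast<const char*>(bytes.data());
    const int srcLen = static_cast<int>(bytes.size());
    // First call computes the required length in wide characters.
    int wideLen = MultiByteToWideChar(CP_ACP, 0, src, srcLen, NULL, 0);
    std::wstring result(wideLen, L'\0');
    // Second call performs the conversion into the allocated buffer.
    MultiByteToWideChar(CP_ACP, 0, src, srcLen, &result[0], wideLen);
    return result;
}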

Deaton answered 20/2, 2013 at 15:56 Comment(4)
Well, you could do this, but it's easier to use the appropriate CString constructor and let it convert. - Kimberli
@DavidHeffernan Of course that is better, but it only works for MFC and ATL applications (I guess it solves the problem for bgh10788), whereas this function works for any program and framework on Windows. - Deaton
But the question tags MFC and names CATLString, so such a solution would seem appropriate. - Kimberli
ASCII characters are 1 byte and Unicode (UTF-16) characters are 2 bytes. Therefore MultiByteToWideChar is not the right function - ASCII is not multibyte to begin with! - Blamed

I have a vector<BYTE> that represents characters in a string. I want to interpret those characters as ASCII characters and store them in a Unicode (UTF-16) string

You should use std::vector<BYTE> only when you are working with binary data. When working with strings, use std::string instead. Note that such a std::string object may contain special characters encoded as sequences of one or more bytes (hence called multi-byte characters), but these are not ASCII characters.

Once you use std::string, you can use MultiByteToWideChar to write your own function that converts a std::string (containing multi-byte UTF-8 characters) into a std::wstring containing UTF-16 code points:

#include <windows.h>
#include <string>

// Convert a multi-byte (UTF-8) string to a wide (UTF-16) string.
std::wstring s2ws(const std::string& str)
{
    if (str.empty())
        return std::wstring();
    // First call computes the required length in wide characters.
    int size_needed = MultiByteToWideChar(CP_UTF8, 0, str.data(), (int)str.size(), NULL, 0);
    std::wstring wstrTo(size_needed, 0);
    // Second call performs the conversion into the allocated buffer.
    MultiByteToWideChar(CP_UTF8, 0, str.data(), (int)str.size(), &wstrTo[0], size_needed);
    return wstrTo;
}
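For example (a hypothetical call; the literal assumes the input really is UTF-8):

std::string utf8 = "caf\xC3\xA9";   // "café" encoded as UTF-8
std::wstring wide = s2ws(utf8);     // L"café" in UTF-16

For the ANSI bytes in the question, the same two-call pattern works with CP_ACP (the code page GetACP() reports) in place of CP_UTF8.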
Bathesda answered 20/2, 2013 at 16:16 Comment(2)
But I am working with binary data. The point of this function is to take this binary data and convert it to a CATLString. - Ululant
@bgh10788: Why would you convert binary data into a string? That makes no sense. If it's binary data, then treat it as binary data. If it's a string, then treat it as one. - Bathesda

Since you're using MFC, let CString do the job.
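A minimal sketch of what that might look like, assuming a Unicode (UTF-16) build and the CString constructor taking a pointer to char plus a length mentioned in the comments, which converts from the current ANSI code page; the helper name BytesToString is illustrative:

#include <atlstr.h>   // CString (usable from both ATL and MFC)
#include <vector>

CString BytesToString(const std::vector<BYTE>& bytes)
{
    if (bytes.empty())
        return CString();
    // This constructor takes narrow chars plus a length and, in a
    // Unicode build, converts them from the current ANSI code page.
    return CString(reinterpret_cast<const char*>(bytes.data()),
                   static_cast<int>(bytes.size()));
}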

Bronk answered 20/2, 2013 at 16:27 Comment(0)
