_bstr_t to UTF-8 possible?

3

9

I have a _bstr_t string which contains Japanese text. I want to convert this string to a UTF-8 string which is defined as a char *.

Can I convert the _bstr_t string to char * (UTF-8) string without losing the Japanese characters?

Amenable asked 9/3, 2009 at 12:40
15

Use WideCharToMultiByte() – pass CP_UTF8 as the first parameter.

Beware that a BSTR can be a null pointer, which corresponds to an empty string – treat this as a special case.
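For reference, the bytes that a CP_UTF8 conversion should produce can be checked against a small portable encoder. The sketch below is not the Windows API: Utf16ToUtf8 is a hypothetical helper name, it takes char16_t (the same 16-bit code units a BSTR holds on Windows), and it treats a null pointer as an empty string, as suggested above. Replacing unpaired surrogates with U+FFFD is a choice made for this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not a Windows API): encodes NUL-terminated UTF-16 as UTF-8.
// A null pointer is treated as an empty string, like a null BSTR.
std::string Utf16ToUtf8(const char16_t* text) {
    std::string out;
    if (text == nullptr) {
        return out;
    }
    for (std::size_t i = 0; text[i] != 0; ++i) {
        std::uint32_t cp = text[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate if present.
            std::uint32_t lo = text[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;  // unpaired high surrogate (sketch assumption)
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;      // unpaired low surrogate (sketch assumption)
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For the text 日本語 this yields the nine bytes E6 97 A5 E6 9C AC E8 AA 9E, so no Japanese characters are lost as long as the receiver decodes the data as UTF-8.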

Haldi answered 9/3, 2009 at 12:43
1

Here is some code that should do the conversion.

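#include <windows.h>
#include <cstdio>

// Prints a wide (UTF-16) string to stdout as UTF-8; prints nothing for a null pointer.
void PrintUtf8(const wchar_t* value) {
    if (value == nullptr) {
        return;
    }
    // First call: query the required size in bytes, including the terminating NUL.
    int n = WideCharToMultiByte(CP_UTF8, 0, value, -1, nullptr, 0, nullptr, nullptr);
    if (n <= 0) {
        return;
    }
    char* buffer = new char[n];
    // Second call: convert into the buffer.
    WideCharToMultiByte(CP_UTF8, 0, value, -1, buffer, n, nullptr, nullptr);
    printf("%s", buffer);
    delete[] buffer;  // memory from new[] must be released with delete[]
}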
Hunk answered 10/1, 2017 at 19:32
-1

Very handy MSDN reference for this sort of thing: http://msdn.microsoft.com/en-us/library/ms235631(VS.80).aspx

I think you need to go to wchar_t* since char* will lose the Unicode stuff, although I'm not sure.

// convert_from_bstr_t.cpp
// compile with: /clr /link comsuppw.lib

#include <iostream>
#include <stdlib.h>
#include <string>

#include "atlbase.h"
#include "atlstr.h"
#include "comutil.h"

using namespace std;
using namespace System;

int main()
{
    _bstr_t orig("Hello, World!");
    wcout << orig << " (_bstr_t)" << endl;

    // Convert to a char*
    const size_t newsize = 100;
    char nstring[newsize];
    strcpy_s(nstring, (char *)orig);
    strcat_s(nstring, " (char *)");
    cout << nstring << endl;

    // Convert to a wchar_t*
    wchar_t wcstring[newsize];
    wcscpy_s(wcstring, (wchar_t *)orig);
    wcscat_s(wcstring, L" (wchar_t *)");
    wcout << wcstring << endl;

    // Convert to a CComBSTR
    CComBSTR ccombstr((char *)orig);
    if (ccombstr.Append(L" (CComBSTR)") == S_OK)
    {
        CW2A printstr(ccombstr);
        cout << printstr << endl;
    }

    // Convert to a CString
    CString cstring((char *)orig);
    cstring += " (CString)";
    cout << cstring << endl;

    // Convert to a basic_string
    string basicstring((char *)orig);
    basicstring += " (basic_string)";
    cout << basicstring << endl;

    // Convert to a System::String
    String ^systemstring = gcnew String((char *)orig);
    systemstring += " (System::String)";
    Console::WriteLine("{0}", systemstring);
    delete systemstring;
}
Kiger answered 9/3, 2009 at 12:44 Comment(4)
Thanks for your reply, Nick. The problem is that I want to send this _bstr_t content over a Windows socket, which only accepts char* buffers (see the WSABUF structure in ws2def.h), so a wchar_t* won't do. Is there a wide-char version of the _WSABUF structure? – Amenable
Windows Sockets don't care what data you send. In this case you can just reinterpret_cast to char* and be fine. – Haldi
Just be careful with the number of bytes (it's the number of Unicode characters times sizeof(WCHAR)) and with null BSTRs. – Haldi
Although Windows Sockets don't care what data is sent, if the destination needs to understand the data and uses a different byte ordering, it is better to use UTF-8, especially in a mixed environment where systems with both byte orderings are used. – Yahairayahata
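To illustrate the byte-count and byte-ordering points from the comments, here is a portable sketch (Utf16ToBytes is a hypothetical helper, not a Windows API) that writes UTF-16 code units out as raw bytes in either order. The payload size is the number of code units times two, and the same text produces different byte sequences depending on the order chosen, which is the ambiguity UTF-8 avoids.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: serializes UTF-16 code units to raw bytes
// in big-endian or little-endian order (two bytes per code unit).
std::string Utf16ToBytes(const std::u16string& text, bool bigEndian) {
    std::string bytes;
    bytes.reserve(text.size() * 2);
    for (char16_t unit : text) {
        char hi = static_cast<char>((unit >> 8) & 0xFF);
        char lo = static_cast<char>(unit & 0xFF);
        if (bigEndian) {
            bytes += hi;
            bytes += lo;
        } else {
            bytes += lo;
            bytes += hi;
        }
    }
    return bytes;
}
```

For the single character 日 (U+65E5), the little-endian form is the bytes E5 65 and the big-endian form is 65 E5, so a receiver has to know which order the sender used. A UTF-8 payload is a plain byte sequence with no such dependency.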
