How to convert wstring into string?
R

21

277

How do I convert a std::wstring into a std::string?

I have the following example:

#include <string>
#include <iostream>

int main()
{
    std::wstring ws = L"Hello";
    std::string s( ws.begin(), ws.end() );

  //std::cout <<"std::string =     "<<s<<std::endl;
    std::wcout<<"std::wstring =    "<<ws<<std::endl;
    std::cout <<"std::string =     "<<s<<std::endl;
}

With the commented-out line enabled, the output is:

std::string =     Hello
std::wstring =    Hello
std::string =     Hello

but without it, the output is only:

std::wstring =    Hello

Is anything wrong in the example? Can I do the conversion like above?

EDIT

The new example (taking some of the answers into account) is:

#include <string>
#include <iostream>
#include <sstream>
#include <locale>

int main()
{
    setlocale(LC_CTYPE, "");

    const std::wstring ws = L"Hello";
    const std::string s( ws.begin(), ws.end() );

    std::cout<<"std::string =     "<<s<<std::endl;
    std::wcout<<"std::wstring =    "<<ws<<std::endl;

    std::stringstream ss;
    ss << ws.c_str();
    std::cout<<"std::stringstream =     "<<ss.str()<<std::endl;
}

The output is :

std::string =     Hello
std::wstring =    Hello
std::stringstream =     0x860283c

Therefore, std::stringstream cannot be used to convert a wstring into a string.

Redeemable answered 26/1, 2011 at 11:58 Comment(21)
I get two lines of output with the first cout line commented out. It seems that something is askew with your environment. What OS and compiler are you on?Guilt
How can you ask this question without specifying also the encodings?Crisscross
consider not using std::string at all. std::wstring has tons of advantages; is it really necessary to demote to std::string?Chromogen
@Marcelo fedora 9 (yeah, it is ancient, but I have no choice)Archivolt
@David I am a complete nowb regarding the locales. Mind adding an answer how it should be?Archivolt
@VJo It's a mess in C++ because there is no proper portable Unicode support. It's not even properly there in C++0x. But the main thing you need to get to grips with the encoding used by your strings. The wstring could be UTF-32 or UTF-16 maybe, or maybe UCS-2, I don't know. The string is most likely UTF-8 or one of the ISO 8 bit encodings. But only you can know the answers to these questions.Crisscross
@tenfour: Why use std::wstring at all? #1050447Botulinus
@Botulinus If you have data that is already encoded with UTF-16, whether or not UTF-16 is considered harmful is somewhat moot. And for what it's worth, I don't think any transformation form is harmful; what is harmful is people thinking they understand Unicode when in fact they don't.Crisscross
Does it have to be a cross-platform solution?Confident
@sad_man If you can make one that is better. If not, I would prefer a linux solution.Archivolt
Oops, I had one for Windows, not for Linux. Ok good luck.Confident
@dalle: what has wstring to do with UTF-16?Rahr
@Philipp: Absolutely nothing at all, although a lot of people incorrectly thinks that it has something to do with UTF-16. According to the C++ standard std::wstring cannot be UTF-16 encoded.Botulinus
@Botulinus c++ standard doesn't mention utf in any way (utf-8 or utf-16). Got a link where it says why utf-16 can't encoded with wstring?Archivolt
@VJo: C++ Standard 3.9.1 paragraph 5 states "Type wchar_t is a distinct type whose values can represent distinct codes for all members of the largest extended character set specified among the supported locales."Botulinus
@Botulinus does that mean wchar_t cannot be used with variable length encodings?Crisscross
@David Heffernan: That is my interpretation. Although there are C++ implementations which does so anyway.Botulinus
@Botulinus So if you want to write portable C++ and use UTF-16 what do you do. As far as I can see C++ is still in the dark ages with regards Unicode and not even C++0x will bring complete support for the standard Unicode locales. It's basically utterly rubbish! The committee, fine outstanding individuals that they are, really should get on top of this issue.Crisscross
of course std::(w)string can contain UTF8 or UTF16, but other parts of the c++ standard library cant handle variable length encodings, most noticeable several locale facets. And of course if you do string manipulations with UTF8/16 strings (e.g. substr, resize, ...) you will have to check manually if all codepoints are still intact before outputting them..Hifi
@Chromogen opposite. utf8everywhere.orgGuffaw
https://github.com/Shilyx/charconv I think this lib is enough for win32 platformParvenu
R
38

Here is a worked-out solution based on the other suggestions:

#include <string>
#include <iostream>
#include <clocale>
#include <locale>
#include <vector>

int main() {
  std::setlocale(LC_ALL, "");
  const std::wstring ws = L"ħëłlö";
  const std::locale locale("");
  typedef std::codecvt<wchar_t, char, std::mbstate_t> converter_type;
  const converter_type& converter = std::use_facet<converter_type>(locale);
  std::vector<char> to(ws.length() * converter.max_length());
  std::mbstate_t state = std::mbstate_t(); // value-initialize: out() needs a valid initial conversion state
  const wchar_t* from_next;
  char* to_next;
  const converter_type::result result = converter.out(state, ws.data(), ws.data() + ws.length(), from_next, &to[0], &to[0] + to.size(), to_next);
  if (result == converter_type::ok or result == converter_type::noconv) {
    const std::string s(&to[0], to_next);
    std::cout <<"std::string =     "<<s<<std::endl;
  }
}

This will usually work for Linux, but will create problems on Windows.
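For reuse, the same facet-based conversion can be wrapped in a function. This is only a sketch: the name narrow_with_locale and the choice to throw std::runtime_error on failure are my own assumptions, not part of the answer above. It value-initializes the std::mbstate_t, because the conversion state must begin in its initial state, and it takes the locale as a parameter instead of touching global state.

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: narrows `ws` using the codecvt facet of `loc`.
std::string narrow_with_locale(const std::wstring& ws, const std::locale& loc)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> converter_type;

    if (ws.empty()) {
        return std::string();
    }

    const converter_type& converter = std::use_facet<converter_type>(loc);

    // Worst case: every wide character expands to max_length() bytes.
    std::vector<char> to(ws.length() * converter.max_length());

    std::mbstate_t state = std::mbstate_t(); // initial conversion state
    const wchar_t* from_next = 0;
    char* to_next = 0;

    const converter_type::result result = converter.out(
        state, ws.data(), ws.data() + ws.length(), from_next,
        &to[0], &to[0] + to.size(), to_next);

    // Assumption: anything other than `ok` is reported as an error.
    if (result != converter_type::ok) {
        throw std::runtime_error("wide-to-narrow conversion failed");
    }
    return std::string(&to[0], to_next);
}
```

Calling narrow_with_locale(ws, std::locale("")) uses the user's environment locale; characters that the locale's encoding cannot represent make the function throw.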

Rahr answered 26/1, 2011 at 14:6 Comment(11)
@Phillip: which part of the code depend on the c-locale ? is the std::setlocale(LC_ALL, ""); really needed ?Hifi
@smerlin: I'm using vector now. (I was too lazy to look whether vector is guaranteed to be contiguous even in C++03, but it is.) setlocale is only needed if you are using wcout because that uses the stdio locale.Rahr
using std::wcout.imbue(locale) should do the job aswell, and it has the benefit that it does not change any global state.Hifi
The std::wstring_convert from C++11 wraps up a lot of this noise.Finnegan
@Philipp, what do you mean "will create problems on Windows"? What kind of problems?Varicose
The above code gives (as copied) gives me a *** glibc detected *** test: malloc(): smallbin double linked list corrupted: 0x000000000180ea30 *** on linux 64-bit (gcc 4.7.3). Anybody else experiencing this?Halation
The code above doesn't work on Linux (KUBUNTU) GCC 4.7Tenerife
@Halation maybe you can run it in valgrind and report a bug to the maintainers of the code part where the first violation happens ?Guffaw
I am getting this error on GCC 4.8: "Invalid arguments Candidates are: const #0 & use_facet(const std::locale &) ". Can I get some help please ?Kalikow
u got a typo "or" > if (result == converter_type::ok or result == converter_type::noconv)Annoyance
g++ 7.2.0 on msys2 (mingw64): This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information. terminate called after throwing an instance of 'std::runtime_error' what(): locale::facet::_S_create_c_locale name not validLoreeloreen
T
390

As Cubbi pointed out in one of the comments, std::wstring_convert (C++11) provides a neat simple solution (you need to #include <locale> and <codecvt>):

std::wstring string_to_convert;

//setup converter
using convert_type = std::codecvt_utf8<wchar_t>;
std::wstring_convert<convert_type, wchar_t> converter;

//use converter (.to_bytes: wstr->str, .from_bytes: str->wstr)
std::string converted_str = converter.to_bytes( string_to_convert );

I was using a combination of wcstombs and tedious allocation/deallocation of memory before I came across this.

http://en.cppreference.com/w/cpp/locale/wstring_convert
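One thing worth knowing: when a conversion fails and the converter was constructed without fallback strings, to_bytes and from_bytes throw std::range_error. A minimal sketch of wrappers that report failure through a return value instead; the names try_wide_to_utf8 and try_utf8_to_wide are hypothetical and only serve as illustration:

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns false instead of throwing when `in`
// cannot be encoded; `out` is left untouched on failure.
bool try_wide_to_utf8(const std::wstring& in, std::string& out)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> converter;
    try {
        out = converter.to_bytes(in);
        return true;
    } catch (const std::range_error&) {
        return false;
    }
}

// Hypothetical helper: returns false when `in` is not valid UTF-8.
bool try_utf8_to_wide(const std::string& in, std::wstring& out)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> converter;
    try {
        out = converter.from_bytes(in);
        return true;
    } catch (const std::range_error&) {
        return false;
    }
}
```

Alternatively, std::wstring_convert has a constructor that takes a byte-error string and a wide-error string; those are returned on failure in place of the exception.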

update(2013.11.28)

One liners can be stated as so (Thank you Guss for your comment):

std::wstring str = std::wstring_convert<std::codecvt_utf8<wchar_t>>().from_bytes("some string");

Wrapper functions can be stated as so: (Thank you ArmanSchwarz for your comment)

std::wstring s2ws(const std::string& str)
{
    using convert_typeX = std::codecvt_utf8<wchar_t>;
    std::wstring_convert<convert_typeX, wchar_t> converterX;

    return converterX.from_bytes(str);
}

std::string ws2s(const std::wstring& wstr)
{
    using convert_typeX = std::codecvt_utf8<wchar_t>;
    std::wstring_convert<convert_typeX, wchar_t> converterX;

    return converterX.to_bytes(wstr);
}

Note: there's some controversy over whether string/wstring should be passed to functions by reference or by value (due to C++11 and compiler updates). I'll leave the decision to the person implementing, but it's worth knowing.

Note: I'm using std::codecvt_utf8 in the above code, but if you're not using UTF-8 you'll need to change that to the appropriate encoding you're using:

http://en.cppreference.com/w/cpp/header/codecvt
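As an illustration of picking the facet: with a 16-bit wchar_t (as on Windows), std::codecvt_utf8<wchar_t> only covers UCS-2, so characters above U+FFFF need std::codecvt_utf8_utf16<wchar_t>. The sketch below selects the facet at compile time. The names wide_utf8_facet, wide_to_utf8 and utf8_to_wide are made up for this example, and it assumes that a 2-byte wchar_t holds UTF-16 and a wider one holds UTF-32.

```cpp
#include <codecvt>
#include <locale>
#include <string>
#include <type_traits>

// Assumption: 2-byte wchar_t means UTF-16, anything wider means UTF-32.
typedef std::conditional<
    sizeof(wchar_t) == 2,
    std::codecvt_utf8_utf16<wchar_t>,  // UTF-8 <-> UTF-16 (surrogate pairs)
    std::codecvt_utf8<wchar_t>         // UTF-8 <-> UTF-32
>::type wide_utf8_facet;

// Hypothetical wrapper: wide string to UTF-8 bytes.
std::string wide_to_utf8(const std::wstring& wstr)
{
    std::wstring_convert<wide_utf8_facet, wchar_t> converter;
    return converter.to_bytes(wstr);
}

// Hypothetical wrapper: UTF-8 bytes to wide string.
std::wstring utf8_to_wide(const std::string& str)
{
    std::wstring_convert<wide_utf8_facet, wchar_t> converter;
    return converter.from_bytes(str);
}
```

Both wrappers still throw std::range_error on invalid input, exactly like the plain std::wstring_convert calls.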

Trite answered 22/8, 2013 at 7:57 Comment(16)
Please +1: this is the official C++ standard way to do string conversion. You can also use from_bytes to convert the other way. Because I personally like one-liners, here is my version: std::wstring str = std::wstring_convert<std::codecvt_utf<wchar_t>>().from_bytes("some string");Babblement
2 weeks I spent using giant dodgy templated monstrosities before I came across this. Thank you. Please consider wrapping in a simple std::string ws2s(std::wstring const&) function, might get more up-votes that way.Airing
Looks like en.cppreference.com/w/cpp/header/codecvt isn't available as of g++ 4.8.2. The two s2ws and ws2s methods do not currently work under linuxCountrybred
works like a charm, but only on modern compilers, due missing <codecvt> header for older versionsDessiedessma
This is the way to do it.Gynous
Be aware: In VS15 .from_bytes("äüöß") causes an unhandled c++ exception in xlocbuf, line 426, while .from_bytes(u8"äüöß") works. So please add to the answer to use unicode std::string (using u8)Toro
@RoiDanton: u8".." creates a narrow string (const char[]) that contains bytes that represent the text encoded using utf-8 encoding. ".." is also const char[] but its encoding is probably whatever character encoding you used for the source file (e.g., ANSI code page such as cp1252). You can't decode an arbitrary sequence of bytes using utf-8 encoding -- it may lead to the error that you've encountered (e.g., "äüöß".encode('cp1252').decode('utf-8') raises UnicodeDecodeError in Python).Honna
@J.F.Sebastian The source file is encoded in UTF8 (w/o bom). Furthermore with "\u00E4\u00F6\u00FC\u00DF\u20AC\u0040" the same crash occurs, while - if prepended with u8 - converting the sequence works.Toro
@RoiDanton: it is even worse then: you use the encoding that is incompatible with your compiler configuration otherwise the result would be the same as u8"" (but less portable). Note: you don't need the source code encoding to be utf-8 in order to use u8"" -- it is responsibility of the compiler to convert from the source code encoding to utf-8 i.e., the encoding used for the strings in the executable may be different from the source code encoding (u8"" = source_bytes.decode(source_code_encoding).encode('utf-8'); "" = source_bytes.decode(source_code_encoding).encode(exec_charset)).Honna
@J.F.Sebastian Thanks! Indeed, Visual C++ compiler uses the system's codepage as execution character set. I've changed its character set and it works: https://mcmap.net/q/56218/-set-execution-character-set-for-visual-c-compiler/4566599Toro
Unbelievable. I can't memorize this snippet and come back here on a regular basis. Darn string handling. Thanks for providing this snippet.Evermore
It looks like this is deprecated (https://mcmap.net/q/56219/-deprecated-header-lt-codecvt-gt-replacement). My compiler throws errors when I try to run this codeBenedictine
Deprecated on C++17... en.cppreference.com/w/cpp/locale/wstring_convert en.cppreference.com/w/cpp/locale/codecvt_utf8Evaginate
To anybody worrying about C++17 and further compatibility (due to deprecation) see: https://mcmap.net/q/16013/-c-convert-string-or-char-to-wstring-or-wchar_tGibeon
They're actually removing this method in C++26. 2 steps forward, 2 steps back?Outright
Apparently <filesystem> can be used for this type of string conversion too?Outright
M
177

An older solution from: http://forums.devshed.com/c-programming-42/wstring-to-string-444006.html

std::wstring wide( L"Wide" ); 
std::string str( wide.begin(), wide.end() );

// Will print no problemo!
std::cout << str << std::endl;

Update (2021): However, at least on more recent versions of MSVC, this may generate a wchar_t to char truncation warning. The warning can be quieted by using std::transform instead with explicit conversion in the transformation function, e.g.:

std::wstring wide( L"Wide" );

std::string str;
std::transform(wide.begin(), wide.end(), std::back_inserter(str), [] (wchar_t c) {
    return (char)c;
});

Or if you prefer to preallocate and not use back_inserter:

std::string str(wide.length(), 0);
std::transform(wide.begin(), wide.end(), str.begin(), [] (wchar_t c) {
    return (char)c;
});

See example on various compilers here.


Beware that no character set conversion happens here at all. This simply assigns each wchar_t in the range to a char, which is a truncating conversion. It uses this std::string constructor:

template< class InputIt >
basic_string( InputIt first, InputIt last,
              const Allocator& alloc = Allocator() );

As stated in comments:

values 0-127 are identical in virtually every encoding, so truncating values that are all 127 or less results in the same text. Put in a Chinese character and you'll see the failure.

the values 128-255 of Windows codepage 1252 (the Windows English default) and the values 128-255 of Unicode are mostly the same, so if that's the codepage you're using, most of those characters should be truncated to the correct values. (I totally expected á and õ to work, I know our code at work relies on this for é, which I will soon fix)

And note that code points in the range 0x80 - 0x9F in Win1252 will not work. This includes €, œ, ž, Ÿ, ...
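If the truncation is only acceptable for plain ASCII, the range check suggested in the comments can be made explicit. A minimal sketch, where the name narrow_ascii and the choice of std::range_error are assumptions of mine, not something from the original snippet:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: narrows a wide string only if every element is
// in the ASCII range 0..127; otherwise it throws instead of silently
// truncating.
std::string narrow_ascii(const std::wstring& wide)
{
    std::string result;
    result.reserve(wide.size());
    for (wchar_t c : wide) {
        // The cast also maps negative values (where wchar_t is signed)
        // to large numbers, so they are rejected too.
        if (static_cast<unsigned long>(c) > 0x7FUL) {
            throw std::range_error("non-ASCII character in wide string");
        }
        result.push_back(static_cast<char>(c));
    }
    return result;
}
```

With this, narrow_ascii(L"Wide") behaves like the iterator-range constructor, while input such as L"é" or a Chinese character raises an exception rather than producing garbage.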

Minor answered 23/8, 2012 at 18:18 Comment(16)
Bizarrely, this works on Visual Studio 10. What is going on? This should cause a truncating assigment from wchar_t to char for all elements of the original string.Congressional
@PedroLamarão: Why? since it's std::wstring which is template specializing of class 'string' for type 'wchar_t' in STL.Minor
The second line above creates an std::string (presumably) from a Range of Iterators, whose value type must then be char. But [wide.begin(), wide.end()) is a Range of Iterators whose value type is wchar_t, whose size is greater than the size of char. I see now that even your source states this is not portable. Perhaps it's Visual Studio specific.Congressional
Just tried it here codepad.org/zUh426eh and it worked. I believe they use some flavor of GCC. The said string constructor that takes ranges must have a std template specialization implemented for wstring to string and back as it is cross compatible.Minor
...when it goes to any non-latin characters.Lunalunacy
@PedroLamarão: values 0-127 are identical in virtually every encoding, so truncating values that are all less than 127 results in the same text. Put in a chinese character and you'll see the failure.Radiobiology
@MooingDuck I thought I'd seen this work for á or õ but I have tried again just now and it doesn't. Your reasoning must be correct.Congressional
@PedroLamarão: the values 128-255 of windows codepage 1252 (the Windows English default) and the values 128-255 of unicode are mostly the same, so if that's teh codepage you're using most of those characters should be truncated to the correct values. (I totally expected á and õ to work, I know our code at work relies on this for é, which I will soon fix)Radiobiology
No problems when using g++ 4.8.1 on Linux. Also works on VS2005.Kila
Didn't work on Solaris 10 either... error is "Could not find a match for std::string::basic_string(wchar_t*, wchar_t*)"Ario
This used to work for me until I upgraded to MSVC 2019 and v142 toolset - now it craps out with a warning (which I always treat as errors): warning C4244: 'argument': conversion from 'const wchar_t' to 'const _Elem', possible loss of dataCowart
@Cowart I noticed the same thing. Although the warning is unpleasant, that truncation is desired. If you're worried about the uncertainty, you could always extract this conversion to a method where you disable the warning. Of course you'd also want to do a check before converting to make sure the wstring values are within a valid 1 byte range for the returning string. If they are outside that range, throw an error or handle the case however you see fit.Amphibrach
Updated with a slightly more verbose but warning-free approach. Should work on Solaris too.Shurlock
The use of std::transform there with the lambda object incurs a 2x perf costOutright
The 2 MSVC++ warnings are due to the truncation & apparently a signed / unsigned mismatch doing the assignment of a wchar_t to a charOutright
But you can just suppress those warnings instead: #pragma warning( suppress : 4244 4365 ). You should put the #pragma just above the std::string s(wstring.begin(), wstring.end()) line of code (suppress only works for only 1 line)Outright
E
17

Default encoding by platform:

  • Windows: UTF-16.
  • Linux: UTF-8.
  • macOS: UTF-8.

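My solution preserves embedded null characters (\0), so the result is not truncated at the first one. On Windows it relies on WideCharToMultiByte and MultiByteToWideChar from Windows.h; on other platforms it uses wcstombs and mbstowcs. The steps:

  1. Add macros to detect the platform (Windows, Linux, and others).
  2. Create functions to convert std::wstring to std::string and, in reverse, std::string to std::wstring.
  3. Create a function for printing.
  4. Print the std::string / std::wstring.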
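To illustrate the embedded-null point on its own: converting one wide character at a time with std::wcrtomb keeps \0 characters without splitting the input into segments. This is only a sketch of the idea, not the code from this answer. The name wide_to_multibyte is hypothetical, and it assumes a POSIX-like platform with a 4-byte wchar_t and a locale already selected through std::setlocale (in the default "C" locale only ASCII converts).

```cpp
#include <climits>
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts with the current C locale, one wide
// character at a time, so embedded L'\0' characters are preserved.
std::string wide_to_multibyte(const std::wstring& wstr)
{
    std::string result;
    std::mbstate_t state = std::mbstate_t(); // initial conversion state
    char buffer[MB_LEN_MAX];                 // room for one multibyte character

    for (wchar_t wc : wstr) {
        const std::size_t written = std::wcrtomb(buffer, wc, &state);
        if (written == static_cast<std::size_t>(-1)) {
            throw std::runtime_error("character not representable in the current locale");
        }
        result.append(buffer, written);
    }
    return result;
}
```

Because std::wcrtomb writes a null byte for L'\0', a wide string such as std::wstring(L"ab\0cd", 5) comes out as a 5-byte std::string.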

See also raw string literals and the raw string suffix.

Linux code: print a std::string directly with std::cout. The default encoding on Linux is UTF-8, so no extra functions are needed.

On Windows, if you need to print Unicode, you can use WriteConsole to print the Unicode characters of a std::wstring.

Finally, on Windows you need a console with complete support for displaying Unicode characters. I recommend Windows Terminal.

QA

  • Tested on Microsoft Visual Studio 2019 with VC++; std=c++17. (Windows Project)
  • Tested on repl.it using Clang compiler; std=c++17.

Q. Why don't you use the functions and classes from the <codecvt> header?
A. They are deprecated. Removed or deprecated features fail to build on VC++, although g++ has no problems with them. I prefer zero warnings and no headaches.

Q. Is std::wstring cross-platform?
A. No. std::wstring uses wchar_t elements. On Windows, wchar_t is 2 bytes and each character is stored as UTF-16 code units; a character above U+FFFF is represented by two UTF-16 code units (2 wchar_t elements), called a surrogate pair. On Linux, wchar_t is 4 bytes and each character is stored in a single wchar_t element, so surrogate pairs are not needed. See "Standard data types on UNIX, Linux, and Windows".

Q. Is std::string cross-platform?
A. Yes. std::string uses char elements, and sizeof(char) is guaranteed to be 1 byte on every conforming compiler. See "Standard data types on UNIX, Linux, and Windows".
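The wchar_t difference is easy to observe with a character outside the Basic Multilingual Plane. A small sketch (the function name emoji_units is made up for illustration):

```cpp
#include <cstddef>
#include <string>

// Returns how many wchar_t elements are needed to store U+1F600,
// a character above U+FFFF.
std::size_t emoji_units()
{
    const std::wstring emoji = L"\U0001F600";
    return emoji.size();
}
```

Where wchar_t is 2 bytes (Windows), this returns 2 because the character is stored as a surrogate pair; where wchar_t is 4 bytes (Linux), it returns 1.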

Full example code


#include <cstdio>   // getchar
#include <cstdlib>  // wcstombs, mbstowcs, std::system
#include <iostream>
#include <set>
#include <string>
#include <locale>

// WINDOWS
#if (_WIN32)
#define NOMINMAX // must come before <Windows.h> to suppress its min/max macros
#include <Windows.h>
#include <conio.h>
#define WINDOWS_PLATFORM 1
#define DLLCALL STDCALL
#define DLLIMPORT _declspec(dllimport)
#define DLLEXPORT _declspec(dllexport)
#define DLLPRIVATE

//EMSCRIPTEN
#elif defined(__EMSCRIPTEN__)
#include <emscripten/emscripten.h>
#include <emscripten/bind.h>
#include <unistd.h>
#include <termios.h>
#define EMSCRIPTEN_PLATFORM 1
#define DLLCALL
#define DLLIMPORT
#define DLLEXPORT __attribute__((visibility("default")))
#define DLLPRIVATE __attribute__((visibility("hidden")))

// LINUX - Ubuntu, Fedora, Centos, Debian, RedHat
#elif (__LINUX__ || __gnu_linux__ || __linux__ || __linux || linux)
#define LINUX_PLATFORM 1
#include <unistd.h>
#include <termios.h>
#define DLLCALL CDECL
#define DLLIMPORT
#define DLLEXPORT __attribute__((visibility("default")))
#define DLLPRIVATE __attribute__((visibility("hidden")))
#define CoTaskMemAlloc(p) malloc(p)
#define CoTaskMemFree(p) free(p)

//ANDROID
#elif (__ANDROID__ || ANDROID)
#define ANDROID_PLATFORM 1
#define DLLCALL
#define DLLIMPORT
#define DLLEXPORT __attribute__((visibility("default")))
#define DLLPRIVATE __attribute__((visibility("hidden")))

//MACOS
#elif defined(__APPLE__)
#include <unistd.h>
#include <termios.h>
#define DLLCALL
#define DLLIMPORT
#define DLLEXPORT __attribute__((visibility("default")))
#define DLLPRIVATE __attribute__((visibility("hidden")))
#include "TargetConditionals.h"
#if TARGET_OS_IPHONE && TARGET_IPHONE_SIMULATOR
#define IOS_SIMULATOR_PLATFORM 1
#elif TARGET_OS_IPHONE
#define IOS_PLATFORM 1
#elif TARGET_OS_MAC
#define MACOS_PLATFORM 1
#else

#endif

#endif



typedef std::string String;
typedef std::wstring WString;

#define EMPTY_STRING u8""s
#define EMPTY_WSTRING L""s

using namespace std::literals::string_literals;

class Strings
{
public:
    static String WideStringToString(const WString& wstr)
    {
        if (wstr.empty())
        {
            return String();
        }
        size_t pos;
        size_t begin = 0;
        String ret;

#if WINDOWS_PLATFORM
        int size;
        pos = wstr.find(static_cast<wchar_t>(0), begin);
        while (pos != WString::npos && begin < wstr.length())
        {
            WString segment = WString(&wstr[begin], pos - begin);
            size = WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, &segment[0], segment.size(), NULL, 0, NULL, NULL);
            String converted = String(size, 0);
            WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, &segment[0], segment.size(), &converted[0], converted.size(), NULL, NULL);
            ret.append(converted);
            ret.append({ 0 });
            begin = pos + 1;
            pos = wstr.find(static_cast<wchar_t>(0), begin);
        }
        if (begin <= wstr.length())
        {
            WString segment = WString(&wstr[begin], wstr.length() - begin);
            size = WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, &segment[0], segment.size(), NULL, 0, NULL, NULL);
            String converted = String(size, 0);
            WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, &segment[0], segment.size(), &converted[0], converted.size(), NULL, NULL);
            ret.append(converted);
        }
#elif LINUX_PLATFORM || MACOS_PLATFORM || EMSCRIPTEN_PLATFORM
        size_t size;
        pos = wstr.find(static_cast<wchar_t>(0), begin);
        while (pos != WString::npos && begin < wstr.length())
        {
            WString segment = WString(&wstr[begin], pos - begin);
            size = wcstombs(nullptr, segment.c_str(), 0);
            String converted = String(size, 0);
            wcstombs(&converted[0], segment.c_str(), converted.size());
            ret.append(converted);
            ret.append({ 0 });
            begin = pos + 1;
            pos = wstr.find(static_cast<wchar_t>(0), begin);
        }
        if (begin <= wstr.length())
        {
            WString segment = WString(&wstr[begin], wstr.length() - begin);
            size = wcstombs(nullptr, segment.c_str(), 0);
            String converted = String(size, 0);
            wcstombs(&converted[0], segment.c_str(), converted.size());
            ret.append(converted);
        }
#else
        static_assert(false, "Unknown Platform");
#endif
        return ret;
    }

    static WString StringToWideString(const String& str)
    {
        if (str.empty())
        {
            return WString();
        }

        size_t pos;
        size_t begin = 0;
        WString ret;
#ifdef WINDOWS_PLATFORM
        int size = 0;
        pos = str.find(static_cast<char>(0), begin);
        while (pos != std::string::npos) {
            std::string segment = std::string(&str[begin], pos - begin);
            std::wstring converted = std::wstring(segment.size() + 1, 0);
            size = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, &segment[0], segment.size(), &converted[0], converted.length());
            converted.resize(size);
            ret.append(converted);
            ret.append({ 0 });
            begin = pos + 1;
            pos = str.find(static_cast<char>(0), begin);
        }
        if (begin < str.length()) {
            std::string segment = std::string(&str[begin], str.length() - begin);
            std::wstring converted = std::wstring(segment.size() + 1, 0);
            size = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, segment.c_str(), segment.size(), &converted[0], converted.length());
            converted.resize(size);
            ret.append(converted);
        }

#elif LINUX_PLATFORM || MACOS_PLATFORM || EMSCRIPTEN_PLATFORM
        size_t size;
        pos = str.find(static_cast<char>(0), begin);
        while (pos != String::npos)
        {
            String segment = String(&str[begin], pos - begin);
            WString converted = WString(segment.size(), 0);
            size = mbstowcs(&converted[0], &segment[0], converted.size());
            converted.resize(size);
            ret.append(converted);
            ret.append({ 0 });
            begin = pos + 1;
            pos = str.find(static_cast<char>(0), begin);
        }
        if (begin < str.length())
        {
            String segment = String(&str[begin], str.length() - begin);
            WString converted = WString(segment.size(), 0);
            size = mbstowcs(&converted[0], &segment[0], converted.size());
            converted.resize(size);
            ret.append(converted);
        }
#else
        static_assert(false, "Unknown Platform");
#endif
        return ret;
    }
};

enum class ConsoleTextStyle
{
    DEFAULT = 0,
    BOLD = 1,
    FAINT = 2,
    ITALIC = 3,
    UNDERLINE = 4,
    SLOW_BLINK = 5,
    RAPID_BLINK = 6,
    REVERSE = 7,
};

enum class ConsoleForeground
{
    DEFAULT = 39,
    BLACK = 30,
    DARK_RED = 31,
    DARK_GREEN = 32,
    DARK_YELLOW = 33,
    DARK_BLUE = 34,
    DARK_MAGENTA = 35,
    DARK_CYAN = 36,
    GRAY = 37,
    DARK_GRAY = 90,
    RED = 91,
    GREEN = 92,
    YELLOW = 93,
    BLUE = 94,
    MAGENTA = 95,
    CYAN = 96,
    WHITE = 97
};

enum class ConsoleBackground
{
    DEFAULT = 49,
    BLACK = 40,
    DARK_RED = 41,
    DARK_GREEN = 42,
    DARK_YELLOW = 43,
    DARK_BLUE = 44,
    DARK_MAGENTA = 45,
    DARK_CYAN = 46,
    GRAY = 47,
    DARK_GRAY = 100,
    RED = 101,
    GREEN = 102,
    YELLOW = 103,
    BLUE = 104,
    MAGENTA = 105,
    CYAN = 106,
    WHITE = 107
};

class Console
{
private:
    static void EnableVirtualTermimalProcessing()
    {
#if defined WINDOWS_PLATFORM
        HANDLE hOut = GetStdHandle(STD_OUTPUT_HANDLE);
        DWORD dwMode = 0;
        GetConsoleMode(hOut, &dwMode);
        if (!(dwMode & ENABLE_VIRTUAL_TERMINAL_PROCESSING))
        {
            dwMode |= ENABLE_VIRTUAL_TERMINAL_PROCESSING;
            SetConsoleMode(hOut, dwMode);
        }
#endif
    }

    static void ResetTerminalFormat()
    {
        std::cout << u8"\033[0m";
    }

    static void SetVirtualTerminalFormat(ConsoleForeground foreground, ConsoleBackground background, std::set<ConsoleTextStyle> styles)
    {
        String format = u8"\033[";
        format.append(std::to_string(static_cast<int>(foreground)));
        format.append(u8";");
        format.append(std::to_string(static_cast<int>(background)));
        if (styles.size() > 0)
        {
            for (auto it = styles.begin(); it != styles.end(); ++it)
            {
                format.append(u8";");
                format.append(std::to_string(static_cast<int>(*it)));
            }
        }
        format.append(u8"m");
        std::cout << format;
    }
public:
    static void Clear()
    {

#ifdef WINDOWS_PLATFORM
        std::system(u8"cls");
#elif LINUX_PLATFORM || defined MACOS_PLATFORM
        std::system(u8"clear");
#elif EMSCRIPTEN_PLATFORM
        emscripten::val::global()["console"].call<void>(u8"clear");
#else
        static_assert(false, "Unknown Platform");
#endif
    }

    static void Write(const String& s, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {})
    {
#ifndef EMSCRIPTEN_PLATFORM
        EnableVirtualTermimalProcessing();
        SetVirtualTerminalFormat(foreground, background, styles);
#endif
        String str = s;
#ifdef WINDOWS_PLATFORM
        WString unicode = Strings::StringToWideString(str);
        WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), unicode.c_str(), static_cast<DWORD>(unicode.length()), nullptr, nullptr);
#elif defined LINUX_PLATFORM || defined MACOS_PLATFORM || EMSCRIPTEN_PLATFORM
        std::cout << str;
#else
        static_assert(false, "Unknown Platform");
#endif

#ifndef EMSCRIPTEN_PLATFORM
        ResetTerminalFormat();
#endif
    }

    static void WriteLine(const String& s, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {})
    {
        Write(s, foreground, background, styles);
        std::cout << std::endl;
    }

    static void Write(const WString& s, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {})
    {
#ifndef EMSCRIPTEN_PLATFORM
        EnableVirtualTermimalProcessing();
        SetVirtualTerminalFormat(foreground, background, styles);
#endif
        WString str = s;

#ifdef WINDOWS_PLATFORM
        WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), str.c_str(), static_cast<DWORD>(str.length()), nullptr, nullptr);
#elif LINUX_PLATFORM || MACOS_PLATFORM || EMSCRIPTEN_PLATFORM
        std::cout << Strings::WideStringToString(str);
#else
        static_assert(false, "Unknown Platform");
#endif

#ifndef EMSCRIPTEN_PLATFORM
        ResetTerminalFormat();
#endif
    }

    static void WriteLine(const WString& s, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {})
    {
        Write(s, foreground, background, styles);
        std::cout << std::endl;
    }

    static void WriteLine()
    {
        std::cout << std::endl;
    }

    static void Pause()
    {
        char c;
        do
        {
            c = getchar();
            std::cout << "Press Key " << std::endl;
        } while (c != 64);
        std::cout << "KeyPressed" << std::endl;
    }

    static int PauseAny(bool printWhenPressed = false, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {})
    {
        int ch;
#ifdef WINDOWS_PLATFORM
        ch = _getch();
#elif LINUX_PLATFORM || MACOS_PLATFORM || EMSCRIPTEN_PLATFORM
        struct termios oldt, newt;
        tcgetattr(STDIN_FILENO, &oldt);
        newt = oldt;
        newt.c_lflag &= ~(ICANON | ECHO);
        tcsetattr(STDIN_FILENO, TCSANOW, &newt);
        ch = getchar();
        tcsetattr(STDIN_FILENO, TCSANOW, &oldt);
#else
        static_assert(false, "Unknown Platform");
#endif
        if (printWhenPressed)
        {
            Console::Write(String(1, ch), foreground, background, styles);
        }
        return ch;
    }
};



int main()
{
    std::locale::global(std::locale(u8"en_US.UTF8"));
    auto str = u8"🐶\0Hello\0🐶123456789也不是可运行的程序123456789日本"s;//
    WString wstr = L"🐶\0Hello\0🐶123456789也不是可运行的程序123456789日本"s;
    WString wstrResult = Strings::StringToWideString(str);
    String strResult = Strings::WideStringToString(wstr);
    bool equals1 = wstr == wstrResult;
    bool equals2 = str == strResult;

    Console::WriteLine(u8"█ Converted Strings printed with Console::WriteLine"s, ConsoleForeground::GREEN);
    Console::WriteLine(wstrResult, ConsoleForeground::BLUE);//Printed OK on Windows/Linux.
    Console::WriteLine(strResult, ConsoleForeground::BLUE);//Printed OK on Windows/Linux.
    
    Console::WriteLine(u8"█ Converted Strings printed with std::cout/std::wcout"s, ConsoleForeground::GREEN);
    std::cout << strResult << std::endl;//Printed OK on Linux. BAD on Windows.
    std::wcout << wstrResult << std::endl; //Printed BAD on Windows/Linux.
    Console::WriteLine();
    Console::WriteLine(u8"Press any key to exit"s, ConsoleForeground::DARK_GRAY);
    Console::PauseAny();

}

You can test this code at https://repl.it/@JomaCorpFX/StringToWideStringToString#main.cpp


**Screenshots**

[Screenshot: output in Windows Terminal]

[Screenshot: output in cmd/PowerShell]

[Screenshot: output on Repl.it]

Evaginate answered 24/9, 2018 at 23:13 Comment(0)
B
12

Instead of including locale and all that fancy stuff, if you know for a FACT that your string is convertible, just do this:

#include <iostream>
#include <string>

using namespace std;

int main()
{
  wstring w(L"bla");
  string result;
  for(char x : w)
    result += x;

  cout << result << '\n';
}

Live example here
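
The "you know for a fact" precondition can also be checked at run time rather than assumed. The following is a minimal sketch, not part of the original answer: `narrow_ascii` is a hypothetical helper name, it assumes C++17 for `std::optional`, and it treats only 7-bit ASCII as safely convertible.

```cpp
#include <optional>
#include <string>

// Hypothetical helper (not from the answer above): narrows a wide string only
// when every character is 7-bit ASCII, and reports failure otherwise instead
// of silently truncating.
std::optional<std::string> narrow_ascii(const std::wstring& w)
{
    std::string result;
    result.reserve(w.size());
    for (wchar_t wc : w)
    {
        // The cast makes the check work whether wchar_t is signed or unsigned.
        if (static_cast<unsigned long>(wc) > 0x7F)
            return std::nullopt;
        result += static_cast<char>(wc);
    }
    return result;
}
```

A caller can then decide what to do when the result is empty, for example fall back to a real encoding conversion.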

Burdened answered 21/7, 2012 at 2:35 Comment(5)
+1 because it's a simple solution that works for some scenarios (for a loose definition of "works", I might add).Hygienic
Almost the same thing as namar0x0309's solution, which is much more elegant IMHO. But that's just me.Vanhoose
I spiffied up your code to actually work with minimal modification ;-)Handspring
-1 If you have a wstring, it's likely you're dealing with multibyte characters. If you could know the string is trivially convertible, you wouldn't be handling a wstring in the first place. More likely, you're dealing with another library that expects you to handle the wstring properly. Truncating the wchars is just begging for a hard to trace bug later on. Also, you should use "string result( w.begin(), w.end() );" if you were going to do it, to avoid a loop that could trigger many reallocations.Antons
@Kian, Plain and simple, and perfect for some limited use cases.Toronto
R
8

I believe the official way is still to go through codecvt facets (you need some sort of locale-aware translation), as in

resultCode = use_facet<codecvt<wchar_t, char, ConversionState> >(locale).
  out(stateVar, scratchbuffer, scratchbufferEnd, from, to, toLimit, curPtr);

or something like that, I don't have working code lying around. But I'm not sure how many people these days use that machinery and how many simply ask for pointers to memory and let ICU or some other library handle the gory details.
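
For readers who want something compilable, here is a rough sketch of what the facet call might look like in full. It is an illustration under assumptions, not the answerer's code: `narrow_with_codecvt` is a made-up name, the locale defaults to the current global locale, and any result other than `ok` is treated as an error.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts wide to narrow text with the locale's
// codecvt<wchar_t, char, mbstate_t> facet (the wide-to-narrow direction is out()).
std::string narrow_with_codecvt(const std::wstring& ws,
                                const std::locale& loc = std::locale())
{
    using facet_type = std::codecvt<wchar_t, char, std::mbstate_t>;
    const facet_type& facet = std::use_facet<facet_type>(loc);

    if (ws.empty())
        return std::string();

    // max_length() is the most bytes a single wide character can produce.
    std::string out(ws.size() * static_cast<std::size_t>(facet.max_length()), '\0');

    std::mbstate_t state{};
    const wchar_t* from_next = nullptr;
    char* to_next = nullptr;
    const std::codecvt_base::result rc = facet.out(
        state,
        ws.data(), ws.data() + ws.size(), from_next,
        &out[0], &out[0] + out.size(), to_next);

    if (rc != std::codecvt_base::ok)
        throw std::runtime_error("wide string not representable in this locale");

    // Keep only the bytes that were actually written.
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

The result is in the encoding of `loc`, so characters outside that encoding end up in the error branch rather than being truncated.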

Refined answered 26/1, 2011 at 12:11 Comment(0)
B
8

There are two issues with the code:

  1. The conversion in const std::string s( ws.begin(), ws.end() ); is not required to correctly map the wide characters to their narrow counterpart. Most likely, each wide character will just be typecast to char.
    The resolution to this problem is already given in the answer by kem and involves the narrow function of the locale's ctype facet.

  2. You are writing output to both std::cout and std::wcout in the same program. Both cout and wcout are associated with the same stream (stdout) and the results of using the same stream both as a byte-oriented stream (as cout does) and a wide-oriented stream (as wcout does) are not defined.
    The best option is to avoid mixing narrow and wide output to the same (underlying) stream. For stdout/cout/wcout, you can try switching the orientation of stdout when switching between wide and narrow output (or vice versa):

    #include <iostream>
    #include <stdio.h>
    #include <wchar.h>
    
    int main() {
        std::cout << "narrow" << std::endl;
        fwide(stdout, 1); // switch to wide
        std::wcout << L"wide" << std::endl;
        fwide(stdout, -1); // switch to narrow
        std::cout << "narrow" << std::endl;
        fwide(stdout, 1); // switch to wide
        std::wcout << L"wide" << std::endl;
    }
    
Biostatics answered 26/1, 2011 at 13:33 Comment(1)
Yes, that fixes the problem with using cout and wcout.Archivolt
H
8

Besides just converting the types, you should also be conscious about the string's actual format.

When compiling for the Multi-Byte Character Set, Visual Studio and the Win API assume UTF8 (actually the Windows encoding, which is Windows-28591).
When compiling for the Unicode Character Set, Visual Studio and the Win API assume UTF16.

So you must convert the string from UTF16 to UTF8 as well, and not just convert to std::string.
This becomes necessary when working with text that needs multi-byte characters, as in some non-Latin languages.

The idea is to decide that std::wstring always represents UTF16.
And std::string always represents UTF8.

This isn't enforced by the compiler, it's more of a good policy to have. Note the string prefixes I use to define UTF16 (L) and UTF8 (u8).

To convert between the 2 types, you should use: std::codecvt_utf8_utf16< wchar_t>

#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

int main()
{

    std::string original8 = u8"הלו";

    std::wstring original16 = L"הלו";

    //C++11 format converter
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> convert;

    //convert to UTF8 and std::string
    std::string utf8NativeString = convert.to_bytes(original16);

    std::wstring utf16NativeString = convert.from_bytes(original8);

    assert(utf8NativeString == original8);
    assert(utf16NativeString == original16);

    return 0;
}
Harewood answered 2/1, 2020 at 8:55 Comment(0)
R
7

At the time of writing this answer, the number one google search for "convert string wstring" would land you on this page. My answer shows how to convert string to wstring, although this is NOT the actual question, and I should probably delete this answer but that is considered bad form. You may want to jump to this StackOverflow answer, which is now higher ranked than this page.


Here's a way to combine string, wstring and mixed string constants into a wstring: use the wstringstream class.

#include <sstream>

std::string narrow = "narrow";
std::wstring wide = L"wide";

std::wstringstream cls;
cls << " abc " << narrow.c_str() << L" def " << wide.c_str();
std::wstring total= cls.str();
Roath answered 19/7, 2011 at 22:48 Comment(3)
This is not a wstring to string conversionPiero
@Michael Can you please explain? What about this is incorrect? Your comment is not helpful without more details.Swarts
this is a string to wstring conversion. i.e. the opposite of the question.Hemorrhage
T
7

If you are dealing with file paths (as I often am when I find the need for wstring-to-string) you can use filesystem::path (since C++17):

#include <filesystem>

const std::wstring wPath = GetPath(); // some function that returns wstring
const std::string path = std::filesystem::path(wPath).string();
Terrain answered 24/5, 2021 at 13:26 Comment(1)
This is exactly what I was looking for. Thanks.Ranking
I
6

You might as well just use the ctype facet's narrow method directly:

#include <clocale>
#include <locale>
#include <string>
#include <vector>

inline std::string narrow(std::wstring const& text)
{
    std::locale const loc("");
    wchar_t const* from = text.c_str();
    std::size_t const len = text.size();
    std::vector<char> buffer(len + 1);
    std::use_facet<std::ctype<wchar_t> >(loc).narrow(from, from + len, '_', &buffer[0]);
    return std::string(&buffer[0], &buffer[len]);
}
Infanta answered 7/11, 2011 at 21:58 Comment(0)
E
6

This solution is inspired by dk123's solution, but uses a locale-dependent codecvt facet. The result is a string in the locale's encoding rather than UTF-8 (unless the locale itself uses UTF-8):

std::string w2s(const std::wstring &var)
{
   static std::locale loc("");
   auto &facet = std::use_facet<std::codecvt<wchar_t, char, std::mbstate_t>>(loc);
   return std::wstring_convert<std::remove_reference<decltype(facet)>::type, wchar_t>(&facet).to_bytes(var);
}

std::wstring s2w(const std::string &var)
{
   static std::locale loc("");
   auto &facet = std::use_facet<std::codecvt<wchar_t, char, std::mbstate_t>>(loc);
   return std::wstring_convert<std::remove_reference<decltype(facet)>::type, wchar_t>(&facet).from_bytes(var);
}

I searched for a ready-made way to do this but couldn't find one. Eventually I found that I can get the right facet from std::locale using the std::use_facet() function with the right typename. Hope this helps.

Eel answered 27/4, 2016 at 14:57 Comment(2)
Vizor, what are the advantages (if any) of converting with the locale dependent facet?Goree
If you work with strings from system, from console input for example.Eel
K
6

I spent many sad days trying to come up with a way to do this for C++17, which deprecated the <codecvt> facets, and this is the best I was able to come up with by combining code from a few different sources:

setlocale( LC_ALL, "en_US.UTF-8" ); //Invoked in main()

std::string wideToMultiByte( std::wstring const & wideString )
{
     std::string ret;
     std::string buff( MB_CUR_MAX, '\0' );

     for ( wchar_t const & wc : wideString )
     {
         int mbCharLen = std::wctomb( &buff[ 0 ], wc );

         if ( mbCharLen < 1 ) { break; }

         for ( int i = 0; i < mbCharLen; ++i ) 
         { 
             ret += buff[ i ]; 
         }
     }

     return ret;
 }

 std::wstring multiByteToWide( std::string const & multiByteString )
 {
     std::wstring ws( multiByteString.size(), L' ' );
     ws.resize( 
         std::mbstowcs( &ws[ 0 ], 
             multiByteString.c_str(), 
             multiByteString.size() ) );

     return ws;
 }

I tested this code on Windows 10, and at least for my purposes, it seems to work fine. Please don't lynch me if this doesn't consider some crazy edge cases that you might need to handle, I'm sure someone with more experience can improve on this! :-)

Also, credit where it's due:

Adapted for wideToMultiByte()

Copied for multiByteToWide
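
A possible variation on wideToMultiByte() above, offered as a sketch rather than a drop-in replacement: std::wcsrtombs can be asked for the required byte count first (null destination), which avoids appending byte by byte. The name `wide_to_multibyte_sized` is hypothetical, the conversion uses whatever C locale is currently set with setlocale, and, like any C-string API, it stops at the first embedded L'\0'.

```cpp
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper: two-pass conversion with std::wcsrtombs.
std::string wide_to_multibyte_sized(const std::wstring& wide)
{
    std::mbstate_t state{};
    const wchar_t* src = wide.c_str();

    // Pass 1: a null destination only computes the number of bytes needed
    // (excluding the terminating null) and leaves src untouched.
    const std::size_t needed = std::wcsrtombs(nullptr, &src, 0, &state);
    if (needed == static_cast<std::size_t>(-1))
        throw std::runtime_error("character not representable in the current locale");

    std::string out(needed, '\0');
    if (needed == 0)
        return out;

    // Pass 2: convert into the exactly sized buffer.
    state = std::mbstate_t{};
    src = wide.c_str();
    std::wcsrtombs(&out[0], &src, out.size(), &state);
    return out;
}
```

Unlike the loop above, this version reports an unconvertible character with an exception instead of returning a silently shortened string; whether that is preferable depends on the caller.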

Karyolysis answered 16/11, 2020 at 9:35 Comment(0)
B
3

In my case, I have to use a multibyte character set (MBCS), and I want to use std::string and std::wstring, but I can't use C++11. So I use mbstowcs and wcstombs.

I wrote the same functions using new and delete[], but they were slower than this.

This can help: How to: Convert Between Various String Types

EDIT

However, converting to wstring does not work when the source is a multibyte string with non-alphabet characters. So I changed wcstombs to WideCharToMultiByte.

#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <string>

std::wstring get_wstr_from_sz(const char* psz)
{
    //I think it's enough to my case
    wchar_t buf[0x400];
    wchar_t *pbuf = buf;
    size_t len = strlen(psz) + 1;

    if (len >= sizeof(buf) / sizeof(wchar_t))
    {
        pbuf = L"error";
    }
    else
    {
        size_t converted;
        mbstowcs_s(&converted, buf, psz, _TRUNCATE);
    }

    return std::wstring(pbuf);
}

std::string get_string_from_wsz(const wchar_t* pwsz)
{
    char buf[0x400];
    char *pbuf = buf;
    size_t len = wcslen(pwsz)*2 + 1;

    if (len >= sizeof(buf))
    {
        pbuf = "error";
    }
    else
    {
        size_t converted;
        wcstombs_s(&converted, buf, pwsz, _TRUNCATE);
    }

    return std::string(pbuf);
}

EDIT to use 'MultiByteToWideChar' instead of 'wcstombs'

#include <Windows.h>
#include <boost/shared_ptr.hpp>
#include "string_util.h"

std::wstring get_wstring_from_sz(const char* psz)
{
    int res;
    wchar_t buf[0x400];
    wchar_t *pbuf = buf;
    boost::shared_ptr<wchar_t[]> shared_pbuf;

    res = MultiByteToWideChar(CP_ACP, 0, psz, -1, buf, sizeof(buf)/sizeof(wchar_t));

    if (0 == res && GetLastError() == ERROR_INSUFFICIENT_BUFFER)
    {
        res = MultiByteToWideChar(CP_ACP, 0, psz, -1, NULL, 0);

        shared_pbuf = boost::shared_ptr<wchar_t[]>(new wchar_t[res]);

        pbuf = shared_pbuf.get();

        res = MultiByteToWideChar(CP_ACP, 0, psz, -1, pbuf, res);
    }
    else if (0 == res)
    {
        pbuf = L"error";
    }

    return std::wstring(pbuf);
}

std::string get_string_from_wcs(const wchar_t* pcs)
{
    int res;
    char buf[0x400];
    char* pbuf = buf;
    boost::shared_ptr<char[]> shared_pbuf;

    res = WideCharToMultiByte(CP_ACP, 0, pcs, -1, buf, sizeof(buf), NULL, NULL);

    if (0 == res && GetLastError() == ERROR_INSUFFICIENT_BUFFER)
    {
        res = WideCharToMultiByte(CP_ACP, 0, pcs, -1, NULL, 0, NULL, NULL);

        shared_pbuf = boost::shared_ptr<char[]>(new char[res]);

        pbuf = shared_pbuf.get();

        res = WideCharToMultiByte(CP_ACP, 0, pcs, -1, pbuf, res, NULL, NULL);
    }
    else if (0 == res)
    {
        pbuf = "error";
    }

    return std::string(pbuf);
}
Bocage answered 31/3, 2014 at 15:54 Comment(2)
How can I use "wcstombs_s" with gcc 4.8 ? Because I see that is C++ 11 feature.Kalikow
@Kalikow You can use the "unsafe" version of this function wcstombs().Eel
W
2
#include <boost/locale.hpp>
namespace lcv = boost::locale::conv;

inline std::wstring fromUTF8(const std::string& s)
{ return lcv::utf_to_utf<wchar_t>(s); }

inline std::string toUTF8(const std::wstring& ws)
{ return lcv::utf_to_utf<char>(ws); }
Walburga answered 30/11, 2016 at 16:1 Comment(1)
Very easy solution as always from BOOST.... however it cannot handle special symbols.Bamako
U
1

In case anyone else is interested: I needed a class that could be used interchangeably wherever either a string or wstring was expected. The following class convertible_string, based on dk123's solution, can be initialized with either a string, char const*, wstring or wchar_t const*, and can be assigned to by, or implicitly converted to, either a string or wstring (so it can be passed into functions that take either).

class convertible_string
{
public:
    // default ctor
    convertible_string()
    {}

    /* conversion ctors */
    convertible_string(std::string const& value) : value_(value)
    {}
    convertible_string(char const* val_array) : value_(val_array)
    {}
    convertible_string(std::wstring const& wvalue) : value_(ws2s(wvalue))
    {}
    convertible_string(wchar_t const* wval_array) : value_(ws2s(std::wstring(wval_array)))
    {}

    /* assignment operators */
    convertible_string& operator=(std::string const& value)
    {
        value_ = value;
        return *this;
    }
    convertible_string& operator=(std::wstring const& wvalue)
    {
        value_ = ws2s(wvalue);
        return *this;
    }

    /* implicit conversion operators */
    operator std::string() const { return value_; }
    operator std::wstring() const { return s2ws(value_); }
private:
    std::string value_;
};
Unclear answered 31/8, 2015 at 17:53 Comment(1)
I would rather store an std::wstring in the class, than to store std::string and do a conversion to std::wstring when needed to get an std::wstring. Because std::wstring is somewhat faster than std::string and it's better compatible. Even it consumes more memory than std::string.Posse
U
0
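#include <cstdlib>
#include <stdexcept>
#include <string>

std::string
convert_str(const std::wstring &s) {
  // The number of bytes needed is not known up front, so start with a guess
  // (+1 so that an empty input still gets a non-empty buffer) and grow it.
  std::string res(2 * s.size() + 1, '\0');
  while(1) {
    size_t num_used_char = std::wcstombs(&res[0], s.c_str(), res.size());
    if (num_used_char == (size_t) -1) {
      // A wide character could not be converted in the current C locale.
      // Leave the loop here; retrying with a bigger buffer would never succeed.
      throw std::runtime_error("wcstombs failed");
    }
    else if (num_used_char < res.size()) {
      res.resize(num_used_char);
      break;
    }
    // Buffer was filled completely, so the output may be cut off: retry larger.
    res.resize(2 * res.size());
  }
  return res;
}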
Unmanly answered 30/5, 2023 at 15:3 Comment(0)
O
0

Although suspect, std::string s( wideString.begin(), wideString.end() ) does compile and run, but it truncates each wide character with a straight cast.

That method generates 2 C++ warnings, one of which has a very large spew in MSVC++:

warning C4244: '=': conversion from 'wchar_t' to 'char', possible loss of data

  • 1>(compiling source file 'Filename.cpp') 1>C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\xutility(4537,18): 1>the template instantiation context (the oldest one first) is.....

warning C4365: '=': conversion from 'wchar_t' to 'char', signed/unsigned mismatch

To suppress that, you don't have to rewrite it using std::transform or anything, you can just use #pragma warning( suppress : 4244 4365 ), which suppresses those warnings only for the line that immediately follows it

#include <cstdio>
#include <string>
using namespace std;

int main() {

  wstring wide = L"A Хороший string";
  #pragma warning( suppress : 4244 4365 )
  string conv( wide.begin(), wide.end() );

  printf( "wide.size() = %zu, conv.size() = %zu\n", wide.size(), conv.size() );
  if( wide.size() == conv.size() ) puts( "Same size" ); // This is what happens
  else puts( "Different sizes!" );

  // They are the same size. So let's compare.
  for( int i = 0 ; i < wide.size() ; i++ ) {
    printf( "WIDE: [ %04x/%c ] CHAR: [ %04x/%c ]\n", wide[i], wide[i], conv[i], conv[i] );
  }

}

Output:

[Screenshot: char truncation for wide string conversion]

I should note here that this is not the correct way to convert wide strings. It loses information and that's what those C++ warnings are about. If you want to retain the information of the wide characters, while having any ANSI/English characters represented by 1 byte only, then you can use the UTF8 encoding. On Windows, there's a really simple pair of functions that lets you do that transformation described here

Sample code using that function:

string utf8 = utf8_encode( wide );  // getUtf8?
printf( "The UTF8 string size = %zu\n", utf8.size() );
for( int i = 0; i < utf8.size(); i++ ) {
  printf( "UTF8: [ %04x/%c ]\n", utf8[i], utf8[i] );
}

Outputs like:

[Screenshot: UTF-8 string conversion]

On Linux/using <codecvt>, these functions would be:

#include <codecvt>
#include <locale>

// suppress 'codecvt_utf8<wchar_t>' is deprecated warnings
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wdeprecated-declarations"
std::string getUtf8( const std::wstring &wstr ) {
  std::wstring_convert< std::codecvt_utf8<wchar_t>, wchar_t > convert;
  return convert.to_bytes( wstr );
}

std::wstring fromUtf8( const std::string &str ) {
  std::wstring_convert< std::codecvt_utf8<wchar_t>, wchar_t > convert;
  return convert.from_bytes( str );
}
// turn warnings back on
#pragma clang diagnostic pop
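
Since <codecvt> is deprecated, another route is to encode UTF-8 by hand. The sketch below is not from the answer above and makes some assumptions: `to_utf8` is a hypothetical name, wchar_t holds either UTF-16 code units (as on Windows) or UTF-32 code points (as on Linux), and anything invalid (an unpaired surrogate or a value above U+10FFFF) is replaced with U+FFFD.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encodes a wide string as UTF-8 without <codecvt>.
std::string to_utf8(const std::wstring& ws)
{
    std::string out;
    for (std::size_t i = 0; i < ws.size(); ++i)
    {
        std::uint32_t cp = static_cast<std::uint32_t>(ws[i]);

        // Combine a UTF-16 surrogate pair into one code point (16-bit wchar_t).
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < ws.size())
        {
            const std::uint32_t low = static_cast<std::uint32_t>(ws[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }

        // Anything still in the surrogate range, or out of range, is invalid.
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            cp = 0xFFFD;

        if (cp < 0x80)
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

This needs no locale setup and no warning suppression, at the cost of maintaining the encoder yourself.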
Outright answered 15/4 at 22:26 Comment(0)
N
-1

I am using below to convert wstring to string.

std::string strTo;
char *szTo = new char[someParam.length() + 1];
szTo[someParam.size()] = '\0';
WideCharToMultiByte(CP_ACP, 0, someParam.c_str(), -1, szTo, (int)someParam.length(), NULL, NULL);
strTo = szTo;
delete[] szTo;
Nebo answered 3/1, 2018 at 17:3 Comment(1)
You seem to be missing a standard header (<string>) and a definition for WideCharToMultiByte() - is that some wrapper around std::wctomb()?Kraul
V
-1

Source: https://msdn.microsoft.com/en-us/library/87zae4a3.aspx

The conversion of char strings to wchar_t strings and vice versa is a typical problem on Windows. I can't think of a use case for this in Linux. The type wchar_t has a length of 2 bytes on Windows and a length of 4 bytes on Linux. Since C++11 there are the types char16_t and char32_t, and since C++20 also char8_t, each with the corresponding number of bits. In new projects you should therefore use char8_t for UTF-8, char16_t for UTF-16 and char32_t for UTF-32, see https://learn.microsoft.com/en-us/cpp/cpp/string-and-character-literals-cpp?view=msvc-170.

On Windows, almost all classic API functions use the type wchar_t, which corresponds on Windows to the modern type char16_t. The conversion from char to wchar_t and vice versa is therefore always necessary if the value of a char string is to be inserted into an API function or, conversely, if the wchar_t string created by an API function is to be converted into a char string.

As the conversion between the two string types is a typical Windows problem, Windows functions should also be used for this. The Windows SDK offers the WideCharToMultiByte() function for converting a wchar_t into a char string using a specific code page. Windows also provides the MultiByteToWideChar() function for reverse conversion. If you specify CP_UTF8 as the code page, these functions convert between the Unicode formats UTF-16 and UTF-8. Both functions are very unwieldy.

ATL therefore provides two template classes that wrap these functions to simplify the conversions. You only need the headers <atlconv.h> and <atlstr.h>; no library needs to be loaded. CW2A is a typedef for the class template CW2AEX, which wraps the WideCharToMultiByte() function. Similarly, CA2W is a typedef for the class template CA2WEX, which wraps the function MultiByteToWideChar(). The instances of these classes have the attribute m_psz, which is of type char* or wchar_t*.

In the following example, I start with a UTF-8 character string of type const char*, which contains Chinese characters and a smiley. The char string is converted to a wchar_t string with CA2W so that I can use the Windows function MessageBoxW(). The wchar_t string is then converted back to a char string using CW2A. Make sure that you specify CP_UTF8 as the second parameter of the constructor for both classes, otherwise ATL will use the current ANSI code page. The last statement confirms that the new and the original string have the same content.

#include <iostream>
#include <string> // the C++ Standard String Class
#include <atlconv.h>
#include <atlstr.h>

int main()
{
  const char* utf8Str = (const char*)u8"要开心 😊"; // 'Be happy 😊'
  CA2W atow(utf8Str, CP_UTF8);
  MessageBoxW(nullptr, atow.m_psz, L"Title", MB_OK);

  std::wstring utf16Str = atow.m_psz;
  CW2A wtoa(utf16Str.c_str(), CP_UTF8);
  std::string utf8Str2 = wtoa.m_psz;

  std::wcout << "utf8Str == utf8Str2: " << (utf8Str == utf8Str2) << std::endl;
}
Varney answered 20/12, 2023 at 11:11 Comment(0)
R
-3
// Embarcadero C++ Builder 

// conversion string to wstring
string str1 = "hello";
String str2 = str1;         // typedef UnicodeString String;   -> str2 contains now u"hello";

// conversion wstring to string
String str2 = u"hello";
string str1 = UTF8string(str2).c_str();   // -> str1 contains now "hello"
Ricer answered 11/11, 2014 at 11:27 Comment(2)
please explain what ure doing there in your answer, else it may get deletetTakamatsu
Where does UTF8string function come from ?Welbie
