Is there any problem?
There are several issues:
- CString is a template specialization of CStringT. Depending on the BaseType describing the character type, there are two concrete specializations: CStringA (using char) and CStringW (using wchar_t).
- While wchar_t on Windows is ubiquitously used to store UTF-16 encoded code units, using char is ambiguous. The latter commonly stores ANSI encoded characters, but can also store ASCII, UTF-8, or even binary data.
- We don't know the character encoding (or even character type) of CString (which is controlled through the _UNICODE preprocessor symbol), making the question ambiguous. We also don't know the desired character encoding of std::string.
- Converting between Unicode and ANSI is inherently lossy: ANSI encoding can only represent a subset of the Unicode character set.
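
To make the last point concrete, here is a minimal, portable sketch of a lossy narrowing conversion. It uses ISO-8859-1 (Latin-1) as a stand-in for an ANSI code page, and the helper name to_latin1_lossy as well as the '?' replacement character are my own assumptions for illustration; none of this is ATL or MFC API.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not ATL/MFC): narrow Unicode code points to Latin-1,
// a single-byte encoding used here as a stand-in for an ANSI code page.
// Anything outside the 256 values Latin-1 can represent becomes '?'.
std::string to_latin1_lossy(std::u32string const& src)
{
    std::string out;
    for (char32_t cp : src)
        out += cp <= 0xFF ? static_cast<char>(cp) : '?';
    return out;
}

// Self-check: representable characters survive, everything else collapses.
void check_lossy_narrowing()
{
    // U+00E9 fits into a single Latin-1 byte.
    assert(to_latin1_lossy(U"caf\u00E9") == "caf\xE9");
    // U+20AC (euro sign) and U+4E2D do not, so both degrade to the same byte.
    assert(to_latin1_lossy(U"\u20AC") == "?");
    assert(to_latin1_lossy(U"\u20AC") == to_latin1_lossy(U"\u4E2D"));
}
```

Once two different source characters map to the same output byte, there is no way to recover the original string from the converted one.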
To address these issues, I'm going to assume that wchar_t will store UTF-16 encoded code units, and char will hold UTF-8 octet sequences. That's the only reasonable choice you can make to ensure that source and destination strings retain the same information, without limiting the solution to a subset of the source or destination domains.
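
As a portable illustration of why this pairing retains all information, the following sketch converts between UTF-16 and UTF-8 by hand and checks that a round trip reproduces the input. The function names are hypothetical, the code assumes well-formed input (no unpaired surrogates, no malformed UTF-8), and it is not meant to describe how the ATL conversion classes work internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encode well-formed UTF-16 as UTF-8.
std::string utf16_to_utf8_sketch(std::u16string const& src)
{
    std::string out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        std::uint32_t cp = src[i];
        // Combine a high/low surrogate pair into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < src.size()) {
            std::uint32_t const lo = src[i + 1];
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            ++i;
        }
        if (cp < 0x80) {                       // 1 octet
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 octets
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 octets
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 octets
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: decode well-formed UTF-8 into UTF-16.
std::u16string utf8_to_utf16_sketch(std::string const& src)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char const lead = static_cast<unsigned char>(src[i++]);
        std::uint32_t cp = 0;
        int extra = 0;  // number of continuation octets that follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else                            { cp = lead & 0x07; extra = 3; }
        for (int k = 0; k < extra && i < src.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(src[i]) & 0x3F);
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {
            // Split code points beyond the BMP into a surrogate pair.
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}

// Self-check: 1-, 2-, 3- and 4-octet characters all survive a round trip.
void check_round_trip()
{
    std::u16string const original = u"A\u00E9\u20AC\U0001F600";
    assert(utf8_to_utf16_sketch(utf16_to_utf8_sketch(original)) == original);
}
```

Both encodings cover the full Unicode code point range, which is why nothing has to be dropped or substituted in either direction.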
The following implementations convert between CStringA/CStringW and std::wstring/std::string, mapping from UTF-8 to UTF-16 and vice versa:
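#include <string>
#include <atlstr.h>   // CStringA/CStringW (MFC projects already get these from the MFC headers)
#include <atlconv.h>  // CW2A, CA2W

// Converts a UTF-16 encoded CStringW into a UTF-8 encoded std::string.
std::string to_utf8(CStringW const& src_utf16)
{
    return { CW2A(src_utf16.GetString(), CP_UTF8).m_psz };
}

// Converts a UTF-8 encoded CStringA into a UTF-16 encoded std::wstring.
std::wstring to_utf16(CStringA const& src_utf8)
{
    return { CA2W(src_utf8.GetString(), CP_UTF8).m_psz };
}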
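The remaining two functions construct C++ string objects from MFC strings, leaving the encoding unchanged. Note that the previous functions cannot cope with embedded NUL characters, because they treat the source as a NUL-terminated string and stop at the first NUL. The following functions copy exactly GetLength() characters, so embedded NUL characters are preserved.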
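#include <string>
#include <atlstr.h>   // CStringA/CStringW (MFC projects already get these from the MFC headers)

// Copies the characters of a CStringA verbatim, including embedded NULs.
std::string to_std_string(CStringA const& src)
{
    return { src.GetString(), src.GetString() + src.GetLength() };
}

// Copies the characters of a CStringW verbatim, including embedded NULs.
std::wstring to_std_wstring(CStringW const& src)
{
    return { src.GetString(), src.GetString() + src.GetLength() };
}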
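
The difference in NUL handling can be reproduced in standard C++ without any MFC types. This is only a sketch: the two helper names are hypothetical, and a plain char buffer stands in for the storage behind a CStringA. The first helper mirrors construction from a NUL-terminated pointer, the second mirrors the pointer-range construction used by to_std_string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: construct from a NUL-terminated pointer.
// Copying stops at the first NUL, so anything after it is lost.
std::string copy_as_c_string(char const* data)
{
    return std::string(data);
}

// Hypothetical helper: construct from the range [data, data + length),
// analogous to passing GetString() and GetString() + GetLength().
std::string copy_with_length(char const* data, std::size_t length)
{
    return std::string(data, data + length);
}

// Self-check on a 5-character buffer with a NUL in the middle.
void check_embedded_nul()
{
    char const raw[] = { 'a', 'b', '\0', 'c', 'd', '\0' };
    assert(copy_as_c_string(raw).size() == 2);     // only "ab" is copied
    assert(copy_with_length(raw, 5).size() == 5);  // 'a' 'b' NUL 'c' 'd'
}
```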