It seems that std::codecvt_utf8 works well for converting std::wstring to UTF-8. It passed all my tests (Windows app, Visual Studio 2015, Windows 8 with EN locale).
I needed a way to convert filenames to UTF-8, so my tests are about filenames.
In my app I use boost::filesystem::path (Boost 1.60.0) to handle file paths. It works well, but it cannot convert filenames to UTF-8 properly.
Internally, the Windows version of boost::filesystem::path uses std::wstring to store the file path. Unfortunately, the built-in conversion to std::string works badly.
Test case:
- create a file with mixed symbols, c:\test\皀皁皂皃的 (some random Asian symbols)
- scan the directory with boost::filesystem::directory_iterator and get the boost::filesystem::path for the file
- convert it to std::string via the built-in conversion filenamePath.string()
- you get c:\test\?????. The Asian symbols are converted to '?'. Not good.
boost::filesystem uses std::codecvt internally. In this setup it doesn't work for the std::wstring -> std::string conversion.
Instead of the built-in boost::filesystem::path conversion, you can define a conversion function like this (original snippet):
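#include <codecvt>
#include <locale>
#include <string>

// Despite its name, this converts a std::wstring to a UTF-8 std::string.
std::string utf8_to_wstring(const std::wstring & str)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
    return myconv.to_bytes(str);
}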
Then you can convert a file path to UTF-8 easily: utf8_to_wstring(filenamePath.wstring()). It works perfectly.
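One caveat that my tests do not cover: on Windows wchar_t is 16 bits wide, and std::codecvt_utf8<wchar_t> treats it as UCS-2, so characters outside the Basic Multilingual Plane (stored as surrogate pairs) may not convert. Below is a minimal sketch of a variant that picks the facet by wchar_t width. The names wide_utf8_facet and wide_to_utf8 are made up for illustration, and note that the <codecvt> facets and std::wstring_convert are deprecated since C++17, although they are still available.

```cpp
#include <codecvt>
#include <locale>
#include <string>
#include <type_traits>

// Hypothetical alias, not part of Boost or the standard library.
// 16-bit wchar_t (Windows) holds UTF-16, so use the UTF-16-aware facet there;
// 32-bit wchar_t (most Unix-like systems) holds UCS-4, so codecvt_utf8 is enough.
using wide_utf8_facet = std::conditional<
    sizeof(wchar_t) == 2,
    std::codecvt_utf8_utf16<wchar_t>,
    std::codecvt_utf8<wchar_t>>::type;

// Hypothetical helper: converts a wide string to a UTF-8 encoded std::string.
std::string wide_to_utf8(const std::wstring& wide)
{
    std::wstring_convert<wide_utf8_facet> converter;
    // to_bytes throws std::range_error if the input cannot be converted.
    return converter.to_bytes(wide);
}
```

Under these assumptions it would be called the same way as the original function, for example wide_to_utf8(filenamePath.wstring()).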
It worked for every file path I tested: ASCII strings (c:\test\test_file), Asian strings (c:\test\皀皁皂皃的), Russian strings (c:\test\абвгд), and mixed strings (c:\test\test_皀皁皂皃的, c:\test\test_абвгд, c:\test\test_皀皁皂皃的_абвгд). For every string I received a valid UTF-8 representation.
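If you want to repeat this kind of check automatically, one option is a round-trip test: convert the wide string to UTF-8, convert it back, and compare with the input. This is only a sketch of how such a check could look, not the exact test I ran; the function name utf8_round_trips is hypothetical.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical check: returns true when `wide` survives the trip
// std::wstring -> UTF-8 -> std::wstring unchanged.
bool utf8_round_trips(const std::wstring& wide)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    try {
        const std::string utf8 = converter.to_bytes(wide);
        return converter.from_bytes(utf8) == wide;
    } catch (const std::range_error&) {
        // Thrown by to_bytes/from_bytes when a conversion fails.
        return false;
    }
}
```

Writing the test strings with \u escapes (for example L"\u7684" for 的) keeps the check independent of the source file encoding.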