I want to work with Unicode in C++. After falling down the Unicode rabbit hole, I've ended up in a train wreck of confusion, headaches and locales.
Specifically, I'm trying to use Unicode file paths with Boost, and to pass Unicode input to the Boost program options library. I've read whatever I could find on locales, codecvts, Unicode encodings and Boost.
My current attempt is a codecvt that takes a UTF-8 string and converts it to the platform's encoding (UTF-8 on POSIX, UTF-16 on Windows). I've been trying to avoid wchar_t.
The closest I've gotten is using Boost.Locale to try to convert a UTF-8 string to UTF-32 on output:
    #include <cwchar>
    #include <iostream>
    #include <locale>
    #include <string>
    #include <boost/locale.hpp>

    int main()
    {
        std::string data("Testing, 㤹");

        // Source and target locales generated by Boost.Locale.
        std::locale fromLoc = boost::locale::generator().generate("en_US.UTF-8");
        std::locale toLoc = boost::locale::generator().generate("en_US.UTF-32");

        // Take the wchar_t <-> char codecvt facet from the target locale
        // and combine it with the source locale.
        typedef std::codecvt<wchar_t, char, std::mbstate_t> cvtType;
        cvtType const* toCvt = &std::use_facet<cvtType>(toLoc);
        std::locale convLoc = std::locale(fromLoc, toCvt);

        std::cout.imbue(convLoc);
        std::cout << data << std::endl;
        // Output is unconverted -- what?
        return 0;
    }
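To be concrete about the conversion I mean, here is a minimal sketch of decoding UTF-8 into UTF-32 explicitly, without going through a stream locale at all. The name `utf8_to_utf32` is made up for illustration; it is not a Boost or standard library function. The sketch assumes C++11 and skips checks for overlong encodings and surrogate code points.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Boost or the standard library):
// decodes a UTF-8 byte string into a sequence of UTF-32 code points.
// Simplification: overlong encodings and surrogates are not rejected.
std::u32string utf8_to_utf32(const std::string& in)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;

        // The lead byte determines the sequence length and the initial bits.
        if (lead < 0x80)              { cp = lead;        len = 1; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; len = 2; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; len = 3; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte (10xxxxxx) contributes six more bits.
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }

        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Calling `utf8_to_utf32(data)` on the string from the program above would give one `char32_t` per code point. As far as I can tell, Boost.Locale's `boost::locale::conv::utf_to_utf` is meant for this kind of explicit conversion too, but that is separate from getting a stream or a path to convert on its own.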
I think I had some other kind of conversion working using wide characters, but I really don't know what I'm doing, or what the right tool for the job is at this point. Help?