Some information:
- I've only tried this on Linux
- I've tried both with GCC (7.2.0) and Clang (3.8.1)
- As far as I understand, it requires C++11 or later
What happens when I run it
I get the expected string "abcd" repeated up to character position 4094. After that, the only output is the character "?" until the end of the file.
What do I think about this?
I think this is not the expected behavior and that it must be a bug somewhere.
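One hypothesis worth checking (it is only an assumption at this point in the question) is a byte-order flip: if the two bytes of a UTF-16 code unit are read in the wrong order, 'c' (0x0063) turns into 0x6300, which is 25344 in decimal, the same value that appears in the debugger output further down. The helper name `swapBytes` below is made up for this illustration.

```cpp
#include <cstdint>

// Hypothetical helper: swap the two bytes of a 16-bit UTF-16 code unit,
// i.e. what a decoder effectively does when it assumes the wrong endianness.
constexpr std::uint16_t swapBytes(std::uint16_t unit) {
    return static_cast<std::uint16_t>((unit << 8) | (unit >> 8));
}
```

Comparing `swapBytes(0x0063)` with the value found in the corrupted part of the string is a quick way to confirm or rule out this explanation.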
Code you can test with:
#include <cstdint>
#include <iostream>
#include <fstream>
#include <locale>
#include <codecvt>
#include <string>

// Writes a BOM followed by 2000 repetitions of "abcd" (8000 characters).
// The raw integer writes assume a little-endian host.
void createTestFile() {
    std::ofstream file("utf16le.txt", std::ofstream::binary);
    if (file.is_open()) {
        std::uint16_t bom = 0xFEFF;                // UTF-16 little endian BOM
        std::uint64_t abcd = 0x0064006300620061;   // UTF-16 "abcd" string
        file.write(reinterpret_cast<const char*>(&bom), 2);
        for (std::size_t i = 0; i < 2000; i++) {
            file.write(reinterpret_cast<const char*>(&abcd), 8);
        }
        file.close();
    }
}

int main() {
    //createTestFile(); // uncomment to make the test file
    std::wifstream file;
    std::wstring line;
    file.open("utf16le.txt");
    // consume_header makes the facet read the BOM to pick the byte order
    file.imbue(std::locale(file.getloc(), new std::codecvt_utf16<wchar_t, 0x10ffff, std::consume_header>));
    if (file.is_open()) {
        while (getline(file, line)) {
            std::wcout << line << std::endl;
        }
    }
}
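A possible workaround, sketched under an assumption the source does not confirm: that the byte order detected from the BOM is not remembered once the stream refills its buffer, so later chunks fall back to the facet's default (big endian). If so, requesting `std::little_endian` explicitly in addition to `std::consume_header` should keep the byte order fixed for the whole file. The function name `readFirstLineUtf16le` is hypothetical, and `<codecvt>` is deprecated since C++17, so compilers may warn.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: read the first line of a UTF-16LE file into a wstring.
std::wstring readFirstLineUtf16le(const std::string& path) {
    // consume_header skips a leading BOM; little_endian pins the byte order
    // instead of relying on what the BOM detection decided.
    using Utf16le = std::codecvt_utf16<
        wchar_t, 0x10ffff,
        static_cast<std::codecvt_mode>(std::consume_header | std::little_endian)>;

    std::wifstream file;
    file.imbue(std::locale(file.getloc(), new Utf16le));  // imbue before opening
    file.open(path, std::ios::binary);                    // no newline translation

    std::wstring line;
    std::getline(file, line);
    return line;
}
```

With the test file from the question, the returned string should then contain all 8000 characters rather than turning into garbage at index 4094.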
At line[4094], the endianness is rudely reversed. You should open a gcc bug and attach the output of the debugger, showing what's in line[4093] and line[4094]. – Peppery
(gdb) p line[4093] $19 = (__gnu_cxx::__alloc_traits<std::allocator<wchar_t> >::value_type &) @0x628244: 98 L'b' (gdb) p line[4094] $20 = (__gnu_cxx::__alloc_traits<std::allocator<wchar_t> >::value_type &) @0x628248: 25344 L'挀' - the C++ library obviously botched reading this string. – Peppery
od -v. – Peppery
<codecvt> is deprecated. It was badly specified, and the implementation never got around to implementing it semi-correctly. Don't use it. – Kuster
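For anyone following the advice to avoid <codecvt>, one alternative is to read the file as raw bytes and decode it by hand. The following is a minimal sketch, assuming the input is UTF-16LE with an optional BOM; the name `decodeUtf16le` and the choice to throw on malformed input are illustrative assumptions, not something from the question.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-16LE bytes (optionally BOM-prefixed)
// into Unicode code points.
std::u32string decodeUtf16le(const std::string& bytes) {
    if (bytes.size() % 2 != 0)
        throw std::runtime_error("odd number of bytes in UTF-16 input");

    // Assemble the 16-bit code unit at byte offset i, low byte first.
    auto unitAt = [&bytes](std::size_t i) -> char32_t {
        const char32_t lo = static_cast<unsigned char>(bytes[i]);
        const char32_t hi = static_cast<unsigned char>(bytes[i + 1]);
        return lo | (hi << 8);
    };

    std::size_t i = 0;
    if (bytes.size() >= 2 && unitAt(0) == 0xFEFF) i = 2;  // skip the BOM

    std::u32string out;
    while (i < bytes.size()) {
        char32_t unit = unitAt(i);
        i += 2;
        if (unit >= 0xD800 && unit <= 0xDBFF) {  // high surrogate: needs a partner
            if (i >= bytes.size())
                throw std::runtime_error("truncated surrogate pair");
            const char32_t low = unitAt(i);
            if (low < 0xDC00 || low > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            i += 2;
            unit = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00);
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }
        out.push_back(unit);
    }
    return out;
}
```

The bytes can be loaded with a plain `std::ifstream` opened in binary mode, so no locale facet is involved and the byte order is applied uniformly to the whole file.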