SUMMARY
How can I write a zip file using libarchive in C++, such that path names will be UTF-8 encoded? With UTF-8 path names, special characters will be decoded correctly when using OS X / Linux / Windows 8 / 7-Zip / WinZip.
DETAILS
I am trying to write a zip archive using libarchive, compiling with Visual C++ 2013 on Windows.
I would like to be able to add files with non-ASCII chars (e.g. äöü.txt) to the zip archive.
There are four functions to set the pathname header in libarchive:
void archive_entry_set_pathname(struct archive_entry *, const char *);
void archive_entry_copy_pathname(struct archive_entry *, const char *);
void archive_entry_copy_pathname_w(struct archive_entry *, const wchar_t *);
int archive_entry_update_pathname_utf8(struct archive_entry *, const char *);
Unfortunately, none of them seem to work.
In particular, I have tried:
const char* myUtf8Str = ...
archive_entry_update_pathname_utf8(entry, myUtf8Str);
// this sounded like the most straightforward solution
and
const wchar_t* myUtf16Str = ...
archive_entry_copy_pathname_w(entry, myUtf16Str);
// UTF-16 encoded strings seem to be the default on Windows
In both cases, the resulting zip archive does not show the file names correctly in either Windows Explorer or 7-Zip.
I am certain that my input strings are encoded correctly, since I convert them from Qt QString
instances that work perfectly well in other parts of my code:
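const QByteArray utf8Bytes = filename.toUtf8();        // keep the buffer alive
const char* myUtf8Str = utf8Bytes.constData();         // valid while utf8Bytes lives
const std::wstring utf16Str = filename.toStdWString(); // keep the buffer alive
const wchar_t* myUtf16Str = utf16Str.c_str();          // valid while utf16Str lives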
For instance, this works even for another call to libarchive, when creating the zip file:
archive_write_open_filename_w(archive, zipFile.toStdWString().c_str());
// creates a zip archive file where the non-ASCII
// chars are encoded correctly, e.g. äöü.zip
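One way to double-check that a path name really is UTF-8 before handing it to libarchive is to dump its bytes. The helper below is only a sketch: `toHex` is a hypothetical name, not part of Qt or libarchive, and it assumes the path name has already been copied into a `std::string`.

```cpp
#include <cstdio>
#include <string>

// Hypothetical debugging helper (not part of Qt or libarchive):
// returns the bytes of s as lowercase hex pairs separated by spaces.
std::string toHex(const std::string& s) {
    std::string out;
    char pair[3];
    for (unsigned char byte : s) {
        if (!out.empty()) out += ' ';
        std::snprintf(pair, sizeof pair, "%02x", static_cast<unsigned>(byte));
        out += pair;
    }
    return out;
}
```

For a correctly encoded "äöü.txt", the dump should start with c3 a4 c3 b6 c3 bc, since each of the three umlauts takes two bytes in UTF-8.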
I have also tried to change the options for libarchive, as suggested by this example:
archive_write_set_options(a, "hdrcharset=UTF-8");
This call fails, however, so I assume that I have to set some other option, but I'm running out of ideas...
UPDATE 2
I have done some more reading about the zip format. It allows writing file names in UTF-8, such that OS X / Linux / Windows 8 / 7-Zip / WinZip will always decode them correctly, see e.g. here.
This is what I want to achieve using libarchive, i.e. I would like to pass it my UTF-8 encoded pathname
and have it store that in the zip file without doing any conversion.
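As far as I understand the zip format, an entry announces a UTF-8 file name by setting bit 11 (0x0800) of the general purpose bit flag in its header. The following is a rough sketch for checking whether a written archive has that flag set. It assumes the first bytes of the zip file have already been read into a buffer that starts with a local file header, and all function names are made up for illustration; none of them belong to libarchive.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Offsets within a zip local file header (all fields are little-endian).
constexpr std::size_t kFlagsOffset = 6;        // general purpose bit flag
constexpr std::size_t kNameLengthOffset = 26;  // file name length
constexpr std::size_t kNameOffset = 30;        // first byte of the file name
constexpr std::uint16_t kUtf8Flag = 0x0800;    // bit 11: name is UTF-8

// Read a little-endian 16-bit value at pos (caller checks the bounds).
std::uint16_t readU16(const std::vector<unsigned char>& buf, std::size_t pos) {
    return static_cast<std::uint16_t>(buf[pos] | (buf[pos + 1] << 8));
}

// Hypothetical helper: true if buf starts with a complete fixed-size
// local file header (signature bytes 50 4B 03 04).
bool hasLocalHeader(const std::vector<unsigned char>& buf) {
    return buf.size() >= kNameOffset &&
           buf[0] == 0x50 && buf[1] == 0x4B && buf[2] == 0x03 && buf[3] == 0x04;
}

// Hypothetical helper: true if the first entry is flagged as UTF-8.
bool nameIsFlaggedUtf8(const std::vector<unsigned char>& buf) {
    return hasLocalHeader(buf) && (readU16(buf, kFlagsOffset) & kUtf8Flag) != 0;
}

// Hypothetical helper: raw file name bytes of the first entry,
// or an empty string if the buffer is too short.
std::string entryName(const std::vector<unsigned char>& buf) {
    if (!hasLocalHeader(buf)) return std::string();
    const std::size_t len = readU16(buf, kNameLengthOffset);
    if (buf.size() < kNameOffset + len) return std::string();
    return std::string(buf.begin() + kNameOffset, buf.begin() + kNameOffset + len);
}
```

If `nameIsFlaggedUtf8` returns false for an archive that libarchive wrote, readers such as Windows Explorer or 7-Zip will presumably fall back to a legacy code page for the name, which would match the garbled names described above.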
I have added the "set locale" approach as an (unsatisfying) answer.
setlocale(LC_ALL, "");
– Cleopatracleopatre