"IllegalArgumentException: UNMAPPABLE[1]" while zipping a file with Greek characters

I want to zip a file on Windows 7 with ZipOutputStream. The problem is that the file name (and the file content too) contains Greek characters ("ГП0000660040140521_a.txt", Gamma and Pi). This is the code I use to zip the file:

ZipOutputStream zipOs = new ZipOutputStream(
    new FileOutputStream("c:\\temp\\test.zip"), Charset.forName("cp737")
);

File sourceFile = new File("C:/Path/To/File/ГП0000660040140521_b.txt");
String entryName = sourceFile.getName().replaceAll("\\\\", "/");
ZipEntry entry = new ZipEntry(entryName);
zipOs.putNextEntry(entry);
...
...

But on the last line (the putNextEntry call) I get an IllegalArgumentException:

java.lang.IllegalArgumentException: UNMAPPABLE[1]
at java.util.zip.ZipCoder.getBytes(ZipCoder.java:95)
at java.util.zip.ZipOutputStream.writeLOC(ZipOutputStream.java:407)
at java.util.zip.ZipOutputStream.putNextEntry(ZipOutputStream.java:221)

I assume there must be something wrong with the character mapping between Greek and UTF-8 ... What is the right way to zip a file with Greek characters in the file name?

EDIT

If I use "utf-8" as the character set, the zip file can be created, but the name of the zipped file is wrong: "ðôðƒ0000660040140521_a.txt" (the Greek characters are missing).

Linkboy answered 21/5, 2014 at 12:51 Comment(6)
What does the createZipEntry method do? - Cosper
@agad: My mistake! I have corrected the code. Thanks. - Linkboy
When you say the name is "wrong", how exactly are you verifying this? Which tool are you using to inspect the ZIP file, and are you sure that tool interprets the file names with the same encoding you used when you created them? - Buddhi
I open it with the Windows File Explorer. Do you mean that the file name is correct, but Windows Explorer displays it incorrectly? But then why does Windows File Explorer display the Greek characters of the original file correctly? - Linkboy
I have the same problem encoding a zip entry with the character 'Č' in the file name. I have been using almost identical code and the charset cp852. Any ideas what is causing the problem when mapping the character from UTF-8 to cp852? - Petit
I cannot replicate the problem when using UTF-8 in the Java code. I tried on a German Windows (cp850), creating a file όνομα_αρχείου.txt and then zipping it from Java 8. The file name is correct inside the ZIP file; I only have problems when using something other than UTF-8 in the code. So I need more information about how exactly you do that. - Benedicto

The ZipCoder used by ZipOutputStream uses an encoder that is configured to throw an exception whenever a character cannot be mapped. I therefore ended up converting the entryName to the target character set myself first, and only then calling ZipEntry entry = new ZipEntry(entryName). You can do it this way, for example:

new String(input.getBytes(charset), charset)

This ensures that all unmappable characters are converted to replacement characters, so no exception is thrown.

Try this and you will probably notice some Unicode control characters (which are unmappable) in the original input.
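To make the round trip above concrete, here is a minimal sketch. The class and method names (EntryNameSanitizer, toMappable, isMappable) are made up for illustration, and US-ASCII is used only because every JVM is guaranteed to ship it; in the question's case the target charset would be the one passed to ZipOutputStream.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of java.util.zip.
public class EntryNameSanitizer {

    // String.getBytes(Charset) substitutes the charset's replacement bytes
    // (typically '?') for unmappable characters, so decoding the result again
    // yields a name that the strict ZipCoder encoder can always encode.
    static String toMappable(String name, Charset charset) {
        return new String(name.getBytes(charset), charset);
    }

    // True if every character of the name can be encoded without replacement.
    static boolean isMappable(String name, Charset charset) {
        return charset.newEncoder().canEncode(name);
    }

    public static void main(String[] args) {
        Charset target = StandardCharsets.US_ASCII; // assumption: stand-in for the zip charset
        String original = "\u0393\u03A00000660040140521_a.txt"; // Greek Gamma, Greek Pi
        System.out.println(isMappable(original, target));
        System.out.println(toMappable(original, target));
    }
}
```

Comparing the sanitized name with the original (or calling isMappable first) shows which characters were lost, which helps to spot invisible control characters in the input.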

Petit answered 3/3, 2017 at 13:36 Comment(1)
Indeed a last resort, though according to en.wikipedia.org/wiki/ZIP_(file_format)#Structure the zip format has been able to store Unicode names encoded as UTF-8 since 2006. See also here: superuser.com/a/1507988 . Or check the code of ZipOutputStream for e.flag |= USE_UTF8. If such a name fails to decode, it is first and foremost the decoder's fault, which it was in my case, leading to a journey that ends here now :-) - Beelzebub
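As a rough illustration of the UTF-8 route mentioned in this comment, the sketch below writes one entry into an in-memory zip with UTF-8 and reads the name back with the same charset. The class and method names are invented for the example. The flag check assumes the standard local file header layout (general purpose flags at byte offsets 6 and 7, with bit 11 marking UTF-8 names).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not taken from the thread.
public class Utf8ZipRoundTrip {

    // Writes a single entry into an in-memory zip, encoding the name as UTF-8.
    static byte[] zipSingleEntry(String entryName, byte[] content) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zipOs = new ZipOutputStream(buffer, StandardCharsets.UTF_8)) {
            zipOs.putNextEntry(new ZipEntry(entryName));
            zipOs.write(content);
            zipOs.closeEntry();
        }
        return buffer.toByteArray();
    }

    // Reads the name of the first entry back, decoding it as UTF-8.
    static String firstEntryName(byte[] zipBytes) throws IOException {
        try (ZipInputStream zipIn =
                 new ZipInputStream(new ByteArrayInputStream(zipBytes), StandardCharsets.UTF_8)) {
            ZipEntry entry = zipIn.getNextEntry();
            return entry == null ? null : entry.getName();
        }
    }

    // Bit 11 (0x0800) of the little-endian flag field sits in the high byte at offset 7.
    static boolean hasUtf8Flag(byte[] zipBytes) {
        return (zipBytes[7] & 0x08) != 0;
    }

    public static void main(String[] args) throws IOException {
        String name = "\u0393\u03A00000660040140521_a.txt"; // Greek Gamma, Greek Pi
        byte[] zip = zipSingleEntry(name, "data".getBytes(StandardCharsets.UTF_8));
        System.out.println(name.equals(firstEntryName(zip)));
        System.out.println(hasUtf8Flag(zip));
    }
}
```

If the name survives this round trip but looks garbled in another tool, that suggests the tool is decoding the name with a different charset rather than the archive being wrong.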

I wrote this (late) answer because of the comments by "miso" and "kriegax" on my question.

If I remember correctly, I read somewhere that UTF-8 support for file names is one of the major weak points of zip files (because UTF-8 is not officially supported by the zip standard?!?). Perhaps there are zip applications by now that do support UTF-8 in file names.

However, in our case it was acceptable to replace the Greek characters with "normal" characters ("a...z"), because the files to zip were generated by a fiscal printer and in every case contain only one Greek character: a "PI" (only a workaround ...).

Linkboy answered 27/2, 2017 at 7:7 Comment(1)
The actual problem with ZIP is that it does not carry any meta information about the actual encoding (charset) used to encode the file names. - Petit

The problem is that, although CP-737 is indeed a code page that contains Greek characters, the canonical name of this character set in Java NIO is x-IBM737. Cf. http://docs.oracle.com/javase/7/docs/technotes/guides/intl/encoding.doc.html
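Whether a given name such as "cp737" is accepted by Charset.forName, and which canonical name it resolves to, can be checked directly on the JVM in use. The following is only a small sketch; the class and method names are invented, and the result for the Greek code page depends on whether the JDK ships the extended charsets.

```java
import java.nio.charset.Charset;

// Hypothetical helper for inspecting charset names and aliases.
public class CharsetNameLookup {

    // Returns the canonical NIO name for a name or alias, or null if the
    // running JVM does not support it.
    static String canonicalName(String nameOrAlias) {
        return Charset.isSupported(nameOrAlias)
                ? Charset.forName(nameOrAlias).name()
                : null;
    }

    public static void main(String[] args) {
        for (String candidate : new String[] {"cp737", "x-IBM737", "UTF8"}) {
            String canonical = canonicalName(candidate);
            System.out.println(candidate + " -> " + canonical);
            if (canonical != null) {
                // All alternative names under which this charset is registered.
                System.out.println("  aliases: " + Charset.forName(candidate).aliases());
            }
        }
    }
}
```

If "cp737" resolves to a charset here, then the lookup by name is not what fails in the question's code; the exception would come from a character that the code page cannot represent.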

Stole answered 3/3, 2017 at 12:39 Comment(0)