Zip file with utf-8 file names

On my website I have an option to download all images uploaded by users. The problem is with images that have Hebrew names (I need the original file name). I tried converting the file names to a different encoding, but that does not help. Here is the code:

using ICSharpCode.SharpZipLib.Zip;

Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;
byte[] utfBytes = utf8.GetBytes(file.Name);
byte[] isoBytes = Encoding.Convert(utf8, iso, utfBytes);
string name = iso.GetString(isoBytes);

var entry = new ZipEntry(name + ".jpg");
zipStream.PutNextEntry(entry);
using (var reader = new System.IO.FileStream(file.Name, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    byte[] buffer = new byte[ChunkSize];
    int bytesRead;
    while ((bytesRead = reader.Read(buffer, 0, buffer.Length)) > 0)
    {
        byte[] actual = new byte[bytesRead];
        Buffer.BlockCopy(buffer, 0, actual, 0, bytesRead);
        zipStream.Write(actual, 0, actual.Length);
    }
} 

After this conversion I get Hebrew file names like this: ??????.jpg. Where is my mistake?

Terrarium answered 20/12, 2012 at 8:18 Comment(2)
What is ZipEntry? I'm not aware of it being part of the new System.IO.Compression namespace. J#? - Lothar
This is the ICSharpCode.SharpZipLib.Zip library. - Terrarium

Unicode (UTF-8 is one of its binary encodings) can represent far more characters than any 8-bit encoding. ISO-8859-1 has no Hebrew letters, so converting your file names to it cannot preserve them, and you get garbage instead of the original names. You should really read Joel's article on Unicode.

...

Now that you've read the article, you should know that a C# string can store Unicode data, so you probably don't need to convert file.Name at all and can pass it directly to the ZipEntry constructor, provided the library does not contain encoding-handling bugs (which is always possible).
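
To see why the conversion in the question cannot work, here is a minimal sketch that runs the same UTF-8 to ISO-8859-1 round trip on a sample string. The Hebrew sample text and the class name are assumptions made up for illustration; only the Encoding calls come from the question.

```csharp
using System;
using System.Text;

class Latin1Demo
{
    static void Main()
    {
        // Hypothetical Hebrew file name, used only for illustration.
        string original = "\u05E9\u05DC\u05D5\u05DD";

        Encoding iso = Encoding.GetEncoding("ISO-8859-1");
        Encoding utf8 = Encoding.UTF8;

        // Same conversion as in the question.
        byte[] isoBytes = Encoding.Convert(utf8, iso, utf8.GetBytes(original));
        string converted = iso.GetString(isoBytes);

        // ISO-8859-1 has no Hebrew letters, so the original text is lost.
        Console.WriteLine(converted == original); // False
        Console.WriteLine(converted);
    }
}
```

The second line of output should show placeholder characters instead of the Hebrew letters, which is consistent with the ??????.jpg names reported in the question.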

Paint answered 20/12, 2012 at 8:29 Comment(1)
Hi. Thanks for the reply and for the article. If I skip the encoding block, the file names in my zip look like this: ëàâàüëò Çàîé_1.jpg - Terrarium

Try using

ZipStrings.UseUnicode = true;

It should be a part of the ICSharpCode.SharpZipLib.Zip namespace.

After that you can use something like

var newZipEntry = new ZipEntry("My ünicödë string.pdf");

and add the entry as normal to the stream. You shouldn't need to do any conversion of the string before that in C#.

Gypsum answered 24/11, 2019 at 20:50 Comment(0)

Your conversion is wrong, since strings in C# are already Unicode. Which tool do you use to check the file names in the archive? By default, Windows ZIP implementations use the system DOS (OEM) code page for file names, while other implementations may use other encodings.

Hymnal answered 24/12, 2012 at 10:13 Comment(0)

This worked for me:

ZipEntry entry = new ZipEntry(Path.GetFileName(fileName)) { IsUnicodeText = true };
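
For context, here is a rough sketch of how that line might fit into a complete write loop. The UnicodeZip class, the Create method, and its parameters are hypothetical names introduced for illustration; it assumes a SharpZipLib version where ZipEntry.IsUnicodeText is available, and it is not tested against the asker's setup.

```csharp
using System.IO;
using ICSharpCode.SharpZipLib.Zip;

static class UnicodeZip
{
    // Hypothetical helper: writes the given files into a zip at zipPath,
    // keeping the original (possibly non-ASCII) file names.
    public static void Create(string zipPath, string[] fileNames)
    {
        using (var zipStream = new ZipOutputStream(File.Create(zipPath)))
        {
            foreach (string fileName in fileNames)
            {
                // No manual re-encoding: pass the .NET string as is and
                // flag the entry name as Unicode text.
                var entry = new ZipEntry(Path.GetFileName(fileName)) { IsUnicodeText = true };
                zipStream.PutNextEntry(entry);

                // Copy the file contents into the current entry.
                using (var reader = File.OpenRead(fileName))
                {
                    reader.CopyTo(zipStream);
                }

                zipStream.CloseEntry();
            }
        }
    }
}
```

Whether the names then display correctly still depends on the tool used to open the archive, since not every ZIP reader honors the Unicode flag.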
Sooth answered 2/2 at 18:50 Comment(0)
