Are hard links possible within a zip archive?

I am creating a zip archive containing two identical files at different paths. Does the zip archive format support something akin to the Unix concept of hard links? By this I mean the ability to store the file only once (saving space), but to index that data blob to two different paths within the zip archive.

If the file format does support this, how could I go about creating such an archive using free tools in Ubuntu?

Fears asked 14/1, 2012 at 2:12

No, the Zip file format does not support this. This is because the Local File Header contains information about the file, including its name, followed immediately by the compressed data for the file. It is not possible for two different Local File Headers to point to the same compressed data.
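You can see this with a minimal sketch using Python's standard `zipfile` module. It writes the same bytes under two paths and then lists each entry's `header_offset` and `compress_size`. The paths and the 64 KiB random payload are arbitrary choices for illustration. Each entry gets its own Local File Header and its own copy of the compressed data, so the archive ends up larger than twice the payload.

```python
import io
import os
import zipfile

# Hypothetical payload: 64 KiB of random (incompressible) bytes standing in
# for the two identical files from the question.
payload = os.urandom(64 * 1024)

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a/test.bin", payload)
    zf.writestr("b/test.bin", payload)  # same bytes, different path

with zipfile.ZipFile(buf) as zf:
    for info in zf.infolist():
        # Each entry points at its own Local File Header, which is followed
        # by that entry's own compressed data.
        print(info.filename, info.header_offset, info.compress_size)

print("payload:", len(payload), "archive:", len(buf.getvalue()))
```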

Bacteriostasis answered 14/1, 2012 at 2:16
Thanks -- I suspected as much, but was hoping there might be a trick I hadn't thought of. – Fears
According to pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT, in the UNIX extra field, the variable length data field can contain information about "symbolic or hard links". This suggests that some implementations could support that. Don't you think? – Bifurcate
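To check whether a particular archive actually carries that field, you can split each entry's extra data into its individual fields. Per the APPNOTE, every extra field starts with a little-endian 2-byte header ID and a 2-byte data size, and 0x000d is the ID listed for the UNIX field. The minimal sketch below uses only the Python standard library, and the helper names are hypothetical. It lists the entries that have that field. Finding the field only shows that the metadata is present. Whether a given extractor turns it into a link depends on that tool.

```python
import io
import struct
import zipfile

UNIX_EXTRA_ID = 0x000D  # "UNIX" extra field header ID listed in APPNOTE.TXT


def extra_fields(extra):
    """Hypothetical helper: split an extra-field blob into (header_id, data) pairs."""
    fields = []
    pos = 0
    # Each field: 2-byte header ID, 2-byte data size (little-endian), then data.
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        fields.append((header_id, extra[pos + 4 : pos + 4 + size]))
        pos += 4 + size
    return fields


def entries_with_unix_field(zf):
    """Hypothetical helper: names of entries whose central-directory extra data has a 0x000d field."""
    return [
        info.filename
        for info in zf.infolist()
        if any(hid == UNIX_EXTRA_ID for hid, _ in extra_fields(info.extra))
    ]


# Demo: the second entry gets a hand-built 0x000d field. The payload bytes
# are a placeholder, not a real UNIX field layout.
tagged = zipfile.ZipInfo("test2.bin")
tagged.extra = struct.pack("<HH", UNIX_EXTRA_ID, 11) + b"placeholder"

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("test.bin", b"data")
    zf.writestr(tagged, b"data")

with zipfile.ZipFile(buf) as zf:
    print(entries_with_unix_field(zf))  # prints ['test2.bin']
```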

As @Greg said, ZIP doesn't support hard links.

But if I understand correctly, your goal is to reduce the size of the compressed archive, so let's look for an alternative solution.

Let's run a simple test to compare the compression ratios of different archive formats. I created two identical binary files (sizes in bytes) and compressed them with ZIP, BZIP2, RAR and 7z.

  8641969  test.bin
  8641969  test2.bin

Each format was run twice: the first time with only one file, the second time with both files:

ZIP:

$ zip -9 test1.zip test.bin
$ zip -9 test2.zip test.bin test2.bin

 8636837  test1.zip
17273654  test2.zip 

BZIP2:

$ export BZIP=--fast
$ tar cjf test1.tbz test.bin
$ tar cjf test2.tbz test.bin test2.bin

 8694997  test1.tbz
17389167  test2.tbz

7z:

$ 7z a -mx=9 test1.7z test.bin
$ 7z a -mx=9 test2.7z test.bin test2.bin

 8705285  test1.7z
 8707054  test2.7z

RAR:

$ rar a -m5 test1.rar test.bin
$ rar a -m5 test2.rar test.bin test2.bin

 8649970  test1.rar
17299916  test2.rar

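Conclusion: only 7z did the job well. test2.7z is just 1,769 bytes larger than test1.7z, whereas the ZIP, BZIP2 and RAR archives roughly doubled in size. Consider using 7z in your application.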

Of course, you should run more tests in your own environment with your own files to confirm that this is really what you need. You can also experiment with the options to find the compression level that gives the best balance between compression ratio and speed.
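A plausible explanation for these numbers is that a compressor can avoid storing the second copy only if both files go through the same compression stream and the first copy is still within reach. ZIP compresses every entry separately, and Deflate looks back at most 32 KiB. bzip2 compresses independent blocks of at most 900 kB, and `BZIP=--fast` selects 100 kB. By contrast, 7z archives are solid by default, and `-mx=9` gives LZMA a 64 MB dictionary, which is larger than the 8.6 MB file.

The Python sketch below illustrates the effect. It uses the standard `zlib`, `bz2` and `lzma` modules as stand-ins for the archivers, and the 1 MiB random blob is an assumption. It uses LZMA's default preset, whose 8 MiB dictionary needs far less memory than preset 9.

```python
import bz2
import lzma
import os
import zlib

# Hypothetical stand-in for test.bin: 1 MiB of random, incompressible bytes.
# 1 MiB is deliberately larger than bzip2's biggest block (900 kB).
blob = os.urandom(1024 * 1024)
doubled = blob + blob  # roughly what tar feeds the compressor for two copies

compressors = {
    # Deflate (used by ZIP): back-references reach at most 32 KiB.
    "deflate": lambda data: zlib.compress(data, 9),
    # bzip2 level 9: blocks of at most 900 kB, each compressed on its own.
    "bzip2": lambda data: bz2.compress(data, 9),
    # LZMA (xz container), default preset 6: 8 MiB dictionary, so the second
    # copy can be encoded as long matches against the first.
    "lzma": lambda data: lzma.compress(data),
}

sizes = {}
for name, compress in compressors.items():
    one, two = len(compress(blob)), len(compress(doubled))
    sizes[name] = (one, two)
    print(f"{name:8} one copy: {one:>8}  two copies: {two:>8}  ratio: {two / one:.2f}")
```

Expect the deflate and bzip2 ratios to be close to 2 and the lzma ratio close to 1, which mirrors the ZIP/BZIP2 versus 7z results above.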

Draft answered 14/1, 2012 at 3:06
How is it doing a better job? Is a larger number better? – Schrecklichkeit
@Schrecklichkeit the number is the size of the compressed file in bytes, so the smaller the archive, the better. – Draft

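Tar archives support hard links. When tar archives a second path to a file it has already stored, it writes a link entry that names the first path instead of storing the data again.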
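Here is a minimal sketch using Python's standard `tarfile` module. The file names mirror the question, and the 64 KiB random payload is an assumption. The second path becomes a link entry of size 0 whose `linkname` points at the first member, so the data is stored only once. This relies on the filesystem supporting hard links and on both paths being added to the same archive. GNU tar records hard links the same way by default, and its `--hard-dereference` option turns this off.

```python
import io
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "test.bin")
    second = os.path.join(tmp, "test2.bin")
    with open(first, "wb") as f:
        f.write(os.urandom(64 * 1024))  # hypothetical 64 KiB payload
    os.link(first, second)  # second path to the same inode (a hard link)

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.add(first, arcname="test.bin")
        tar.add(second, arcname="test2.bin")  # detected as a link to test.bin

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    members = tar.getmembers()

for m in members:
    # islnk() is True for hard-link entries; linkname names the earlier member.
    print(m.name, m.size, m.islnk(), m.linkname)
```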

Discovert answered 29/7, 2012 at 19:37
This answer is waaay too terse. Back up your assertion with links and some background information. – Hally

© 2022 - 2024 — McMap. All rights reserved.