Zip files contain same files but have different hashes?
L

3

18

I have created hundreds of folders and text files using PHP, and I then add them to a zip archive.

This all works fine, but if I create another zip archive from the same folders and files, the new archive has a different hash from the first one. The same happens if I use WinRAR instead of PHP to create the archive.

The hashes only seem to differ when I zip the files I created through PHP, yet the archives open fine.

Very strange. Can anyone shed any light on this?

Thanks

Loaded answered 22/7, 2012 at 20:9 Comment(4)
I'm guessing: maybe a different creation timestamp, which is part of the zip file?Bengt
@orn The files are untouched, I can create 2 zips one after the other and it would be the same.Loaded
@arbme, no he's saying maybe there is a timestamp in the created zipfile. Since you didn't create them at the same time, they would be different.Sealy
I thought the timestamp of the file wasn't taken into account, just the contents. It seems that if you don't add the files in the same order you will get a different hash, even if the contents are the same.Loaded
S
8

There is certainly some difference in the files. If the lengths are not exactly the same, the hash will be different. You can use a comparing hex editor, like Hex Workshop for example, to see what exactly the differences are.
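If a hex editor is not at hand, the two archives can also be compared entry by entry. Below is a minimal sketch using Python's standard zipfile module; the helper names entry_metadata and describe_differences are made up for illustration, and it only looks at the fields I would suspect first (entry order, timestamp, CRC, attributes), not at every byte.

```python
import io
import zipfile


def entry_metadata(zip_bytes):
    """Return {name: (position, date_time, crc, external_attr)} for an archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {
            info.filename: (pos, info.date_time, info.CRC, info.external_attr)
            for pos, info in enumerate(zf.infolist())
        }


def describe_differences(zip_a, zip_b):
    """List readable metadata differences between two archives given as bytes."""
    meta_a, meta_b = entry_metadata(zip_a), entry_metadata(zip_b)
    fields = ("position", "timestamp", "crc32", "attributes")
    report = []
    for name in sorted(meta_a.keys() | meta_b.keys()):
        if name not in meta_a or name not in meta_b:
            report.append(f"{name}: present in only one archive")
            continue
        for field, a, b in zip(fields, meta_a[name], meta_b[name]):
            if a != b:
                report.append(f"{name}: {field} differs ({a!r} vs {b!r})")
    return report
```

Read both zip files in binary mode and pass the bytes in. An empty list means the difference lies somewhere this sketch does not inspect, and a byte-level comparison is still needed.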

Possibilities that come to my mind:

  1. As @orn mentioned, there may be a timestamp in the zip format you are using (not sure).
  2. The order that the files are added to the archive may be different (depending on how you're selecting them / building the source array).
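Both possibilities are easy to check in isolation. Here is a small sketch with Python's standard zipfile and hashlib modules (the file names, contents and timestamps are invented for the demonstration): it builds the same two entries in memory several times and compares the SHA-256 of the resulting archives.

```python
import hashlib
import io
import zipfile


def build_zip(entries, timestamp):
    """Build an in-memory zip from (name, data) pairs, stamping each entry with `timestamp`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in entries:
            zf.writestr(zipfile.ZipInfo(name, date_time=timestamp), data)
    return buf.getvalue()


def sha256(data):
    return hashlib.sha256(data).hexdigest()


files = [("a.txt", b"alpha"), ("b.txt", b"beta")]
t1 = (2012, 7, 22, 20, 9, 0)
t2 = (2012, 7, 22, 20, 25, 0)

baseline = sha256(build_zip(files, t1))
same_inputs_match = baseline == sha256(build_zip(files, t1))
new_timestamp_matches = baseline == sha256(build_zip(files, t2))
new_order_matches = baseline == sha256(build_zip(list(reversed(files)), t1))

print(same_inputs_match)      # True: same order and timestamps give identical bytes
print(new_timestamp_matches)  # False: only the stored timestamps changed
print(new_order_matches)      # False: only the entry order changed
```

So the file contents can be byte-for-byte identical and the archive hash will still change if either the stored timestamps or the entry order changes.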
Sealy answered 22/7, 2012 at 20:25 Comment(2)
That's wrong: the zip will always be different unless you force the internal creation and modification times. #9714639Thighbone
Tell me what specifically was wrong about my answer.Sealy
I
25

Zip output is not deterministic. This is a real problem when, for example, you have CI that updates an AWS Lambda and you want to update it only when something has really changed, not on every run. To solve it I used this article: https://medium.com/@pat_wilson/building-deterministic-zip-files-with-built-in-commands-741275116a19
Like this:

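# Set every file's mtime to the newest commit date among the tracked files,
# so the timestamps stored in the archive do not depend on checkout time.
find . -exec touch -t "$(git ls-files -z . | \
  xargs -0 -n1 -I{} -- git log -1 --date=format:"%Y%m%d%H%M" --format="%ad" '{}' | \
  sort -r | head -n 1)" '{}' +
# -D omits directory entries; -X omits extra file attributes (UID/GID, extra file times).
zip -rq -D -X -9 -A --compression-method deflate dest.zip sources...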
Infeld answered 11/5, 2020 at 18:57 Comment(2)
This is exactly my use case. Thank you so much!Finial
amazing article that helped a lotFlo
M
0

You can consider using deterministic_zip, which solves this issue. From its documentation:

There are three tricks to building a deterministic zip:

  1. Files must be added to the zip in the same order. Directory iteration order may vary across machines, resulting in different zips. deterministic_zip sorts all files before adding them to the zip archive.
  2. Files in the zip must have consistent timestamps. If I share a directory to another machine, the timestamps of individual files may differ, despite having identical content. To achieve timestamp consistency, deterministic_zip sets the timestamp of all added files to 2019-01-01 00:00:00.
  3. Files in the zip must have consistent permissions. File permissions look like -rw-r--r-- for a file that is readable by all users, and only writable by the user who owns the file. Similarly executable files might have permissions that look like: -rwxr-xr-x or -rwx------. deterministic_zip sets the permission of all files added to the archive to either -r--r--r--, or -r-xr-xr-x. The latter is only used when the user running deterministic_zip has execute access on the file.

Note: deterministic_zip does not modify nor update timestamps of any files it adds to archives. The techniques used above apply only to the copies of files within archives deterministic_zip creates.
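To make the three tricks concrete, here is a rough sketch of the same idea using only Python's standard library. This is not deterministic_zip's actual code; the function name write_deterministic_zip and the constant FIXED_TIMESTAMP are my own, and forcing create_system to the Unix value is an extra assumption I added so the header does not depend on the OS that builds the archive.

```python
import os
import zipfile

FIXED_TIMESTAMP = (2019, 1, 1, 0, 0, 0)  # same date as quoted above


def write_deterministic_zip(source_dir, dest_path):
    """Zip every file under source_dir with a stable order, timestamp and mode."""
    paths = []
    for root, _dirs, names in os.walk(source_dir):
        for name in names:
            paths.append(os.path.join(root, name))
    # Trick 1: sort by archive name so directory iteration order cannot matter.
    entries = sorted(
        (os.path.relpath(p, source_dir).replace(os.sep, "/"), p) for p in paths
    )
    with zipfile.ZipFile(dest_path, "w") as zf:
        for arcname, path in entries:
            # Trick 2: every entry gets the same fixed timestamp.
            info = zipfile.ZipInfo(arcname, date_time=FIXED_TIMESTAMP)
            # Trick 3: read-only, plus the execute bits if we may execute the file.
            mode = 0o100555 if os.access(path, os.X_OK) else 0o100444
            info.external_attr = mode << 16
            info.create_system = 3  # assumption: always record "Unix" as the creator
            info.compress_type = zipfile.ZIP_DEFLATED
            with open(path, "rb") as fh:
                zf.writestr(info, fh.read())
```

Keep dest_path outside source_dir so an older archive is not swept into the new one. Also note that the compressed bytes come from whatever zlib the interpreter uses, so I would only expect identical hashes across machines with the same Python and zlib builds.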

Mammalogy answered 3/3, 2022 at 1:42 Comment(0)
