Is there any way to perform a diff operation on two files in two zips without extracting them? If not, is there any other workaround to compare them without extracting?
Thanks.
Combining the responses so far, the following bash function will compare the file listings from the zip files. The listings include verbose output (unzip -v), so checksums can be compared. Output is sorted by filename (sort -k8) to allow side-by-side comparison, and the diff output is widened (-W200) so the filenames are visible in the side-by-side view.
function zipdiff() { diff -W200 -y <(unzip -vql "$1" | sort -k8) <(unzip -vql "$2" | sort -k8); }
This can be added to your ~/.bashrc file to be used from any console. It can be used with zipdiff a.zip b.zip. Piping the output to less or redirecting to a file is helpful for large zip files.
Consider adding --suppress-common-lines, as suggested in another comment below. – Denyse
function zipcdiff() { A='{printf("%8sB %s %s\n",$1,$7,$8)}'; diff <(unzip -vqql "$1" | awk "$A" | sort -k3) <(unzip -vqql "$2" | awk "$A" | sort -k3); }. Output is empty when contents are equal. Useful for checking deterministic builds. – Gilbert
unzip -l will list the contents of a zip file. You can then pass that to diff in the normal manner as mentioned here: https://askubuntu.com/questions/229447/how-do-i-diff-the-output-of-two-commands
So for example if you had two zip files:
foo.zip
bar.zip
You could run diff -y <(unzip -l foo.zip) <(unzip -l bar.zip)
to do a side-by-side diff of the contents of the two files.
Hope that helps!
Adding the --suppress-common-lines flag to display only the lines that differ worked out really well for me: diff -y <(unzip -l foo.zip) <(unzip -l bar.zip) --suppress-common-lines – Larousse
I used function zipdiff() { diff -y <(unzip -l "$1") <(unzip -l "$2") --suppress-common-lines; }, and that worked flawlessly for what I was trying to do. – Borchert
The command to diff 2 zipfiles (a.zip and b.zip) is
diff \
<(unzip -vqq a.zip | awk '{$2=""; $3=""; $4=""; $5=""; $6=""; print}' | sort -k3 -f) \
<(unzip -vqq b.zip | awk '{$2=""; $3=""; $4=""; $5=""; $6=""; print}' | sort -k3 -f)
I was looking for a way to compare the contents of the files stored in the zipfile, but not other metadata. Consider the following:
$ echo foo > foo.txt
$ zip now.zip foo.txt
adding: foo.txt (stored 0%)
$ zip later.zip foo.txt
adding: foo.txt (stored 0%)
$ diff now.zip later.zip
Binary files now.zip and later.zip differ
Conceptually, this makes no sense; I ran the same command on the same inputs and got 2 different outputs! The difference is the metadata, which stores the date the file was added!
$ unzip -v now.zip
Archive: now.zip
Length Method Size Cmpr Date Time CRC-32 Name
-------- ------ ------- ---- ---------- ----- -------- ----
4 Stored 4 0% 04-08-2020 23:27 7e3265a8 foo.txt
-------- ------- --- -------
4 4 0% 1 file
$ unzip -v later.zip
Archive: later.zip
Length Method Size Cmpr Date Time CRC-32 Name
-------- ------ ------- ---- ---------- ----- -------- ----
4 Stored 4 0% 04-08-2020 23:28 7e3265a8 foo.txt
-------- ------- --- -------
4 4 0% 1 file
Note: I manually edited the time of the second file here from 23:27 to 23:28 for clarity. The field in the file itself stores the value of seconds (which, in my case, differed, so a binary diff would still fail) even though they are not represented in the command's output.
So to diff the files only, we must ignore the date fields. unzip -vqq will get us a better summary:
$ unzip -vqq now.zip
4 Stored 4 0% 04-08-2020 23:27 7e3265a8 foo.txt
So let's mask out the fields (we don't care about dates or compression metrics) and sort the files:
$ unzip -vqq now.zip | awk '{$2=""; $3=""; $4=""; $5=""; $6=""; print}' | sort -k3 -f
4 7e3265a8 foo.txt
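The masking pipeline above can be wrapped in a reusable function. This is only a sketch, assuming bash (for process substitution) and Info-ZIP's unzip; the name zipcontentdiff is made up for illustration and is not an existing tool:

```bash
# zipcontentdiff A.zip B.zip (hypothetical helper name)
# Prints nothing and returns 0 when both archives list the same
# names, lengths and CRC-32 values, regardless of dates or compression.
zipcontentdiff() {
  # blank out method, compressed size, ratio, date and time
  local mask='{$2=""; $3=""; $4=""; $5=""; $6=""; print}'
  diff <(unzip -vqq "$1" | awk "$mask" | sort -k3 -f) \
       <(unzip -vqq "$2" | awk "$mask" | sort -k3 -f)
}
```

Called as zipcontentdiff now.zip later.zip, it should print nothing for the two archives built above, since only their timestamps differ.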
I wanted the actual diff between the files in the zips in a readable format. Here is a bash function that I wrote for this purpose which makes use of git. This has a good UX if you already use git as part of your normal workflow and can read git diffs.
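# usage: zipdiff before.zip after.zip (paths relative to the current directory)
function zipdiff {
  local current before after tempdir
  current=$(pwd)
  before="$current/$1"
  after="$current/$2"
  tempdir=$(mktemp -d)
  cd "$tempdir" || return 1
  git init &> /dev/null
  # extract the "before" archive and commit it as the baseline
  unzip -qq "$before"
  git add . &> /dev/null
  git commit -m "before" &> /dev/null
  # clear the working tree; the glob must stay unquoted to expand, and .git is kept
  rm -rf -- "$tempdir"/*
  # extract the "after" archive, overwriting without prompting
  unzip -qq -o "$after" &> /dev/null
  git add .
  git diff --cached
  cd "$current"
  rm -rf "$tempdir"
}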
If you want to diff two files (as in see the difference) you have to extract them, even if only to memory!
In order to see the diff of two files in two zips you can do something like this (no error checking whatsoever):
# define a little bash function
function zipdiff () { diff -u <(unzip -p "$1" "$2") <(unzip -p "$3" "$4"); }
# test it: create a.zip and b.zip, each with a different file.txt
echo hello >file.txt; zip a.zip file.txt
echo world >file.txt; zip b.zip file.txt
zipdiff a.zip file.txt b.zip file.txt
--- /dev/fd/63 2016-02-23 18:18:09.000000000 +0100
+++ /dev/fd/62 2016-02-23 18:18:09.000000000 +0100
@@ -1 +1 @@
-hello
+world
Note: unzip -p extracts files to pipe (stdout).
If you only want to know if the files are different you can inspect their checksums using
unzip -v -l zipfile [file_to_inspect]
(Note: -v means verbose and -l means list contents.)
unzip -v -l a.zip
Archive: a.zip
Length Method Size Cmpr Date Time CRC-32 Name
-------- ------ ------- ---- ---------- ----- -------- ----
6 Stored 6 0% 2016-02-23 18:23 363a3020 file.txt
-------- ------- --- -------
6 6 0% 1 file
unzip -v -l b.zip
Archive: b.zip
Length Method Size Cmpr Date Time CRC-32 Name
-------- ------ ------- ---- ---------- ----- -------- ----
6 Stored 6 0% 2016-02-23 18:23 dd3861a8 file.txt
-------- ------- --- -------
6 6 0% 1 file
In the example above you can see that the checksums (CRC-32) are different.
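The checksum check can also be scripted. A minimal sketch, assuming Info-ZIP's unzip, whose -vqq output puts the CRC-32 in the seventh whitespace-separated column; zipcrc and zipsame are hypothetical helper names, not existing tools:

```bash
# zipcrc ARCHIVE MEMBER: print the CRC-32 recorded in the archive for MEMBER.
# Assumption: MEMBER matches exactly one entry (unzip treats it as a pattern).
zipcrc() {
  unzip -vqq "$1" "$2" | awk '{ print $7 }'
}

# zipsame A.zip MEMBER_A B.zip MEMBER_B: return 0 when both CRC-32 values
# are present and equal, non-zero otherwise (including a missing member).
zipsame() {
  local crc_a crc_b
  crc_a=$(zipcrc "$1" "$2")
  crc_b=$(zipcrc "$3" "$4")
  [ -n "$crc_a" ] && [ "$crc_a" = "$crc_b" ]
}
```

For example: zipsame a.zip file.txt b.zip file.txt && echo same || echo different. A matching CRC-32 is strong evidence rather than proof of identical contents, so fall back to unzip -p and diff when certainty matters.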
You might also be interested in this project: https://github.com/nhnb/zipdiff
By postprocessing the output of zipcmp, you can recurse through the archives to obtain a more detailed summary of the differences between them.
#!/bin/bash
# process zipcmp's output to do true diffs of archive contents
# 1. grep removes the '+++' and '---' from zipcmp's output
# 2. awk prints the final column of output
# 3. sort | uniq to dedupe
for badfile in $(zipcmp "${1?No first zip}" "${2?No second zip}" \
    | grep -Ev '^[+-]{3}' \
    | awk '{print $NF}' \
    | sort | uniq);
do
    echo "diffing $badfile"
    diff <(unzip -p "$1" "$badfile") <(unzip -p "$2" "$badfile") ;
done;
If you just need to check whether files are equal, you can compare their CRC-32 checksums, which are stored in the archive's local file headers and central directory.
Web tools such as https://www.diffnow.com/compare-files offer a nice visual overview of which files in the zip have changed.
This is very convenient for zip files that are not too big, with no need to install anything. It works not only on Linux but also on other operating systems, including Windows and Mac.
The tools discussed in the other answers obviously offer more advanced options and can be faster for larger zip files.
Some command-line tools exist:
I am a happy user of diffzips.pl to compare the content of epub files. diffzips.pl also has the advantage of being recursive, comparing zip files inside the parent zip.
Run sha512 filename1 and sha512 filename2 and see if the output is the same. – Beachhead
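Hashing the archives themselves also picks up metadata such as timestamps, so a variation is to hash one member's extracted bytes instead. A minimal sketch, assuming Info-ZIP's unzip and GNU sha512sum (on macOS, shasum -a 512 is the usual substitute); ziphash is a hypothetical helper name:

```bash
# ziphash ARCHIVE MEMBER: print the SHA-512 of MEMBER's uncompressed bytes.
# The data is streamed through a pipe, so nothing is written to disk.
ziphash() {
  unzip -p "$1" "$2" | sha512sum | awk '{ print $1 }'
}
```

Two members are byte-identical when [ "$(ziphash a.zip file.txt)" = "$(ziphash b.zip file.txt)" ] succeeds. Note that a missing member hashes as empty input, so check that the member exists first if that matters.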