diff files inside of zip without extracting it [closed]
S

10

23

Is there any way to perform a diff operation on two files in two zips without extracting them? If not, is there any other workaround to compare them without extracting?

Thanks.

Shetrit answered 23/2, 2016 at 15:17 Comment(3)
Do you only want to know if the two files differ or do you want to get a visual diff? – Upstanding
If you want to know whether they are different then use sha512 filename1 and sha512 filename2 and see if the output is the same. – Beachhead
related: https://mcmap.net/q/223135/-can-git-treat-zip-files-as-directories-and-files-inside-the-zip-as-blobs – Heterochromatin
L
13

Combining the responses so far, the following bash function will compare the file listings from the zip files. The listings include verbose output (unzip -v), so checksums can be compared. Output is sorted by filename (sort -k8) to allow side-by-side comparison, and the diff output is widened (-W200) so the filenames are visible in the side-by-side view.

function zipdiff() { diff -W200 -y <(unzip -vql "$1" | sort -k8) <(unzip -vql "$2" | sort -k8); }

This can be added to your ~/.bashrc file to be used from any console. It can be used with zipdiff a.zip b.zip. Piping the output to less or redirecting to a file is helpful for large zip files.

Lylelyles answered 26/5, 2017 at 9:26 Comment(2)
Very helpful, thanks. I found it was made even better by adding --suppress-common-lines, as suggested in another comment below. – Denyse
If you want to ignore date differences, comparing just (size, hash, path): function zipcdiff() { A='{printf("%8sB %s %s\n",$1,$7,$8)}'; diff <(unzip -vqql "$1" | awk "$A" | sort -k3) <(unzip -vqql "$2" | awk "$A" | sort -k3); }. Output is empty when contents are equal. Useful for checking deterministic builds. – Gilbert
P
8

unzip -l will list the contents of a zip file. You can then pass that to diff in the normal manner as mentioned here: https://askubuntu.com/questions/229447/how-do-i-diff-the-output-of-two-commands

So for example if you had two zip files:

foo.zip
bar.zip

You could run diff -y <(unzip -l foo.zip) <(unzip -l bar.zip) to do a side-by-side diff of the contents of the two files.

Hope that helps!

Philosophical answered 23/2, 2016 at 15:42 Comment(3)
Adding the --suppress-common-lines flag to display only the lines that differ worked out really well for me: diff -y <(unzip -l foo.zip) <(unzip -l bar.zip) --suppress-common-lines – Larousse
I ended up with function zipdiff() { diff -y <(unzip -l $1) <(unzip -l $2) --suppress-common-lines; }, and that worked flawlessly for what I was trying to do. – Borchert
This won't detect a change to an existing file that happens to leave it at the same size. -vql instead of -l prints the checksums, but these are CRC32 (meaning they won't reliably detect intentional tampering the way a cryptographic hash function will). – Nunn
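
To make the point in the last comment concrete, here is a rough sketch (not taken from any of the answers) that prints the members whose size is unchanged but whose CRC-32 differs, which is exactly the case a plain unzip -l comparison misses. The name same_size_changes is made up for illustration. It assumes the usual Info-ZIP unzip -v column layout (length first, CRC-32 seventh, name eighth) and member names without spaces.

```shell
# same_size_changes: hypothetical helper, not part of unzip.
# Arguments: two files holding `unzip -vqq` listings.
# Prints the names present in both listings with equal length
# but a different CRC-32.
# Assumption: member names contain no whitespace.
same_size_changes() {
  awk '
    # first listing: remember length and CRC per name
    NR == FNR { len[$8] = $1; crc[$8] = $7; next }
    # second listing: same name, same length, different CRC
    ($8 in crc) && len[$8] == $1 && crc[$8] != $7 { print $8 }
  ' "$1" "$2"
}

# usage (bash):
#   same_size_changes <(unzip -vqq foo.zip) <(unzip -vqq bar.zip)
```

As the comment says, CRC-32 only guards against accidental changes, so this is a convenience check rather than a tamper check.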
W
7

Compressed File Contents Only

TL;DR

The command to diff 2 zipfiles (a.zip and b.zip) is

diff \
  <(unzip -vqq a.zip  | awk '{$2=""; $3=""; $4=""; $5=""; $6=""; print}' | sort -k3 -f) \
  <(unzip -vqq b.zip  | awk '{$2=""; $3=""; $4=""; $5=""; $6=""; print}' | sort -k3 -f)

Explanation

I was looking for a way to compare the contents of the files stored in the zipfile, but not other metadata. Consider the following:

$ echo foo > foo.txt
$ zip now.zip foo.txt
  adding: foo.txt (stored 0%)
$ zip later.zip foo.txt
  adding: foo.txt (stored 0%)
$ diff now.zip later.zip 
Binary files now.zip and later.zip differ

Conceptually, this makes no sense; I ran the same command on the same inputs and got 2 different outputs! The difference is the metadata, which stores the date the file was added!

$ unzip -v now.zip 
Archive:  now.zip
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
       4  Stored        4   0% 04-08-2020 23:27 7e3265a8  foo.txt
--------          -------  ---                            -------
       4                4   0%                            1 file
$ unzip -v later.zip
Archive:  later.zip
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
       4  Stored        4   0% 04-08-2020 23:28 7e3265a8  foo.txt
--------          -------  ---                            -------
       4                4   0%                            1 file

Note: I manually edited the time of the second file here from 23:27 to 23:28 for clarity. The field in the file itself stores the value of seconds (which, in my case, differed -- a binary diff would still fail) even though they are not represented in the command's output.

So to diff the files only, we must ignore the date fields. unzip -vqq will get us a better summary:

$ unzip -vqq now.zip
       4  Stored        4   0% 04-08-2020 23:27 7e3265a8  foo.txt

So let's mask out the fields (we don't care about dates or compression metrics) and sort the files:

$ unzip -vqq now.zip  | awk '{$2=""; $3=""; $4=""; $5=""; $6=""; print}' | sort -k3 -f
4      7e3265a8 foo.txt
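
One caveat: assigning to fields makes awk rebuild the line with single spaces, so a file name that contains runs of spaces is altered. A possible variant, sketched here under the assumption that the first seven columns never contain whitespace, strips those columns and keeps the rest of the line as the name. zip_fingerprint is a hypothetical name, not an existing tool.

```shell
# zip_fingerprint: hypothetical helper, reads `unzip -vqq` output on stdin
# and prints "CRC length name" per member, sorted by name.
# Assumes the first seven columns contain no whitespace (true for the
# layout shown above); everything after them is treated as the name.
zip_fingerprint() {
  awk '{
    len = $1; crc = $7; name = $0
    # drop the seven leading columns one at a time
    for (i = 1; i <= 7; i++) sub(/^[ \t]*[^ \t]+/, "", name)
    # drop the separator in front of the name
    sub(/^[ \t]+/, "", name)
    print crc, len, name
  }' | LC_ALL=C sort -k3
}

# usage (bash):
#   diff <(unzip -vqq a.zip | zip_fingerprint) <(unzip -vqq b.zip | zip_fingerprint)
```

The sort is case-sensitive here (LC_ALL=C), unlike the sort -f used above; either works as long as both sides are sorted the same way.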
Withoutdoors answered 9/4, 2020 at 3:47 Comment(1)
Brilliant, exactly what I was looking for! – Hyperventilation
C
6

I wanted the actual diff between the files in the zips in a readable format. Here is a bash function that I wrote for this purpose which makes use of git. This has a good UX if you already use git as part of your normal workflow and can read git diffs.

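# usage: zipdiff before.zip after.zip
# note: both paths are taken relative to the current directory
function zipdiff {
  local current before after tempdir
  current=$(pwd)
  before="$current/$1"
  after="$current/$2"
  tempdir=$(mktemp -d)
  cd "$tempdir" || return 1
  git init &> /dev/null
  unzip -qq "$before"
  git add . &> /dev/null
  git commit -m "before" &> /dev/null
  # the glob must stay outside the quotes, otherwise nothing is deleted
  # and files missing from the second zip would not show up as removed
  rm -rf "$tempdir"/*
  # -o overwrites without prompting
  unzip -qq -o "$after"
  git add . &> /dev/null
  git diff --cached
  cd "$current"
  rm -rf "$tempdir"
}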

Chesterfieldian answered 9/4, 2019 at 22:0 Comment(0)
U
5

If you want to diff two files (as in see the difference) you have to extract them - even if only to memory!

In order to see the diff of two files in two zips you can do something like this (no error checking whatsoever):

# define a little bash function
function zipdiff () { diff -u <(unzip -p "$1" "$2") <(unzip -p "$3" "$4"); }

# test it: create a.zip and b.zip, each with a different file.txt
echo hello >file.txt; zip a.zip file.txt
echo world >file.txt; zip b.zip file.txt

zipdiff a.zip file.txt b.zip file.txt
--- /dev/fd/63  2016-02-23 18:18:09.000000000 +0100
+++ /dev/fd/62  2016-02-23 18:18:09.000000000 +0100
@@ -1 +1 @@
-hello
+world

Note: unzip -p extracts files to pipe (stdout).

If you only want to know if the files are different you can inspect their checksums using

unzip -v -l zipfile [file_to_inspect]

Note: -v means verbose and -l means list contents.

unzip -v -l a.zip 
Archive:  a.zip
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
       6  Stored        6   0% 2016-02-23 18:23 363a3020  file.txt
--------          -------  ---                            -------
       6                6   0%                            1 file

unzip -v -l b.zip 
Archive:  b.zip
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
       6  Stored        6   0% 2016-02-23 18:23 dd3861a8  file.txt
--------          -------  ---                            -------
       6                6   0%                            1 file 

In the example above you can see that the checksums (CRC-32) are different.
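
If you would rather script that check than compare the listings by eye, something along these lines might do. It is only a sketch: member_crc and same_in_both are made-up names, and it assumes the column layout shown above and a member name without spaces.

```shell
# member_crc: hypothetical helper, reads an `unzip -v` listing on stdin
# and prints the CRC-32 column of the member named in $1.
# Assumption: the member name contains no whitespace.
member_crc() {
  # appending "" forces a string comparison of the name column
  awk -v name="$1" 'NF >= 8 && ($8 "") == (name "") { print $7 }'
}

# same_in_both: hypothetical wrapper.
# usage: same_in_both a.zip b.zip file.txt
# Succeeds only if the member exists in both zips with the same CRC-32.
same_in_both() {
  crc_a=$(unzip -vqq "$1" "$3" | member_crc "$3")
  crc_b=$(unzip -vqq "$2" "$3" | member_crc "$3")
  [ -n "$crc_a" ] && [ "$crc_a" = "$crc_b" ]
}
```

With the a.zip and b.zip from the example, same_in_both a.zip b.zip file.txt would be expected to fail, since the two CRC-32 values differ.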

You might also be interested in this project: https://github.com/nhnb/zipdiff

Upstanding answered 23/2, 2016 at 17:35 Comment(0)
D
1

By postprocessing the output of zipcmp, you can recurse through the archives to obtain a more detailed summary of the differences between them.

#!/bin/bash

# process zipcmp's output to do true diffs of archive contents
# 1. grep removes the '+++' and '---' from zipcmp's output
# 2. awk prints the final column of output
# 3. sort | uniq to dedupe
for badfile in $(zipcmp "${1?No first zip}" "${2?No second zip}" \
    | grep -Ev '^[+-]{3}' \
    | awk '{print $NF}' \
    | sort | uniq);
do
    echo "diffing $badfile"
    diff <(unzip -p "$1" "$badfile") <(unzip -p "$2" "$badfile") ;
done;

Decapolis answered 2/7, 2020 at 23:13 Comment(0)
F
0

If you just need to check whether the files are equal, you can compare their CRC32 checksums, which are stored in the archive's local file headers and central directory.

Feminism answered 23/2, 2016 at 17:52 Comment(0)
J
0

The comp_zip tool in the open-source library Zip-Ada (available here or here) performs a comparison without extraction: contents, files of a.zip missing in b.zip and integrity check of both.

Jernigan answered 14/5, 2020 at 19:8 Comment(0)
C
0

Web tools such as https://www.diffnow.com/compare-files give a nice visual overview of which files in the zip have changed.

This is very convenient for zip files that are not too big, and nothing needs to be installed. It works not only on Linux but also on other operating systems, including Windows and Mac.

The tools discussed in the other answers obviously offer more advanced options and can be faster for larger zip files.

Crone answered 30/11, 2020 at 12:52 Comment(0)
P
0

Some command-line tools exist:

  1. diffzips.pl, written in Perl.
  2. zipdiff, written in Java.
  3. zipdiff, a .NET port of the previous one.
  4. zipcmp, written in C, from the libzip library.
  5. zcmp and zdiff from gzip, which can be used on zip files (gzip itself can only decompress a zip file that has a single member compressed with the deflate method).

I am a happy user of diffzips.pl for comparing the contents of epub files. diffzips.pl also has the advantage of being recursive: it compares zip files nested inside the parent zip.

Puppis answered 10/1, 2021 at 10:50 Comment(0)
