How to determine compression method of a ZIP/RAR file
Asked Answered
B

7

26

I have a few zip and rar files that I'm working with, and I'm trying to analyze the properties of how each file was compressed (compression level, compression algorithm (e.g. deflate, LZMA, BZip2), dictionary size, word size, etc.), and I haven't figured out a way to do this yet.

Is there any way to analyze the files to determine these properties, with software or otherwise?

Cheers and thanks!

Ball answered 1/8, 2011 at 9:27 Comment(0)
S
6

I suggest hachoir-wx to have a look at these files. How to install a Python package or you can try ActivePython with PyPM when using Windows. When you have the necessary hachoir packages installed, you can do something like this to run the GUI:

python C:\Python27\Scripts\hachoir-wx

It enables you to browse through the data fields of RAR and ZIP files. See this screenshot for an example.

For RAR files, have a look at the technote.txt file that is in the WinRAR installation directory. This gives detailed information of the RAR specification. You will probably be interested in these:

 HEAD_FLAGS      Bit flags: 2 bytes
                 0x10 - information from previous files is used (solid flag)
                 bits 7 6 5 (for RAR 2.0 and later)
                      0 0 0    - dictionary size   64 KB
                      0 0 1    - dictionary size  128 KB
                      0 1 0    - dictionary size  256 KB
                      0 1 1    - dictionary size  512 KB
                      1 0 0    - dictionary size 1024 KB
                      1 0 1    - dictionary size 2048 KB
                      1 1 0    - dictionary size 4096 KB
                      1 1 1    - file is directory

Dictionary size can be found in the WinRAR GUI too.

 METHOD          Packing method 1 byte
                 0x30 - storing
                 0x31 - fastest compression
                 0x32 - fast compression
                 0x33 - normal compression
                 0x34 - good compression
                 0x35 - best compression

And Wikipedia also knows this:

The RAR compression utility is proprietary, with a closed algorithm. RAR is owned by Alexander L. Roshal, the elder brother of Eugene Roshal. Version 3 of RAR is based on Lempel-Ziv (LZSS) and prediction by partial matching (PPM) compression, specifically the PPMd implementation of PPMII by Dmitry Shkarin.

For ZIP files I would start by having a look at the specifications and the ZIP Wikipedia page. These are probably interesting:

  general purpose bit flag: (2 bytes)
  compression method: (2 bytes)
Stace answered 2/8, 2011 at 15:32 Comment(1)
The METHOD byte for the first file is typically found at offset 0x2D.Keister
R
19

This is a fairly old question, but I wanted to throw in my two cents anyway since some of the methods above weren't as easy for me to use.

You can also determine this with 7-Zip. After opening the archive there is a column for method of compression:

7zip properties

Rowel answered 2/7, 2012 at 15:6 Comment(1)
Windows File Explorer also has this column available; though you may need to add it once the zip is open... right-click the table heading and ensure the 'Method' option is ticked / checked.Salvucci
B
12

For ZIP - yes, zipinfo

For RAR, the headers are easily found with either 7Zip or WinRAR, read the attached documentation

Behalf answered 25/8, 2011 at 15:15 Comment(1)
Thanks for your hint! I needed to find out what ZIP setting MS Word uses, when it generates .DOCX files. A DOCX file is a ZIP archive, containing several XML files and your embedded media files. These you can batch process with the tools of your choice, but then at the end you need to repack it to a ZIP file with settings which MS Word accepts! I used zipinfo to analyze the DOCX files MS Word had written. Should I come a final conclusion about the DOCX ZIP format, I will post it here.Norvan
D
8

Via 7-Zip (or p7zip) command line:

7z l -slt archive.file

If looking specifically for the compression method:

7z l -slt archive.file | grep -e '^---' -e '^Path =' -e '^Method ='
Dacron answered 22/2, 2018 at 7:8 Comment(1)
I also could have used grep -E '^((---)|(Path =)|(Method =))'.Dacron
S
6

I suggest hachoir-wx to have a look at these files. How to install a Python package or you can try ActivePython with PyPM when using Windows. When you have the necessary hachoir packages installed, you can do something like this to run the GUI:

python C:\Python27\Scripts\hachoir-wx

It enables you to browse through the data fields of RAR and ZIP files. See this screenshot for an example.

For RAR files, have a look at the technote.txt file that is in the WinRAR installation directory. This gives detailed information of the RAR specification. You will probably be interested in these:

 HEAD_FLAGS      Bit flags: 2 bytes
                 0x10 - information from previous files is used (solid flag)
                 bits 7 6 5 (for RAR 2.0 and later)
                      0 0 0    - dictionary size   64 KB
                      0 0 1    - dictionary size  128 KB
                      0 1 0    - dictionary size  256 KB
                      0 1 1    - dictionary size  512 KB
                      1 0 0    - dictionary size 1024 KB
                      1 0 1    - dictionary size 2048 KB
                      1 1 0    - dictionary size 4096 KB
                      1 1 1    - file is directory

Dictionary size can be found in the WinRAR GUI too.

 METHOD          Packing method 1 byte
                 0x30 - storing
                 0x31 - fastest compression
                 0x32 - fast compression
                 0x33 - normal compression
                 0x34 - good compression
                 0x35 - best compression

And Wikipedia also knows this:

The RAR compression utility is proprietary, with a closed algorithm. RAR is owned by Alexander L. Roshal, the elder brother of Eugene Roshal. Version 3 of RAR is based on Lempel-Ziv (LZSS) and prediction by partial matching (PPM) compression, specifically the PPMd implementation of PPMII by Dmitry Shkarin.

For ZIP files I would start by having a look at the specifications and the ZIP Wikipedia page. These are probably interesting:

  general purpose bit flag: (2 bytes)
  compression method: (2 bytes)
Stace answered 2/8, 2011 at 15:32 Comment(1)
The METHOD byte for the first file is typically found at offset 0x2D.Keister
H
2

The zipfile python module can be used to get info about the zipfile. The ZipInfo class provides information like filename, compress_type, compress_size, file_size etc...

Python snippet to get filename and the compress type of files in a zip archive

import zipfile

with zipfile.ZipFile(path_to_zipfile, 'r') as zip:
    for info in zip.infolist():
        print(f'filename: {info.filename}')
        print(f'compress type: {info.compress_type}')

This would list all the filenames and their corresponding compression type(integer), which can be used to look up the compression method.
You can get a lot more info about the files using infolist().

The python module linked in the accepted answer is not available, zipfile module might help

Heidy answered 27/11, 2020 at 10:28 Comment(0)
L
1

For the ZIP files, there is a command zipinfo.

Ladon answered 2/8, 2011 at 15:43 Comment(3)
When I enter that into my console, it says that no such command was found.Turkoman
sudo apt install unzipGoodbye
Version for Windows.Novellanovello
C
0

The type is easy, just look at the file headers (PK and Rar).

As for the rest, I doubt that information is available in the compressed content.

Cologarithm answered 1/8, 2011 at 9:43 Comment(2)
Yes, it is available (at least for rar). But how to get it obviously depends on the specific file format.Aforementioned
If the information wasn't available, it wouldn't be possible to decompress the data.Uno

© 2022 - 2024 — McMap. All rights reserved.