Java 7zip compression is too big
G

3

8

I have a Java program that looks for the folder named after yesterday's date, compresses it into a 7z archive, and then deletes the folder. I have noticed that the archives my program generates are far too big. When I compress the same files (873 KB in total) with 7-Zip File Manager, the archive is 5 KB, while my program produces an archive of 737 KB. I am afraid that my program does not really produce a 7z archive but an ordinary zip file instead. Is there a way to change my code so that it generates an archive as small as the one 7-Zip File Manager produces?

package SevenZip;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.concurrent.TimeUnit;

import org.apache.commons.compress.archivers.sevenz.SevenZArchiveEntry;
import org.apache.commons.compress.archivers.sevenz.SevenZOutputFile;

public class SevenZipUtils {

    public static void main(String[] args) throws InterruptedException, IOException {

        String sourceFolder = "C:/Users/Ferid/Documents/Dates/";
        String outputZipFile = "/Users/Ferid/Documents/Dates";
        int sleepTime = 0;
        compress(sleepTime, outputZipFile, sourceFolder);
    }

    public static boolean deleteDirectory(File directory, int sleepTime) throws InterruptedException {
        if (directory.exists()) {
            File[] files = directory.listFiles();
            if (null != files) {
                for (int i = 0; i < files.length; i++) {
                    if (files[i].isDirectory()) {
                        deleteDirectory(files[i], sleepTime);
                        System.out.println("Folder deleted: " + files[i]);
                    } else {
                        files[i].delete();
                        System.out.println("File deleted: " + files[i]);
                    }
                }
            }
        }
        TimeUnit.SECONDS.sleep(sleepTime);
        return (directory.delete());
    }

    public static void compress(int sleepTime, String outputZipFile, String sourceFolder)
            throws IOException, InterruptedException {

        // finds folder of yesterdays date
        final Calendar cal = Calendar.getInstance();
        cal.add(Calendar.DATE, -1); // date of yesterday
        String timeStamp = new SimpleDateFormat("yyyyMMdd").format(cal.getTime()); // format the date
        System.out.println("Yesterday was " + timeStamp);

        if (sourceFolder.endsWith("/")) { // add yesterday folder to sourcefolder path
            sourceFolder = sourceFolder + timeStamp;
        } else {
            sourceFolder = sourceFolder + "/" + timeStamp;
        }

        if (outputZipFile.endsWith("/")) { // add yesterday folder name to outputZipFile path
            outputZipFile = outputZipFile + timeStamp + ".7z";
        } else {
            outputZipFile = outputZipFile + "/" + timeStamp + ".7z";
        }

        File file = new File(sourceFolder);

        if (file.exists()) {
            try (SevenZOutputFile out = new SevenZOutputFile(new File(outputZipFile))) {
                addToArchiveCompression(out, file, ".");
            }
            // delete the source only after the archive has been closed and fully written
            System.out.println("Files successfully compressed");
            deleteDirectory(file, sleepTime);
        } else {
            System.out.println("Folder does not exist");
        }
    }

    private static void addToArchiveCompression(SevenZOutputFile out, File file, String dir) throws IOException {
        String name = dir + File.separator + file.getName();
        if (file.isFile()) {
            SevenZArchiveEntry entry = out.createArchiveEntry(file, name);
            out.putArchiveEntry(entry);

            // try-with-resources closes the stream even if writing fails
            try (FileInputStream in = new FileInputStream(file)) {
                byte[] b = new byte[1024];
                int count;
                while ((count = in.read(b)) != -1) {
                    out.write(b, 0, count);
                }
            }
            out.closeArchiveEntry();
            System.out.println("File added: " + file.getName());
        } else if (file.isDirectory()) {
            File[] children = file.listFiles();
            if (children != null) {
                for (File child : children) {
                    addToArchiveCompression(out, child, name);
                }
            }
            System.out.println("Directory added: " + file.getName());
        } else {
            System.out.println(file.getName() + " is not supported");
        }
    }
}

I am using the Apache Commons Compress library.

EDIT: Here is the link from which I took some of the Apache Commons Compress code.

Greenwich answered 7/1, 2019 at 14:3 Comment(7)
You have more than a 150-fold difference in file size. That could not plausibly result from using regular ZIP format instead of 7Z format. It's large enough that I think it unlikely to be attributable to using compressed entries in one case but not the other, though we don't have enough data to rule that out. The most likely issue here is that the (original) contents of the archives you are comparing differ. - Anglesey
Yes, John Bollinger is right. I would compare the Java output with the 7z output: is the unpacked image different in size (extra JPEG compression, resizing)? Is an extra .thumbs file created? - Boccioni
That may sound like a stupid question, but can you extract the 5 KB archive correctly? - Gonsalez
@Gonsalez Yes, I have tried it now. My original folder, which is 873 KB, was extracted without any problems, just like the archive generated by my Java program. Both extract to the same content. - Greenwich
I haven't worked with 7-Zip, but 873 KB compressed to 737 KB for zip and to 5 KB for 7-Zip seems a bit unreasonable. How many files are in that dir? In how many sub-dirs? What type of files are they? - Melia
@Melia 7-Zip has a very good compression ratio, so this is usual for 7-Zip. That dir contains 28 XML files and 24 sub-dirs, and each sub-dir has 48 XML files. - Greenwich
7z performs better than zip, but not by that much. However, it uses solid compression by default, which is a big saver. I know it's too late, but you can emulate solid compression in zip format using two-pass zip compression; see the edit in my answer (for posterity ;)). - Violation
I
5

Use the 7-Zip file archiver instead; it easily compresses an 832 KB file to 26.0 KB:

  1. Get its Jar and SDK.
  2. Pick the .java files related to LZMA compression.
  3. Add the run arguments to the project properties: e "D:\\2017ASP.pdf" "D:\\2017ASP.7z", where e stands for encode, followed by the input path and the output path.
  4. Run the project [LzmaAlone.java].

Results

Case 1 (.pdf file): from 33,969 KB to 24,645 KB.

Case 2 (.docx file): from 832 KB to 26.0 KB.

Interment answered 15/1, 2019 at 11:24 Comment(1)
Correct, and commons.apache.org/proper/commons-compress/apidocs/… can also be used. - Slender
J
8

Commons Compress is starting a new block in the container file for each archive entry. Note the block counter here:

block-per-file

Not quite the answer you were hoping for, but the docs say it doesn't support "solid compression" - writing several files to a single block. See paragraph 5 in the docs here.
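
To get a feel for how much a block per file can cost, here is a minimal sketch that uses only the JDK's Deflate implementation (not LZMA and not Commons Compress). The class name, the method names, and the XML-like sample data are made up for illustration; the real files will behave differently in detail, but the effect should point in the same direction.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

/** Sketch: many small, similar files compressed one by one versus as one stream. */
class SolidVsPerFile {

    /** Builds count small XML-like documents that differ only in a few values (synthetic data). */
    static byte[][] sampleFiles(int count) {
        byte[][] files = new byte[count][];
        for (int i = 0; i < count; i++) {
            String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                    + "<measurement><id>" + i + "</id><station>A" + (i % 24) + "</station>"
                    + "<value unit=\"kWh\">" + (i * 37 % 1000) + "</value>"
                    + "<status>OK</status></measurement>\n";
            files[i] = xml.getBytes(StandardCharsets.UTF_8);
        }
        return files;
    }

    /** Returns the number of bytes Deflate produces for data. */
    static int deflatedSize(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        byte[] buffer = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    /** One compressed block per file: every file starts with an empty history. */
    static int perFileSize(byte[][] files) {
        int total = 0;
        for (byte[] file : files) {
            total += deflatedSize(file);
        }
        return total;
    }

    /** One compressed block for everything: later files can refer back to earlier ones. */
    static int solidSize(byte[][] files) {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        for (byte[] file : files) {
            all.write(file, 0, file.length);
        }
        return deflatedSize(all.toByteArray());
    }

    public static void main(String[] args) {
        byte[][] files = sampleFiles(1000);
        System.out.println("block per file: " + perFileSize(files) + " bytes");
        System.out.println("single block:   " + solidSize(files) + " bytes");
    }
}
```

With small files that share most of their markup, the per-file total stays close to the uncompressed size, while the single block shrinks dramatically, which matches the pattern described in the question.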

A quick look around found a few other Java libraries that support LZMA compression, but I couldn't spot one that could do so within the parent container file format for 7-Zip. Perhaps someone else knows of an alternative...

It sounds like a normal zip file format (e.g. via ZipOutputStream) is not an option?

Jacobson answered 9/1, 2019 at 22:52 Comment(4)
No, a normal zip file would sadly be too big. - Greenwich
A normal zip file cannot support solid compression because the format doesn't allow it. - Lh
@Lh You can emulate solid compression by running two passes: the first pass creates a zip with all files and no compression, the second pass compresses that single zip file with max compression (see the last paragraph of my answer: https://mcmap.net/q/1262599/-java-7zip-compression-is-too-big). That basically is tgz... - Violation
@Violation Yes, that's a good observation, but you can do the tar + compression or the no_compression + compression solid emulation with almost any format, and the small 32 KB "dictionary" for standard zip (anything else is not standard zip deflate anymore) means that tar.bz2, tar.xz, or 7z without compression + 7z with compression would give better results. - Lh
V
5

I don't have enough rep to comment anymore so here are my thoughts:

  • I don't see where you set the compression level, so it could be that SevenZOutputFile uses no (or very low) compression. As @CristiFati said, the difference in compression is odd, especially for text files.
  • As noted by @df778899, there is no support for solid compression, which is how the best compression ratio is achieved, so you won't be able to do as well as the 7z command line.

That said, if zip really isn't an option, your last resort could be to call the proper command line directly within your program.
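
Calling the command line could look roughly like the sketch below. It assumes that a 7z executable is installed and reachable under the name passed in; the class and method names are hypothetical. The switches used are the 7-Zip command-line ones: a (add to archive), -t7z (archive type), and -mx=9 (highest compression level).

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/** Sketch: delegate the compression to an installed 7z executable. */
class SevenZipCli {

    /** Builds the argument list for: 7z a -t7z -mx=9 archive source */
    static List<String> buildCommand(String executable, Path archive, Path source) {
        return Arrays.asList(executable, "a", "-t7z", "-mx=9",
                archive.toString(), source.toString());
    }

    /** Runs the command, forwards its output to the console, and returns the exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("usage: SevenZipCli <archive.7z> <sourceFolder>");
            return;
        }
        // "7z" is an assumption; on Windows the full path to 7z.exe may be needed
        List<String> command = buildCommand("7z", Paths.get(args[0]), Paths.get(args[1]));
        int exitCode = run(command);
        System.out.println("7z finished with exit code " + exitCode);
    }
}
```

Passing the arguments as a list avoids quoting problems with paths that contain spaces. The source folder should only be deleted after the exit code has been checked.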

If pure 7z is not mandatory, another option is a "tgz"-like format that emulates solid compression: first pack all files into a single uncompressed container (e.g. tar, or a zip file with no compression), then compress that single file in zip mode with the standard Java Deflate algorithm. Of course, that is viable only if the downstream processes that consume the archive recognize that format.
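
A minimal in-memory sketch of that two-pass idea, using only java.util.zip, might look like this. The class name, the method names, and the generated XML are placeholders; the 24 x 48 layout merely mirrors the folder structure described in the comments on the question. The first pass keeps the DEFLATED method at level 0, so the data is written without compression, which is simpler than STORED entries because no CRC has to be computed up front.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Sketch: emulate solid compression with two zip passes (store, then deflate). */
class TwoPassZip {

    /** Synthetic stand-in for the real data: dirs x filesPerDir small XML files. */
    static Map<String, byte[]> sampleEntries(int dirs, int filesPerDir) {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        for (int d = 0; d < dirs; d++) {
            for (int f = 0; f < filesPerDir; f++) {
                String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                        + "<record><dir>" + d + "</dir><file>" + f + "</file>"
                        + "<status>OK</status><value>" + (d * 31 + f) + "</value></record>\n";
                entries.put("dir" + d + "/file" + f + ".xml", xml.getBytes(StandardCharsets.UTF_8));
            }
        }
        return entries;
    }

    /** Writes all entries into one in-memory zip at the given Deflate level. */
    static byte[] zip(Map<String, byte[]> entries, int level) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.setLevel(level);
            for (Map.Entry<String, byte[]> entry : entries.entrySet()) {
                zos.putNextEntry(new ZipEntry(entry.getKey()));
                zos.write(entry.getValue());
                zos.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        // the stream is closed here, so the central directory has been written
        return bytes.toByteArray();
    }

    /** Pass 1 packs everything uncompressed; pass 2 deflates that single file. */
    static byte[] twoPass(Map<String, byte[]> entries) {
        byte[] inner = zip(entries, Deflater.NO_COMPRESSION);
        Map<String, byte[]> outer = new LinkedHashMap<>();
        outer.put("inner.zip", inner);
        return zip(outer, Deflater.BEST_COMPRESSION);
    }

    public static void main(String[] args) {
        Map<String, byte[]> entries = sampleEntries(24, 48);
        System.out.println("one pass, per-entry deflate:    "
                + zip(entries, Deflater.BEST_COMPRESSION).length + " bytes");
        System.out.println("two passes, store then deflate: "
                + twoPass(entries).length + " bytes");
    }
}
```

The price of this approach is that a consumer has to unpack twice, and Deflate can only look back 32 KB, so the gain is largest when similar files sit close together in the inner archive.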

Violation answered 14/1, 2019 at 13:37 Comment(0)
