I have a file created with the 7zip program, compressed with the Deflate method. Now I want to create the same archive (with the same MD5 sum) in Java. To create the zip file I used an algorithm that I found on the Internet, for example http://www.kodejava.org/examples/119.html, but with this method the compressed size is larger than the size of the uncompressed file. What is going on? This isn't very useful compression. How can I create a zip file that is exactly the same as the one I created with 7zip? If it helps, I have all the information about the zip file that I created in 7zip.
// simplified code for zip creation in java
import java.io.*;
import java.util.zip.*;
public class ZipCreateExample {
    public static void main(String[] args) throws Exception {
        // try-with-resources closes both streams even if an exception is thrown
        try (FileInputStream in = new FileInputStream("F:/sometxt.txt"); // input file
             ZipOutputStream out = new ZipOutputStream(new FileOutputStream("F:/tmp.zip"))) { // output file
            // name the file inside the zip file
            out.putNextEntry(new ZipEntry("zippedjava.txt"));
            // buffer size
            byte[] b = new byte[1024];
            int count;
            while ((count = in.read(b)) > 0) {
                out.write(b, 0, count);
            }
            // finish the entry before the archive is closed
            out.closeEntry();
        }
    }
}
Just to clarify, you used the ZIP algorithm in 7zip for your original? Also, 7zip claims to have a 2-10% better compression ratio than other vendors. I would venture a guess that the ZIP algorithm built into Java is not nearly as optimized as the one in 7zip. Your best bet is to invoke 7zip from the command line if you want a similarly compressed file.
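If shelling out to 7zip is acceptable, a minimal sketch with ProcessBuilder could look like the following. It assumes a `7z` executable is on the PATH and that the original archive was made with the Deflate method at level 5; the class name, file names, and level are illustrative and not taken from the question.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class SevenZipInvoker {

    /** Builds: 7z a -tzip -mm=Deflate -mx=LEVEL ARCHIVE INPUT */
    static List<String> buildCommand(String sevenZipExe, Path archive, Path input, int level) {
        if (level < 0 || level > 9) {
            throw new IllegalArgumentException("level must be 0-9, got " + level);
        }
        return Arrays.asList(
                sevenZipExe,
                "a",              // add files to an archive
                "-tzip",          // write the ZIP container format
                "-mm=Deflate",    // use the Deflate method
                "-mx=" + level,   // compression level
                archive.toString(),
                input.toString());
    }

    /** Runs the command, forwarding 7z's console output, and returns its exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical file names; level 5 is an assumption about the original archive.
        List<String> cmd = buildCommand("7z", Paths.get("tmp.zip"), Paths.get("sometxt.txt"), 5);
        System.out.println(String.join(" ", cmd));
        // Pass --run to actually launch 7z; otherwise the command is only printed.
        if (args.length > 0 && "--run".equals(args[0])) {
            System.out.println("7z exit code: " + run(cmd));
        }
    }
}
```

Even with the same tool and switches, the MD5 sum only matches if the entry name and the file's modification time are the same as in the original archive.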
Are you trying to unpack a ZIP file, change a file within it, then re-compress it so that it has the same MD5 hash? Hashes are meant to prevent you from doing that.
ZipOutputStream has a few methods to tune compression:
public void setMethod(int method)
Sets the default compression method for subsequent entries. This default will be used whenever the compression method is not specified for an individual ZIP file entry, and is initially set to DEFLATED.
public void setLevel(int level)
Sets the compression level for subsequent entries which are DEFLATED. The default setting is DEFAULT_COMPRESSION. level - the compression level (0-9)
When you add something like this right after creating the stream:
ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(target));
zos.setMethod( ZipOutputStream.DEFLATED );
zos.setLevel( 5 );
...
doesn't it improve your compression?
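To see what setLevel actually changes, here is a small in-memory sketch that zips the same bytes at different levels and prints the archive sizes. The class name and sample data are made up for illustration. It also shows one likely reason an archive can end up larger than its input: every entry carries a local header, a central directory record, and the archive an end record, so a very small file grows no matter which level is used.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ZipLevelDemo {

    /** Zips data as a single DEFLATED entry at the given level and returns the archive bytes. */
    static byte[] zip(byte[] data, String entryName, int level) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.setMethod(ZipOutputStream.DEFLATED);
            zos.setLevel(level);
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write(data);
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the central directory has been written.
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] tiny = "hi".getBytes(StandardCharsets.UTF_8);

        // Highly repetitive sample text, so Deflate has something to work with.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            sb.append("the same line over and over\n");
        }
        byte[] repetitive = sb.toString().getBytes(StandardCharsets.UTF_8);

        System.out.println("tiny input: " + tiny.length + " bytes, zipped: "
                + zip(tiny, "tiny.txt", Deflater.BEST_COMPRESSION).length + " bytes");
        for (int level : new int[] {Deflater.NO_COMPRESSION, 1, 5, Deflater.BEST_COMPRESSION}) {
            System.out.println("repetitive input: " + repetitive.length + " bytes, level "
                    + level + ": " + zip(repetitive, "big.txt", level).length + " bytes");
        }
    }
}
```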
Here is a function that takes the absolute path of a directory and creates a zip file with the same name as the directory, containing the files in it and in its immediate subfolders. It returns true on success and false if any exception occurs.
import java.io.*;
import java.util.zip.*;
import org.apache.log4j.Logger; // assumption: Logger.getLogger(Class) suggests log4j 1.x
public class FileUtil {
    final static int BUFFER = 2048;
    private static Logger log = Logger.getLogger(FileUtil.class);
    public static boolean createZipArchive(String srcFolder) {
        // try-with-resources closes the archive even when an exception is thrown
        try (ZipOutputStream out = new ZipOutputStream(
                new BufferedOutputStream(new FileOutputStream(new File(srcFolder + ".zip"))))) {
            File[] children = new File(srcFolder).listFiles();
            if (children == null) {
                return false; // srcFolder is missing or is not a directory
            }
            for (File child : children) {
                if (child.isDirectory()) {
                    // only one level of subdirectories is handled; deeper folders are skipped
                    File[] files = child.listFiles();
                    if (files == null) {
                        continue;
                    }
                    for (File f : files) {
                        if (f.isFile()) {
                            System.out.println("Adding: " + f.getName());
                            addFile(out, f, child.getName() + "/" + f.getName());
                        }
                    }
                } else { // it is just a file
                    addFile(out, child, child.getName());
                }
            }
        } catch (Exception e) {
            log.info("createZipArchive threw exception: " + e.getMessage());
            return false;
        }
        return true;
    }
    // copies one file into the archive and closes its input stream afterwards
    private static void addFile(ZipOutputStream out, File file, String entryName) throws IOException {
        try (BufferedInputStream origin = new BufferedInputStream(new FileInputStream(file), BUFFER)) {
            out.putNextEntry(new ZipEntry(entryName));
            byte[] data = new byte[BUFFER];
            int count;
            while ((count = origin.read(data, 0, BUFFER)) != -1) {
                out.write(data, 0, count);
            }
            out.closeEntry();
        }
    }
}
To generate two identical zip files (including identical md5sum) from the same source file, I would recommend using the same zip utility -- either always use the same Java program, or always use 7zip.
The 7zip utility, for instance, has a lot of options -- many of which are simply defaults that can be customized (or differ between releases?) -- and any Java zip implementation would have to also set these options explicitly. If your Java app can simply invoke an external "7z" program, you'll probably get better performance anyway than with a custom Java zip implementation. (This is also a good example of a map-reduce problem where you can easily scale out the implementation.)
But the main issue you will run into if you have a server-side generated zip file and a client-side generated zip file is that the zip file stores two things in addition to just the original file: (1) the file name, and (2) the file timestamp. If either of these has changed, then the resulting zip file will have a different md5sum:
$ ls tst1/
foo.tar
$ cp -r tst1 tst2
$ ( cd tst1; zip foo.zip foo.tar ) ; ( cd tst2; zip foo.zip foo.tar ) ; md5sum tst?/foo.zip
updating: foo.tar (deflated 20%)
updating: foo.tar (deflated 20%)
359b82678a2e17c1ddbc795ceeae7b60 tst1/foo.zip
b55c33c0414ff987597d3ef9ad8d1d08 tst2/foo.zip
But, using "cp -p" (preserve timestamp):
$ cp -p -r tst1 tst2
$ ( cd tst1; zip foo.zip foo.tar ) ; ( cd tst2; zip foo.zip foo.tar ) ; md5sum tst?/foo.zip
updating: foo.tar (deflated 20%)
updating: foo.tar (deflated 20%)
359b82678a2e17c1ddbc795ceeae7b60 tst1/foo.zip
359b82678a2e17c1ddbc795ceeae7b60 tst2/foo.zip
You'll find the same problem with differing filenames and paths, even when the files inside the zip are identical.
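The same effect can be reproduced in Java. The sketch below (class and method names are hypothetical) builds archives in memory and pins the entry timestamp with ZipEntry.setTime, since ZipOutputStream otherwise stamps each entry with the current time. It only demonstrates that a Java-written archive is repeatable when name, time, and content are fixed; it does not make the output match a 7zip archive.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class DeterministicZip {

    /** Zips data as one entry whose name and modification time are fully controlled by the caller. */
    static byte[] zip(String entryName, byte[] data, long entryTimeMillis) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            ZipEntry entry = new ZipEntry(entryName);
            // Without this call the entry would get the current time, changing the MD5 on every run.
            entry.setTime(entryTimeMillis);
            zos.putNextEntry(entry);
            zos.write(data);
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Returns the MD5 digest of the bytes as 32 lowercase hex characters. */
    static String md5Hex(byte[] bytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(bytes)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "same content".getBytes(StandardCharsets.UTF_8);
        long fixedTime = 1_000_000_000_000L; // arbitrary fixed timestamp (September 2001)

        System.out.println("same name, same time:  " + md5Hex(zip("foo.txt", data, fixedTime)));
        System.out.println("same name, same time:  " + md5Hex(zip("foo.txt", data, fixedTime)));
        System.out.println("same name, later time: "
                + md5Hex(zip("foo.txt", data, fixedTime + 100_000_000_000L)));
        System.out.println("other name, same time: " + md5Hex(zip("bar.txt", data, fixedTime)));
    }
}
```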
package comm;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
public class Zip1 {
public static void main( String[] args )
{
byte[] buffer = new byte[1024];
try{
File f= new File("E:\\");
f.mkdirs();
File origFile= new File(f,"MyZipFile2.zip");
FileOutputStream fos = new FileOutputStream(origFile);
ZipOutputStream zos = new ZipOutputStream(fos);
ZipEntry ze= new ZipEntry("test.pdf");
zos.putNextEntry(ze);
FileInputStream in = new FileInputStream("D:\\Test.pdf");
int len;
while ((len = in.read(buffer)) > 0) {
zos.write(buffer, 0, len);
}
in.close();
zos.closeEntry();
//remember close it
zos.close();
System.out.println("Done");
}catch(IOException ex){
ex.printStackTrace();
}
}
}
The code below provides functionality to zip and unzip. Hope it helps someone.
package com.util;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
/**
* @author dinesh.lomte
*
*/
public class ZipUtil {
/**
*
* @param source
* @param destination
*/
public static void unZip(String source, String destination) {
String method = "unZip(String source, String destination)";
ZipInputStream zipInputStream = null;
try {
// Creating the ZipInputStream instance from the source file
zipInputStream = new ZipInputStream(new FileInputStream(source));
// Getting the zipped file list entry
ZipEntry zipEntry = zipInputStream.getNextEntry();
// Iterating through the file list entry
while (zipEntry != null) {
String fileName = zipEntry.getName();
File file = new File(new StringBuilder(destination)
.append(File.separator)
.append(AppUtil.getFileNameWithoutExtension(
AppUtil.getNameFromPath(source)))
.append(File.separator).append(fileName).toString());
// Creating non existing folders to avoid any FileNotFoundException
// for compressed folder
new File(file.getParent()).mkdirs();
FileOutputStream fileOutputStream = new FileOutputStream(file);
byte[] buffer = new byte[1024];
int length;
while ((length = zipInputStream.read(buffer)) > 0) {
fileOutputStream.write(buffer, 0, length);
}
fileOutputStream.close();
zipEntry = zipInputStream.getNextEntry();
}
} catch (IOException iOException) {
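// fill in the message placeholders instead of printing them literally
System.out.println(java.text.MessageFormat.format(
"Failed to unzip the ''{0}'' file located in ''{1}'' folder. Due to, {2}",
source, destination, iOException.getMessage()));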
} finally {
// Validating if zipInputStream instance in not null
if (zipInputStream != null) {
try {
zipInputStream.closeEntry();
zipInputStream.close();
} catch (IOException iOException) {
}
}
}
}
/**
* Traverse a directory from the source folder location and get all files,
* and add the file into files list.
*
* @param node
*/
public static void generateFileList(
String source, File node, List<String> files) {
// Validating if the node is a file
if (node.isFile()) {
files.add(generateZipEntry(
source, node.getPath().toString()));
}
// Validating if the node is a directory
if (node.isDirectory()) {
String[] subNote = node.list();
for (String filename : subNote) {
generateFileList(source, new File(node, filename), files);
}
}
}
/**
* Format the file path to zip
* @param source
* @param file
* @return
*/
private static String generateZipEntry(String source, String file) {
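String relative = file.substring(source.length());
// drop the leading separator so entry names are relative to the source folder
if (relative.startsWith(File.separator)) {
relative = relative.substring(1);
}
// the ZIP format expects '/' as the path separator, also on Windows
return relative.replace(File.separatorChar, '/');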
}
/**
*
* @param source
* @param destination
*/
public static void zip(String source, String destination) {
String method = "zip(String source, String destination)";
ZipOutputStream zipOutputStream = null;
try {
// Creating the zipOutputStream instance
zipOutputStream = new ZipOutputStream(
new FileOutputStream(destination));
List<String> files = new ArrayList<>();
generateFileList(source, new File(source), files);
// Iterating the list of file(s) to zip/compress
for (String file : files) {
// Adding the file(s) to the zip
ZipEntry zipEntry = new ZipEntry(file);
zipOutputStream.putNextEntry(zipEntry);
FileInputStream fileInputStream = new FileInputStream(
new StringBuilder(source).append(File.separator)
.append(file).toString());
int length;
byte[] buffer = new byte[1024];
while ((length = fileInputStream.read(buffer)) > 0) {
zipOutputStream.write(buffer, 0, length);
}
// Closing the fileInputStream instance
fileInputStream.close();
// De-allocating the memory by assigning the null value
fileInputStream = null;
}
} catch (IOException iOException) {
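// fill in the message placeholders instead of printing them literally
System.out.println(java.text.MessageFormat.format(
"Failed to zip the file(s) located in ''{0}'' folder. Due to, {1}",
source, iOException.getMessage()));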
} finally {
// Validating if zipOutputStream instance in not null
if (zipOutputStream != null) {
try {
zipOutputStream.closeEntry();
zipOutputStream.close();
} catch (IOException iOException) {
}
}
}
}
}
how I can create zip file that is exactly same as zip file that I created with 7zip program
This is an old question, but I've been working on my SimpleZip Java package over the past month specifically to do what the OP is asking for -- to have full control over the Zip output. I wrote the library because I could not find a Zip replacement that gave me fine-grained control over the metadata in the file headers or the central-directory entries. Specifically, I was seeing problems with rewriting jars within jar files causing classpath loading issues.
My library has a ZipFileCopy.java example program that reads in a Zip and writes it out again as a series of objects without changing a byte. It:
- reads in a ZipFileHeader
- reads in the file bytes
- reads in the optional ZipDataDescriptor
- repeats until null is returned by ZipInfo.readFileHeader()
- reads in ZipCentralDirectoryFileEntry until null
- reads in ZipCentralDirectoryEnd
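To get a feel for the metadata involved without any library, the sketch below parses a few fields of the first local file header using only the JDK. It does not use SimpleZip and makes no claim about that library's API; the class and field names are made up. The offsets follow the ZIP local file header layout (signature at 0, flags at 6, method at 8, name length at 26, name at 30).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class LocalHeaderPeek {
    static final int LOCAL_HEADER_SIGNATURE = 0x04034b50; // bytes "PK\3\4" read little-endian

    /** A few header fields that matter when comparing two archives byte for byte. */
    static final class LocalHeader {
        final int flags;   // general purpose bit flags
        final int method;  // 0 = stored, 8 = deflated
        final String name; // entry name

        LocalHeader(int flags, int method, String name) {
            this.flags = flags;
            this.method = method;
            this.name = name;
        }
    }

    /** Parses the local file header at offset 0 of the archive bytes. */
    static LocalHeader readFirstHeader(byte[] zipBytes) {
        if (zipBytes.length < 30) {
            throw new IllegalArgumentException("too short for a local file header");
        }
        ByteBuffer buf = ByteBuffer.wrap(zipBytes).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(0) != LOCAL_HEADER_SIGNATURE) {
            throw new IllegalArgumentException("no local file header signature at offset 0");
        }
        int flags = buf.getShort(6) & 0xFFFF;
        int method = buf.getShort(8) & 0xFFFF;
        int nameLength = buf.getShort(26) & 0xFFFF;
        if (30 + nameLength > zipBytes.length) {
            throw new IllegalArgumentException("truncated entry name");
        }
        // Names are UTF-8 when flag bit 11 is set; plain ASCII names decode the same either way.
        String name = new String(zipBytes, 30, nameLength, StandardCharsets.UTF_8);
        return new LocalHeader(flags, method, name);
    }

    /** Creates a one-entry archive in memory so the example has something to parse. */
    static byte[] sampleZip(String entryName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        LocalHeader h = readFirstHeader(sampleZip("a.txt"));
        System.out.println("name   = " + h.name);
        System.out.println("method = " + h.method);
        System.out.println("flags  = 0x" + Integer.toHexString(h.flags));
        // For deflated entries, bits 1-2 may hint at the level: 0 normal, 1 maximum, 2 fast, 3 super fast.
        System.out.println("deflate option bits = " + ((h.flags >> 1) & 0x3));
    }
}
```

Running this on the 7zip archive and on the Java archive is a quick way to see which header fields already differ before looking at the compressed bytes.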
when I created zip file with this method the compressed size is higher than size of the uncompressed file
With my library, the hardest part is going to be determining what compression level was used and configuring the deflater algorithm to generate the same bytes. I'm just delegating to the JDK internal java.util.zip.Deflater class and I'm not sure if the window sizes and the like match the 7zip program. Although there are Zip per-file flags that my library uses to determine the compression level of each file entry, they don't seem to always be assigned by Zip implementations. Without them my library would use the default level (I think 6).
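One way to probe the compression level, sketched here under the assumption that you have the uncompressed data and the raw compressed bytes of an entry, is to run java.util.zip.Deflater at every level and compare the output. The class and method names are hypothetical. A result of -1 is quite possible for a 7zip archive, because a different Deflate encoder can legally produce different bytes for the same input.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.Deflater;

class LevelProbe {

    /** Compresses data as a raw Deflate stream (no zlib header), as stored inside ZIP entries. */
    static byte[] rawDeflate(byte[] data, int level) {
        Deflater deflater = new Deflater(level, true); // true = nowrap, i.e. raw Deflate
        try {
            deflater.setInput(data);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native memory
        }
    }

    /** Returns the lowest level (0-9) whose output equals target, or -1 if none matches. */
    static int findMatchingLevel(byte[] data, byte[] target) {
        for (int level = 0; level <= 9; level++) {
            if (Arrays.equals(rawDeflate(data, level), target)) {
                return level;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Sample data with a short repeating pattern.
        byte[] data = new byte[5000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i % 7);
        }
        byte[] target = rawDeflate(data, Deflater.BEST_COMPRESSION);
        // Several levels can produce identical bytes, so the lowest matching one is reported.
        System.out.println("matching level: " + findMatchingLevel(data, target));
    }
}
```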
Although the copy code has some comments, there is also some online documentation for SimpleZip.