How to identify a zip file in java?
B

6

11

I want to determine whether my archive is a zip or a rar file. The problem is that I get a runtime error before I can validate the file. I want to raise my own custom notification instead:

public class ZipValidator {
  public void validate(Path pathToFile) throws IOException {
    try {
      ZipFile zipFile = new ZipFile(pathToFile.toFile());
      String zipname = zipFile.getName();
    } catch (InvalidZipException e) {
      throw new InvalidZipException("Not a zip file");
    }
  }
}

At the moment I get this runtime error:

java.util.zip.ZipException: error in opening zip file

Binnie answered 26/11, 2015 at 8:54 Comment(3)
Get the file name, split it on "." to get the extension, and check whether the extension is rar or zip.Pygmy
@Ravindrababu - and what is stopping one from simply changing the extension of a non-zip file to zip or rar?Avoirdupois
The filename (and with it the extension) can be changed without changing the file's content. Nothing stops you from naming a zip file wrong.rar. A proper test is necessary.Cohleen
D
27

I'd suggest opening a plain InputStream and reading the first few bytes (the magic bytes) rather than relying on the file extension, which can easily be spoofed. This also avoids the overhead of creating a ZipFile and parsing the archive.

For RAR the first bytes should be 52 61 72 21 1A 07.

For ZIP it should be one of:

  • 50 4B 03 04
  • 50 4B 05 06 (empty archive)
  • 50 4B 07 08 (spanned archive).

Source: https://en.wikipedia.org/wiki/List_of_file_signatures
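A minimal sketch of that check, assuming Java 9+ (for readNBytes). The class name ArchiveSniffer and the ArchiveType enum are made up for illustration; only the signature bytes come from the list above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of any library.
final class ArchiveSniffer {

    public enum ArchiveType { ZIP, RAR, UNKNOWN }

    // RAR signature as listed above: 52 61 72 21 1A 07
    private static final byte[] RAR = {0x52, 0x61, 0x72, 0x21, 0x1A, 0x07};

    /** Reads up to six leading bytes and compares them with the known signatures. */
    public static ArchiveType detect(InputStream in) throws IOException {
        byte[] head = new byte[RAR.length];
        int read = in.readNBytes(head, 0, head.length); // may be < 6 for tiny files

        // All three ZIP signatures start with 50 4B ("PK").
        if (read >= 4 && head[0] == 0x50 && head[1] == 0x4B) {
            boolean localHeader = head[2] == 0x03 && head[3] == 0x04;
            boolean emptyArchive = head[2] == 0x05 && head[3] == 0x06;
            boolean spannedArchive = head[2] == 0x07 && head[3] == 0x08;
            if (localHeader || emptyArchive || spannedArchive) {
                return ArchiveType.ZIP;
            }
        }
        if (read == RAR.length && Arrays.equals(head, RAR)) {
            return ArchiveType.RAR;
        }
        return ArchiveType.UNKNOWN;
    }

    /** Convenience overload that opens the file and closes it again. */
    public static ArchiveType detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return detect(in);
        }
    }
}
```

Note that a match only says the file starts like an archive. It does not prove the rest of the file is intact, and ZIP-based formats such as .docx will also be reported as ZIP.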

Another point, just looked at your code:

Why do you catch the InvalidZipException, throw it away and construct a new one? This way you lose all the information from the original exception, making it hard to debug and understand what exactly went wrong. Either don't catch it at all or, if you have to wrap it, do it right:

} catch (InvalidZipException e) {
  throw new InvalidZipException("Not a zip file", e);
}
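Put together with the question's validator, this could look roughly like the sketch below. InvalidZipException is the asker's own class and its definition is not shown in the question, so the version here is an assumed stand-in. The catch clause uses java.util.zip.ZipException, since that is the exception shown in the stack trace.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;

// Assumed stand-in for the asker's custom exception; the real class is not shown.
class InvalidZipException extends IOException {
    InvalidZipException(String message, Throwable cause) {
        super(message, cause);
    }
}

class ZipValidator {
    public void validate(Path pathToFile) throws IOException {
        // try-with-resources closes the ZipFile again once it has been opened
        try (ZipFile zipFile = new ZipFile(pathToFile.toFile())) {
            // Opening succeeded, so the file could be read as a zip archive.
        } catch (ZipException e) {
            // Keep the original exception as the cause so no detail is lost.
            throw new InvalidZipException("Not a zip file: " + pathToFile, e);
        }
    }
}
```

Other I/O problems, such as a missing file, are not ZipExceptions and simply propagate as plain IOExceptions.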
Depoliti answered 26/11, 2015 at 9:35 Comment(0)
C
21

Merging the answers of nanda & bratkartoffel.

private static boolean isArchive(File f) {
    int fileSignature = 0;
    try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
        fileSignature = raf.readInt();
    } catch (IOException e) {
        // handle if you like
    }
    return fileSignature == 0x504B0304 || fileSignature == 0x504B0506 || fileSignature == 0x504B0708;
}
Cursory answered 1/12, 2017 at 14:26 Comment(6)
Better use InputStream as it's handling a more general input, no?Footlights
This does not work for me. It evaluates a .docx file to trueEfren
@ThanasisNamikazee a docx file is, in fact, a zip file with a changed file extensionCursory
@FabianBraun it's a fact I did not know about. So in reality this code works as expected.Efren
This code does not handle self-extracting (SFX) archives.Hotpress
@SergiuszWolicki well those would be executables so that's fair enoughSounder
T
4

The exception is thrown in this line:

ZipFile zipFile = new ZipFile(pathToFile.toFile());

That's because the ZipFile constructor throws a ZipException when it is given a file that is not a zip file. So before creating a new ZipFile object, you have to check whether your file path points to a valid zip file. One solution might be to check the extension of the file path, like so:

 PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:*.zip");
 // "*" does not cross directory separators, so match against the file name only
 boolean extensionCorrect = matcher.matches(path.getFileName());
Taught answered 26/11, 2015 at 9:28 Comment(2)
This does not work. For example, I am using a .docx file and it does not throw an exception.Efren
The name is just a name, and .zip may be taken as a hint of ZIP content, but it can be misleading.Cohleen
U
4
int n;
// try-with-resources closes the file even if readInt() throws
try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
    n = raf.readInt(); // first four bytes, read big-endian
}
if (n == 0x504B0304)
    System.out.println("Should be a zip file");
else
    System.out.println("Not a zip file");

You can see it in the following link. http://www.coderanch.com/t/381509/java/java/check-file-zip-file-java

Upstart answered 26/11, 2015 at 9:30 Comment(1)
Partly correct, there can be 3 different signatures, not just one.Depoliti
T
0

Apache Tika was created to extract metadata from files, but one of its side benefits is that it can determine a file's media type from its magic bytes, its file extension, or both.

public String detect(InputStream stream, String name) throws IOException

Detects the media type of the given document. The type detection is based on the content of the given document stream and the name of the document.

Tantalizing answered 11/4, 2022 at 20:18 Comment(0)
D
0

I resolved it using Apache Tika (here for gzip):

final MimeTypes mimeTypes = MimeTypes.getDefaultMimeTypes(ClassLoader.getSystemClassLoader());
boolean isCompressed = mimeTypes.forName("application/gzip").matches(
        FileUtils.readFileToByteArray(FileUtils.getFile(filepath))
);

Remark: the library uses the magic bytes, so it should be possible to read only the first few bytes of the file instead of loading the whole file with FileUtils.readFileToByteArray().

Deprecatory answered 26/8, 2023 at 19:41 Comment(0)
