Detect if file is password protected without loading it into memory?

There are some existing posts out there that talk about "how to detect if a document is password protected".

This is probably the most comprehensive of these links for MS Office docs: Detecting a password-protected document (The code is written in C#).

I am in a Java application and I want to be able to detect if a PDF, XLS, XLSX, DOC, DOCX or ZIP file is password protected or not.

So I immediately reached for Apache Tika.

I cannot seem to find a way to detect if a document is password protected while guaranteeing that it does not parse the entire document and does not at any point load the entire document into memory.

What I'm thinking is that I set up a content handler (I have an example here: https://github.com/nddipiazza/tika-fork/blob/master/tika-fork-main/src/main/java/org/apache/tika/fork/main/TikaBodyContentHandler.java) where I stop parsing after 64K or something like that.
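To make the idea concrete, here is a minimal sketch of a "stop early" handler that uses only the JDK's SAX classes, not Tika. The class and method names (`LimitedContentHandlerSketch`, `LimitedHandler`, `LimitReachedException`, `readSample`) are hypothetical and are not taken from the linked file. The limit here counts characters, not bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class LimitedContentHandlerSketch {

    /** Thrown to abort parsing once enough text has been seen (hypothetical name). */
    public static class LimitReachedException extends SAXException {
        public LimitReachedException(int limit) {
            super("Stopped after " + limit + " characters");
        }
    }

    /** Collects at most `limit` characters of text, then aborts the parse. */
    public static class LimitedHandler extends DefaultHandler {
        private final int limit;
        private final StringBuilder sample = new StringBuilder();

        public LimitedHandler(int limit) {
            this.limit = limit;
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            // Keep only as much of this chunk as still fits under the limit.
            int remaining = Math.max(0, limit - sample.length());
            sample.append(ch, start, Math.min(length, remaining));
            if (sample.length() >= limit) {
                throw new LimitReachedException(limit);
            }
        }

        public String sample() {
            return sample.toString();
        }
    }

    /** Parses the stream and returns at most `limit` characters of its text. */
    public static String readSample(InputStream in, int limit) throws Exception {
        LimitedHandler handler = new LimitedHandler(limit);
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (LimitReachedException e) {
            // Expected: the handler deliberately stopped the parse early.
        }
        return handler.sample();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><p>" + "x".repeat(10_000) + "</p></doc>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(readSample(in, 64).length()); // prints 64
    }
}
```

Throwing from `characters` is what stops the parser from pulling more of the stream, so the rest of the document is never read by the handler.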

Is there an easier way?

Yuzik answered 18/9, 2019 at 17:21 Comment(2)
Hey, did you solve this? If yes, it would be a great help if you could tell us which approach you finally took. - Artilleryman
I had to think back, as this was some time ago. I remembered how I solved this and posted my answer. - Yuzik

Solution: I used the Tika API to parse the document with writeLimit=1000 characters, or something similarly small, so that only a short sample of the content is extracted. Getting that sample back assures you that the file is not encrypted, and you never scan the entire file.

Depending on which Tika parser is used, this typically won't load the entire file into memory, because Tika operates on streams rather than reading all of the bytes into memory up front.
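A rough sketch of this approach follows. It assumes Tika (`tika-core` plus a parsers module) is on the classpath. It also assumes that the parser in use reports protected files by throwing `EncryptedDocumentException`, which may not hold for every format or Tika version. The `PasswordProtectionCheck` class and its `Result` enum are hypothetical names for illustration only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.tika.exception.EncryptedDocumentException;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public class PasswordProtectionCheck {

    /** Hypothetical result type for this sketch. */
    public enum Result { READABLE, ENCRYPTED, UNKNOWN }

    public static Result check(InputStream in, int writeLimit) throws IOException {
        // BodyContentHandler(int) stops accepting text after writeLimit characters.
        BodyContentHandler handler = new BodyContentHandler(writeLimit);
        try {
            new AutoDetectParser().parse(in, handler, new Metadata(), new ParseContext());
            return Result.READABLE; // the whole document fit under the limit
        } catch (EncryptedDocumentException e) {
            return Result.ENCRYPTED; // the parser reported a password-protected file
        } catch (SAXException e) {
            // Assumption: a SAXException after some text was captured means the
            // write limit was hit, so a readable sample was extracted.
            return handler.toString().isEmpty() ? Result.UNKNOWN : Result.READABLE;
        } catch (TikaException e) {
            return Result.UNKNOWN; // some other parse failure (corrupt file, etc.)
        }
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(check(in, 1000));
        }
    }
}
```

`EncryptedDocumentException` extends `TikaException`, so it has to be caught first. Whether the underlying parser buffers the file internally still depends on the format, so it is worth measuring memory use on large inputs before relying on this.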

Yuzik answered 2/12, 2022 at 12:11 Comment(0)
