Java resumable hash computation

I would like to compute a hash of a file on the fly, in a resumable way, while it is being uploaded to the server. The files are big, so I am using the update(byte[]) method of the MessageDigest class (as described, for instance, in "How can I generate an MD5 hash?") to feed new bytes into the digest as they arrive from the HttpServletRequest's InputStream.
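
A minimal sketch of the on-the-fly hashing described above, assuming the upload is available as a plain InputStream (in a servlet that would be request.getInputStream()). The class and method names StreamingHash and md5Of are hypothetical. It passes the number of bytes actually read to update(byte[], int, int); calling update(byte[]) on a partially filled buffer would also hash stale bytes left over from the previous read.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: hashes a stream chunk by chunk without holding the whole file in memory.
class StreamingHash {
    static byte[] md5Of(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            // Only the n bytes actually read belong to the file.
            md.update(chunk, 0, n);
            // In the upload controller, the same chunk would also be written to disk here.
        }
        return md.digest();
    }
}
```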

Everything works well, but it gets interesting when I want to add resumable upload support. If an upload is terminated prematurely, the incomplete file stays on disk; however, the controller (and the underlying service) exits, so the MessageDigest object is lost. Before that happens, can I serialize the MessageDigest object to disk (or to a DB, it doesn't matter) so that, when I deserialize it again, it remembers its intermediate state? The upload would resume from the exact byte where it was terminated (no byte is sent twice and none is missing), and I would continue calling update() on the deserialized MessageDigest. Would I ultimately get the same hash as if the whole file had been uploaded at once?

Oliguria asked 1/8, 2012 at 10:51
MessageDigest is not Serializable, so the immediate answer would be no. I'll leave it to others more familiar with Java server programming to comment on how best you could handle the problem otherwise. – Johnathan

Grab one of the custom MD5 implementations like this one or this one. Make it serializable or just make its internal state public. Preserve the state when the upload is aborted, and restore it when the upload is resumed.
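
As a stand-in for the linked implementations, here is a minimal sketch of a from-scratch MD5 following RFC 1321, whose entire state (four chaining words, the partial 64-byte block and the byte count) lives in ordinary fields, so default Java serialization can checkpoint it. The names ResumableMd5, saveState and restoreState are hypothetical, and this is an illustration rather than a vetted library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical minimal MD5 (RFC 1321) whose whole state is plain fields, so it can be serialized. */
class ResumableMd5 implements Serializable {
    private static final long serialVersionUID = 1L;

    // Per-round left-rotation amounts.
    private static final int[] S = {
        7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
        5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20,
        4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
        6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21 };
    // Round constants K[i] = floor(2^32 * |sin(i + 1)|); static, so not part of the serialized state.
    private static final int[] K = new int[64];
    static {
        for (int i = 0; i < 64; i++) {
            K[i] = (int) (long) (Math.abs(StrictMath.sin(i + 1)) * 4294967296.0);
        }
    }

    // Everything below is the resumable state.
    private int a = 0x67452301, b = 0xefcdab89, c = 0x98badcfe, d = 0x10325476;
    private final byte[] buffer = new byte[64]; // bytes of the current, not yet full, block
    private int bufferLen = 0;
    private long totalLen = 0;                  // total bytes fed so far

    void update(byte[] in, int off, int len) {
        totalLen += len;
        for (int i = 0; i < len; i++) {
            buffer[bufferLen++] = in[off + i];
            if (bufferLen == 64) {
                processBlock();
                bufferLen = 0;
            }
        }
    }

    private void processBlock() {
        int[] m = new int[16]; // the block as 16 little-endian words
        for (int i = 0; i < 16; i++) {
            m[i] = (buffer[4 * i] & 0xff) | (buffer[4 * i + 1] & 0xff) << 8
                 | (buffer[4 * i + 2] & 0xff) << 16 | (buffer[4 * i + 3] & 0xff) << 24;
        }
        int A = a, B = b, C = c, D = d;
        for (int i = 0; i < 64; i++) {
            int f, g;
            if (i < 16)      { f = (B & C) | (~B & D); g = i; }
            else if (i < 32) { f = (D & B) | (~D & C); g = (5 * i + 1) % 16; }
            else if (i < 48) { f = B ^ C ^ D;          g = (3 * i + 5) % 16; }
            else             { f = C ^ (B | ~D);       g = (7 * i) % 16; }
            int tmp = D;
            D = C;
            C = B;
            B = B + Integer.rotateLeft(A + f + K[i] + m[g], S[i]);
            A = tmp;
        }
        a += A; b += B; c += C; d += D;
    }

    /** Pads and finishes. This finalizes the instance, so call saveState() first if you still need it. */
    byte[] digest() {
        long bitLen = totalLen * 8;
        int padLen = bufferLen < 56 ? 56 - bufferLen : 120 - bufferLen;
        byte[] pad = new byte[padLen + 8];
        pad[0] = (byte) 0x80;
        for (int i = 0; i < 8; i++) pad[padLen + i] = (byte) (bitLen >>> (8 * i));
        update(pad, 0, pad.length);
        int[] h = {a, b, c, d};
        byte[] out = new byte[16];
        for (int i = 0; i < 16; i++) out[i] = (byte) (h[i / 4] >>> (8 * (i % 4)));
        return out;
    }

    /** Checkpoint: serialize the whole state so it can be stored on disk or in a DB column. */
    byte[] saveState() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(this);
        }
        return bos.toByteArray();
    }

    /** Resume: rebuild an instance from a checkpoint produced by saveState(). */
    static ResumableMd5 restoreState(byte[] state) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(state))) {
            return (ResumableMd5) ois.readObject();
        }
    }
}
```

On abort, the controller would store saveState() next to the partial file; on resume it would call restoreState() and keep calling update() with the new bytes. A stored checkpoint is tied to this class's field layout, so changing the fields later would invalidate checkpoints that were already saved.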

Antilogism answered 4/8, 2012 at 12:46
Thanks for the answer; I find this one the most usable. – Oliguria

Hashes are cheap to compute (MD5 especially so; are you sure you don't want SHA-1?). I would recommend rehashing everything from the beginning as soon as you detect that an upload has been resumed. The runtime should be low unless the uploads are truly huge; hopefully, large interrupted uploads will be rare.
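
A sketch of the rehash-on-resume approach, assuming the partial upload is stored in a file at a known Path. RehashOnResume and digestOfPartialFile are hypothetical names. It re-reads the bytes already on disk into a fresh SHA-1 digest and returns the digest still open, so the caller can keep calling update() with the resumed bytes and finally digest().

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: rebuilds the digest state by re-reading what was already stored.
class RehashOnResume {
    static MessageDigest digestOfPartialFile(Path partialFile) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        byte[] chunk = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(partialFile)) {
            int n;
            while ((n = in.read(chunk)) != -1) {
                md.update(chunk, 0, n);
            }
        }
        // Not finalized: the caller continues with the resumed upload's bytes.
        return md;
    }
}
```

The cost is one extra sequential read of the partial file per resume, which is exactly the disk overhead the asker objects to in the comment that follows.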

Pursuance answered 2/8, 2012 at 19:46
Thanks for the answer; however, rehashing is not a very good option for me. I want this for uploads of large files (up to 1 GB), where the user's connection may drop, e.g. on Wi-Fi. The disks will be handling many uploads and downloads simultaneously, so I don't want to add hash-computing overhead on them. Sure, I can use SHA-1 (and probably will; that link was just an example), but even MD5 would be completely sufficient, because in this use case I am not using hashing for security (e.g. password storage), only for checksum generation. – Oliguria

The BouncyCastle MD5 implementation can export its internal state as a byte array via MD5Digest.getEncodedState(). That array can be persisted, and the digest can be reconstructed from it later with the MD5Digest(byte[]) constructor.

The source is at https://github.com/bcgit/bc-java/blob/b8e4716f170a63986f8d3144445e3abff0e40475/core/src/main/java/org/bouncycastle/crypto/digests/MD5Digest.java#L338.
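
A minimal sketch of that round trip, assuming the BouncyCastle provider jar (bcprov) is on the classpath; the class BcResumableMd5 and its two methods are hypothetical names. The first method hashes the already uploaded part and returns the encoded state to persist; the second rebuilds the digest from that state, feeds the remaining bytes and finishes.

```java
import org.bouncycastle.crypto.digests.MD5Digest;

// Hypothetical wrapper around BouncyCastle's MD5Digest state export/import.
class BcResumableMd5 {
    /** Hashes the first part of an upload and returns the digest's encoded internal state. */
    static byte[] checkpoint(byte[] data, int off, int len) {
        MD5Digest md5 = new MD5Digest();
        md5.update(data, off, len);
        return md5.getEncodedState(); // these bytes would be stored on disk or in a DB
    }

    /** Restores the digest from a saved state, feeds the remaining bytes and returns the hash. */
    static byte[] resumeAndFinish(byte[] savedState, byte[] data, int off, int len) {
        MD5Digest md5 = new MD5Digest(savedState);
        md5.update(data, off, len);
        byte[] out = new byte[md5.getDigestSize()];
        md5.doFinal(out, 0);
        return out;
    }
}
```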

Lyse answered 20/7 at 9:57