How is md5Hash calculated for com.google.appengine.api.blobstore.BlobInfo?

We're trying to avoid saving duplicate files. However, the MD5 we compute ourselves never matches the one reported by BlobInfo.

How we calculate it:

    MessageDigest messageDigest = java.security.MessageDigest.getInstance("MD5");
    digest = messageDigest.digest(bytes);
    String digestString = DigestUtils.md5Hex(digest);

It doesn't match the value returned by (new BlobInfoFactory().loadBlobInfo(blobKey)).getMd5Hash().

Example mismatches:

Google's value vs. our own calculation:
8cdeb6db94bc4fd156e2975fd8ebbcf2 vs 9003b37afbf3637de96c35774069453f 
65a25dafcba58d16d58a9c7585cc3932 vs 52383159f7d27417d50121aaee2728b5 
5cccc2d690fdc0c254234d5526876b34 vs 8196da9b6733daa60e08d927693df483 

This is on the production server (we haven't tested in the dev environment).

Kamchatka answered 23/11, 2012 at 23:49 Comment(7)
Are you trying this in the devappserver or in prod? - Samarium
Can you also post what you are getting and what you expect? Also, what kind of file are you uploading? - Samarium
Both questions answered. The files are usually PDFs and images (JPEG). - Kamchatka
Two more questions: can you try with an empty file? And how are you saving the files? - Samarium
With an empty file, the result is "d41d8cd98f00b204e9800998ecf8427e" vs "59adb24ef3cdbe0297f05b395827453f". - Kamchatka
The way we create the file is similar to javatechnologytutorials.blogspot.com/p/… createBlobFile(...), except we do writeChannel.write(ByteBuffer.wrap(bytes)); instead of using a PrintWriter. - Kamchatka
So, for the empty file GAE is computing the right MD5. It seems that there may be a problem with the way you are computing the MD5 hash. - Samarium

Sebastian Kreft is right in the comments above.

The code I copied is wrong: it hashes the bytes with MessageDigest and then passes the resulting digest to DigestUtils.md5Hex, which computes MD5 a second time. It should just be:

    String digestString = DigestUtils.md5Hex(bytes);

Sebastian Kreft's trick was to check the MD5 of an empty file, which should always be d41d8cd98f00b204e9800998ecf8427e.
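
For anyone who wants to reproduce the mismatch without App Engine, here is a minimal sketch using only the JDK's MessageDigest. The class and method names (Md5Check, md5Hex, doubleMd5Hex, toHex) are made up for illustration and are not part of commons-codec or the Blobstore API. It assumes the question's bug is hashing the content and then hashing the 16-byte digest again.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, for illustration only.
class Md5Check {
    // Encode bytes as lowercase hexadecimal.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Raw MD5 digest (16 bytes) of the input.
    static byte[] md5(byte[] input) {
        try {
            return MessageDigest.getInstance("MD5").digest(input);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Correct: hash the content once, then hex-encode the digest.
    static String md5Hex(byte[] content) {
        return toHex(md5(content));
    }

    // The mistake from the question: the digest itself is hashed again.
    static String doubleMd5Hex(byte[] content) {
        return toHex(md5(md5(content)));
    }

    public static void main(String[] args) {
        byte[] empty = new byte[0];
        System.out.println("hashed once:  " + md5Hex(empty));
        System.out.println("hashed twice: " + doubleMd5Hex(empty));
    }
}
```

Running the empty-input check first is a quick way to tell which side is wrong: whichever value is not d41d8cd98f00b204e9800998ecf8427e is not a plain MD5 of the content.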

Kamchatka answered 26/12, 2012 at 20:11 Comment(0)
