Checksum JPEG data (not the whole file)
C

6

6

Are there end-of-exif / end-of-xmp / end-of-iptc / start-of-data markers that I could use to get a checksum of just the data part of a jpg / jpeg (and other image formats)?

Cruelty answered 28/12, 2009 at 21:4 Comment(0)
C
0

MediaTags has checksum support for JPEG, MP3, M4A, etc.

Cruelty answered 30/1, 2011 at 1:11 Comment(1)
Project summary still leaves open what is supported to which detail - the main purpose seems to be extracting embedded pictures...Complacent
C
2

I think this question is related to "Compute hash of only the core image data (excluding metadata) for an image"; https://mcmap.net/q/492916/-compute-hash-of-only-the-core-image-data-excluding-metadata-for-an-image gives part of an answer if you're looking for code.

It might not work with all JPG variants, though: some of them can embed multiple images (MPF / CIPA Multi-Picture Format; more information at http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/MPF.html), and you might still have some metadata left. Also, some software puts a UID of the form --[0-9A-F]+-- at the end of the file, and it shouldn't be read. The safest solution is probably to checksum the pixels (though orientation, color profile, etc. can still influence the result).
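For the trailing UID mentioned above, a minimal sketch in Python of dropping it before hashing. The pattern is taken literally from the description in this answer; the function name `strip_trailing_uid` is made up for illustration, and allowing trailing whitespace after the UID is an assumption.

```python
import re

# Pattern from the description above: "--", uppercase hex digits, "--" at the
# very end of the file. Optional trailing whitespace is an assumption.
_TRAILING_UID = re.compile(rb"--[0-9A-F]+--\s*\Z")

def strip_trailing_uid(data: bytes) -> bytes:
    """Return data without a trailing --HEX-- block, if one is present."""
    match = _TRAILING_UID.search(data)
    return data[:match.start()] if match else data
```

Files without such a trailer are returned unchanged, so it should be harmless to run this on everything before computing the checksum.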

Cayes answered 25/10, 2016 at 5:5 Comment(0)
N
0

One easy way to get a hash sum of just the pixel data would be to convert the JPEG into a 32-bit BMP, or alternatively into PNG, and to calculate a hash sum from that. This will strip all the associated information from the JPEGs and would even match JPEGs with different encodings that lead to the same pixel data. You could of course also use the in-memory pixel data of the resulting BMPs directly if you have it (e.g. Windows has several API functions to get it from any supported image type).

Noncontributory answered 28/12, 2009 at 21:18 Comment(1)
A decoded JPEG can vary depending on the rounding used in the decoder. You generally won't be able to see the difference, but it would change the checksum.Swig
D
0

You'll have to look at each format. For JPEG, it looks like the structure implies that you can just do a checksum of the sections that start with FFEn (e.g. 0xFFE1) and checksum the bytes specified after each marker (It looks like the length follows the marker and is 2 bytes in big-endian format). For more details, see here.
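To make the marker/length layout concrete, here is a rough Python sketch that walks the segments of a JPEG held in memory. It assumes a well-formed file with no fill bytes between segments, and it stops at the SOS marker (0xFFDA), after which the compressed scan data follows without a length field. The function name `iter_segments` is made up for illustration.

```python
import struct

def iter_segments(data: bytes):
    """Yield (marker, payload) for each segment up to and including SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected marker at offset {pos}")
        marker = data[pos + 1]
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield marker, data[pos + 4:pos + 2 + length]
        if marker == 0xDA:  # SOS: entropy-coded data follows, stop here
            return
        pos += 2 + length
```

Printing `hex(marker)` and `len(payload)` for each segment of a real file is a quick way to see which APPn (0xE0 to 0xEF) segments it contains and how large they are.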

Dambrosio answered 28/12, 2009 at 21:29 Comment(2)
From what I can tell the 0xFFE? markers ARE the metadata. What did you read that makes you think that?Cruelty
It seems like it's the boundary of the metadata (e.g. start at FFE1 to get the length, then that amount of length is the EXIF data). See media.mit.edu/pia/Research/deepview/exif.html#ExifMarkerDambrosio
M
0

Yes for JPEG and EXIF; I don't know about the others.

The JPEG spec that I have is called JFIF (JPEG File Interchange Format). It comes from Annex B of ISO 10918-1 and, like all ISO specs, it takes careful reading to figure out how to translate the spec into data structures. I think this is much easier to follow.

The EXIF format parses much like the TIFF format. Each chunk has a type and a size, so you just walk the chunks until you get to the image data chunk. It has a pointer to the image data (actually pointers to strips, but I'm pretty sure you can assume that everything from the first strip of image data to the end of the file is image data).
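As an illustration of that chunk walk, a small Python sketch that reads the first directory (IFD) of a TIFF-style block. It relies on the standard TIFF layout: a 2-byte byte-order mark ("II" or "MM"), the number 42, a 4-byte offset to the first IFD, then a 2-byte entry count followed by 12-byte entries. The function name `read_ifd0` is made up, and values are returned as raw bytes because their meaning depends on the entry type. In a JPEG, this block would start after the "Exif\0\0" prefix of the APP1 payload.

```python
import struct

STRIP_OFFSETS = 0x0111  # TIFF tag 273, points at the image data strips

def read_ifd0(tiff: bytes) -> dict:
    """Return {tag: (type, count, raw_value_or_offset)} for the first IFD."""
    order = {b"II": "<", b"MM": ">"}.get(tiff[:2])
    if order is None:
        raise ValueError("not a TIFF header")
    magic, ifd_offset = struct.unpack(order + "HI", tiff[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    (count,) = struct.unpack(order + "H", tiff[ifd_offset:ifd_offset + 2])
    entries = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i
        # Each entry: tag (2 bytes), type (2), value count (4), value/offset (4).
        tag, typ, n, value = struct.unpack(order + "HHI4s", tiff[start:start + 12])
        entries[tag] = (typ, n, value)
    return entries
```

Looking up `STRIP_OFFSETS` in the result gives the pointer (or the pointer to a list of pointers, when there are several strips) to the image data.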

The EXIF format has its own website.

Mahdi answered 28/12, 2009 at 21:35 Comment(0)
T
0

Since you want to do this for various image formats, you should just use a general-purpose image decompression library and run your checksum on the uncompressed data. This will allow you to match identical images even if they are encoded differently on disk.

If you want to limit yourself to JPEG, you can checksum the data between SOI and EOI. This answer can be slightly adapted to do what you need.
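A rough sketch of the JPEG-only approach in Python, under some assumptions of mine: APPn (0xFFE0 to 0xFFEF) and COM (0xFFFE) segments are treated as metadata and skipped, everything else from SOI up to the last EOI is hashed, and anything after that EOI is ignored. The function name `jpeg_data_digest` and the choice of SHA-256 are arbitrary. Note that skipping all APPn segments also drops the JFIF header and any ICC profile, and that files embedding several images are not handled specially.

```python
import hashlib
import struct

def jpeg_data_digest(data: bytes) -> str:
    """SHA-256 over a JPEG byte string, skipping APPn and COM segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    h = hashlib.sha256()
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected marker at offset {pos}")
        marker = data[pos + 1]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xDA:
            # SOS: hash the scan header and everything up to the last EOI.
            end = data.rfind(b"\xff\xd9")
            if end < pos:
                raise ValueError("missing EOI marker")
            h.update(data[pos:end + 2])
            return h.hexdigest()
        is_metadata = 0xE0 <= marker <= 0xEF or marker == 0xFE
        if not is_metadata:
            h.update(data[pos:pos + 2 + length])
        pos += 2 + length
    raise ValueError("no SOS marker found")
```

Usage would be something like `jpeg_data_digest(open(path, "rb").read())`. Two files that differ only in their EXIF/XMP/IPTC blocks should then produce the same digest, while re-encoded files will not.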

Toomin answered 30/12, 2009 at 17:54 Comment(0)
