Get EXIF data without downloading whole image - Python
S

2

6

Is it possible to get the EXIF information of a remote image while downloading only the EXIF data?

From what I can understand about EXIF bytes in image files, the EXIF data is in the first few bytes of an image.

So the question is: how can I download only the first few bytes of a remote file with Python? (Edit: Relying on the HTTP Range header is not good enough, as not all remote hosts support it, in which case a full download will occur.)

Can I cancel the download after x bytes of progress, for example?

Swain answered 13/12, 2012 at 14:8 Comment(0)
E
2

This depends on the image format heavily. For example, if you have a TIFF file, there is no knowing a priori where the EXIF data, if any, is within the file. It could be right after the header and before the first IFD, but this is unlikely. It could be way after the image data. Chances are it's somewhere in the middle.

If you want the EXIF information, extract that on the server (cache, maybe) and ship that down packaged up nicely instead of demanding client code do that.

Essene answered 13/12, 2012 at 15:14 Comment(4)
Your second paragraph assumes he owns the server storing the images, which might not necessarily be the case :) As for where the EXIF data is located, if I understand this answer correctly, in JPG files the EXIF data will be around the beginning of the file - do you know if this is correct? - Prevaricator
Yes, I'm wondering the same myself. Most images are in .jpg, so that would be great. - Swain
The APP1 section appears after the APP0 section (if it exists). The APP0 marker can be followed by up to 64K of data by the spec, so you should be prepared to handle that. And there may be multiple APP1 sections. Were it me and I was hell-bent on doing this, I'd build a stream solution where I can cut off image delivery at any point (in this case after I have the EXIF, if any). - Essene
@Essene How would you build such a stream solution? (i.e. the downloading-and-cutting-off-image-delivery part.) - Swain
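
One way the stream idea from the comments could look, as a rough sketch rather than a definitive implementation: read the response as a stream, walk the JPEG segments, and stop as soon as an APP1 segment starting with `Exif\0\0` has been read. The names `read_exif_segment`, `fetch_exif` and the `max_bytes` budget are made up for illustration. The sketch assumes a well-formed JPEG whose segments before the image data all carry a two-byte length, and it does not handle fill bytes or EXIF split across several APP1 segments.

```python
import struct
import urllib.request


def read_exif_segment(stream, max_bytes=256 * 1024):
    """Return the payload of the first APP1 'Exif' segment, or None.

    `stream` is any object with read(n). Reading stops at the Exif segment,
    at the start of image data (SOS), or at end of image (EOI).
    """
    consumed = 0

    def take(n):
        # Read exactly n bytes, enforcing the overall byte budget.
        nonlocal consumed
        if consumed + n > max_bytes:
            raise ValueError("byte budget exceeded before EXIF was found")
        buf = b""
        while len(buf) < n:
            chunk = stream.read(n - len(buf))
            if not chunk:
                raise EOFError("stream ended inside a JPEG segment")
            buf += chunk
        consumed += n
        return buf

    if take(2) != b"\xff\xd8":  # SOI marker
        raise ValueError("not a JPEG stream")
    while True:
        marker = take(2)
        if marker[0] != 0xFF:
            raise ValueError("expected a JPEG marker")
        if marker[1] in (0xDA, 0xD9):  # SOS or EOI: no EXIF ahead of the image data
            return None
        (length,) = struct.unpack(">H", take(2))  # length includes its own 2 bytes
        if length < 2:
            raise ValueError("invalid segment length")
        payload = take(length - 2)
        if marker[1] == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return payload


def fetch_exif(url, timeout=10):
    # Leaving the with-block closes the connection, so the rest of the image
    # is not read (data already buffered by the OS may still have arrived).
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return read_exif_segment(resp)
```

This does not depend on the server supporting Range requests: the client simply stops reading and closes the connection. The returned payload still has to be handed to an EXIF parser.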
P
4

You can tell the web server to send you only part of a file by setting the HTTP Range header. See this answer for an example using urllib to partially download a file. So you could download a chunk of, e.g., 1000 bytes, check whether the EXIF data is contained in the chunk, and download more if you can't find the EXIF APP1 header or the EXIF data is incomplete.
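
A minimal sketch of such a partial request with `urllib.request` might look like the following. The helper names `range_header` and `fetch_range` are illustrative only. A server that honours the header replies with status 206; one that ignores it sends the whole body from offset 0, so the sketch reports which case occurred and never reads more than `length` bytes either way.

```python
import urllib.request


def range_header(start, length):
    # HTTP byte ranges are inclusive on both ends.
    return {"Range": f"bytes={start}-{start + length - 1}"}


def fetch_range(url, start=0, length=1000, timeout=10):
    """Return (data, honoured) for the requested byte range.

    `honoured` is True only for a 206 Partial Content reply. If it is False
    and start > 0, the returned data comes from the start of the file.
    """
    req = urllib.request.Request(url, headers=range_header(start, length))
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        honoured = getattr(resp, "status", None) == 206
        # read(length) caps how much is pulled from the response,
        # even when the server sends the full file.
        data = resp.read(length)
    return data, honoured
```

If the first chunk does not contain the complete EXIF segment, call `fetch_range` again with a larger `start` (only meaningful when `honoured` is True).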

Prevaricator answered 13/12, 2012 at 15:9 Comment(1)
Thanks for that, but this is dependent on remote compliance with the Range header, which is not good enough. Need some way of cancelling curl after x bytes or similar, I'm thinking. - Swain