Read image XMP data in Python

Can I use PIL, like in this example?

I only need to read the data, and I'm looking for the easiest, simplest way to do it (I can't install pyexiv).

Edit: I don't want to believe that the only way to do this is with some library (python-xmp-toolkit, pyexiv2, ...) that needs Exempi and Boost. There must be another option!

Mcnally answered 25/7, 2011 at 21:39

Well, I was looking for something similar, then I came across the equivalent PHP question and translated its answer to Python:

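f = 'example.jpg'
# JPEG data is binary, so open in 'rb' mode and search for bytes
with open(f, 'rb') as fd:
    d = fd.read()
xmp_start = d.find(b'<x:xmpmeta')
xmp_end = d.find(b'</x:xmpmeta')
if xmp_start != -1 and xmp_end != -1:
    # 12 == len('</x:xmpmeta>'), so the closing tag is kept
    xmp_str = d[xmp_start:xmp_end+12]
    print(xmp_str.decode('utf-8'))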

You can then parse xmp_str with an XML API.
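A minimal sketch of that XML step, using only the standard library's xml.etree.ElementTree. It assumes the packet declares its namespaces (real XMP packets do) and that the values you want are stored as attributes of rdf:Description, which is common but not guaranteed. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CLOSE_TAG = b"</x:xmpmeta>"


def extract_xmp_packet(data: bytes):
    """Return the first <x:xmpmeta>...</x:xmpmeta> block in data, or None."""
    start = data.find(b"<x:xmpmeta")
    end = data.find(CLOSE_TAG, start)
    if start == -1 or end == -1:
        return None
    return data[start:end + len(CLOSE_TAG)]


def description_attributes(xmp: bytes) -> dict:
    """Collect the attributes of every rdf:Description, keyed as '{namespace}name'."""
    root = ET.fromstring(xmp)
    attrs = {}
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        attrs.update(desc.attrib)
    return attrs
```

ElementTree reports namespaced attributes as '{namespace-URI}localname', so look values up by the full URI rather than by the prefix used in the file.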

Starryeyed answered 14/11, 2011 at 10:20
I like it... I always had problems with truncated keywords when using packages like PIL to access the data. Another benefit is that reading it straight from the JPEG adds no dependencies when writing a reusable package. - Roughspoken
I had to open with 'rb' and use find(b'<x:xmpmeta') and find(b'</x:xmpmeta'). Then it worked wonders at digging important metadata out of DJI drone images. - Conventional
XMP can now be split into multiple separate pieces spread through the JPEG file, a case this solution won't cope with. - Helli

XMP metadata can be found in the applist attribute of a JPEG image:

from PIL import Image

with Image.open(filename) as im:
    for segment, content in im.applist:
        # APP1 payloads start with a null-terminated identifier string
        marker, _, body = content.partition(b'\x00')
        if segment == 'APP1' and marker == b'http://ns.adobe.com/xap/1.0/':
            # parse the XML string with any method you like
            print(body.decode('utf-8'))
Polychrome answered 14/8, 2015 at 3:24
Nice, is that documented anywhere? I only found github.com/python-pillow/Pillow/blob/… - Putt
This won't always work, as some JPEG files have the APP1 marker "XMP\0://ns.adobe.com/xap/1.0/" for some reason, and that \0 will break the split() call. - Helli
If there are no further nulls in the body, you could do content.rsplit(b'\x00', 1) and check b'ns.adobe.com/xap/1.0/' in marker instead, which matches both forms of the identifier. - Retinue

I am also interested to know if there is a 'proper' easy way to do this.

In the meantime, I've implemented reading XMP packets in pure Python in PyAVM. The relevant code is here. Maybe it would be useful to you?
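Along the same lines, here is a rough pure-Python sketch (not PyAVM's actual code) that walks the JPEG marker segments instead of searching the whole file for a string. It assumes a conventional JPEG layout in which metadata segments appear before the start-of-scan (SOS) marker, and it only collects standard XMP APP1 segments, i.e. those whose payload begins with http://ns.adobe.com/xap/1.0/ followed by a null byte. The function name is hypothetical.

```python
import struct

XMP_ID = b"http://ns.adobe.com/xap/1.0/\x00"


def jpeg_xmp_packets(data: bytes) -> list:
    """Walk JPEG marker segments and return the payload of each standard XMP APP1 segment."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    packets = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker: skip it
            pos += 1
            continue
        if marker in (0xDA, 0xD9):  # SOS or EOI: no more metadata segments follow
            break
        # The segment length is big-endian and includes its own two bytes
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(XMP_ID):
            packets.append(payload[len(XMP_ID):])
        pos += 2 + length
    return packets
```

Stopping at SOS means a stray "<x:xmpmeta" inside the compressed image data can never be mistaken for metadata, which the plain string search cannot guarantee.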

Kerril answered 26/7, 2011 at 2:13
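from bs4 import BeautifulSoup

with open(imgFileName, "rb") as fin:
    img = fin.read()

# Search the raw bytes; str(img) would only give the escaped b'...' repr
xmp_start = img.find(b'<x:xmpmeta')
xmp_end = img.find(b'</x:xmpmeta')
if xmp_start != -1 and xmp_end != -1:
    xmpString = img[xmp_start:xmp_end + 12].decode('utf-8')
    # html.parser ships with Python; "xml" also works if lxml is installed
    xmpAsXML = BeautifulSoup(xmpString, "html.parser")
    print(xmpAsXML.prettify())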

Or you can use the Python XMP Toolkit.

Litigate answered 31/1, 2013 at 23:50
This will break when the XMP is split into multiple parts, because the JPEG format only allows 64 KB for each segment of such data. - Helli
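The multi-part case is what the XMP specification (Part 3) calls Extended XMP: the standard APP1 packet stays under the 64 KB limit and the overflow goes into further APP1 segments whose payload starts with http://ns.adobe.com/xmp/extension/ plus a null byte, followed by a 32-character GUID, the 32-bit full length, the 32-bit offset of this chunk, and then the chunk itself. The sketch below reassembles those chunks from a list of raw APP1 payloads (for example, the APP1 entries of Pillow's applist). Treat the header layout, including the assumed big-endian byte order, as something to verify against the spec and your own files; the function name is hypothetical.

```python
import struct

EXT_ID = b"http://ns.adobe.com/xmp/extension/\x00"


def reassemble_extended_xmp(app1_payloads) -> dict:
    """Rebuild Extended XMP from raw APP1 payloads; returns {guid: xmp_bytes}."""
    buffers = {}
    for payload in app1_payloads:
        if not payload.startswith(EXT_ID):
            continue  # standard XMP, Exif, etc.
        body = payload[len(EXT_ID):]
        guid = body[:32].decode("ascii")
        # Assumed layout: big-endian full length, then this chunk's offset
        full_length, offset = struct.unpack(">II", body[32:40])
        chunk = body[40:]
        buf = buffers.setdefault(guid, bytearray(full_length))
        buf[offset:offset + len(chunk)] = chunk  # chunks may arrive in any order
    return {guid: bytes(buf) for guid, buf in buffers.items()}
```

The reassembled bytes form a separate x:xmpmeta document; the spec ties it to the main packet through the GUID recorded there (xmpNote:HasExtendedXMP), so parse the two separately rather than concatenating them.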

A search through the PIL source (1.1.7) tells me that it can recognize XMP information in TIFF files, but I cannot find any evidence of a documented or undocumented API for working with XMP information using PIL at the application level.

From the CHANGES file included in the source:

+ Support for preserving ICC profiles (by Florian Böch via Tim Hatch).

  Florian writes:

  It's a beta, so still needs some testing, but should allow you to:
  - retain embedded ICC profiles when saving from/to JPEG, PNG, TIFF.
     Existing code doesn't need to be changed.
  - access embedded profiles in JPEG, PNG, PSD, TIFF.

  It also includes patches for TIFF to retain IPTC, Photoshop and XMP
  metadata when saving as TIFF again, read/write TIFF resolution
  information correctly, and to correct inverted CMYK JPEG files.

So the support for XMP is limited to TIFF, and only allows XMP information to be retained when a TIFF image is loaded, possibly changed, and saved. The application cannot access or create XMP data.

Gutierrez answered 26/7, 2011 at 23:5

Pillow (a PIL fork) can now return the XMP metadata as a dictionary via the getxmp() method.

It works for PNG, JPEG and TIFF images since version 8.3.

Documentation can be found here.
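Because getxmp() returns nested dictionaries (with lists where an element repeats), a long hard-coded key path can break from one file to the next. A small recursive search avoids that. The helper name below is hypothetical; it only assumes the dict/list shape that getxmp() returns. Note that getxmp() itself needs the defusedxml package.

```python
def find_xmp_values(node, key):
    """Yield every value stored under `key` anywhere in a nested dict/list tree."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending: the same key may also appear deeper down
            yield from find_xmp_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_xmp_values(item, key)
```

For example, list(find_xmp_values(Image.open('/path/to/img.jpg').getxmp(), 'title')) collects every title entry, whichever Description block it lives in.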

Keynes answered 10/12, 2022 at 13:18
This seemed to work for me, using code like: img = Image.open('/path/to/img.jpg'); img.getxmp()['xmpmeta']['RDF']['Description']['title']['Alt']['li']['text'] - Candiot
Exactly... I'm doing the same! - Keynes

Shout out to Chris Sherwood for the solution that I used. I came here to find a way to pull XMP data from DJI drone images, and I too did not want to install Exempi. So, for posterity, I pulled these easier methods together for people looking to extract values from XMP headers without a lot of hassle:

    # Extract XMP data (image_files and i come from the surrounding loop)
    with open(image_files[i], 'rb') as f:
        d = f.read()
    xmp_start = d.find(b'<x:xmpmeta')
    xmp_end = d.find(b'</x:xmpmeta')
    xmp_str = d[xmp_start:xmp_end+12]

    # Extract latitude: the value between 'Latitude="' and the next quote
    search_str = b'Latitude="'
    value_start = xmp_str.find(search_str) + len(search_str)
    value_end = xmp_str.find(b'"', value_start)
    value = xmp_str[value_start:value_end]
    lat = value.decode('UTF-8')
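One catch with the plain find() above: b'Latitude="' matches any attribute whose name ends in Latitude (GpsLatitude, for instance), so which value you get depends on attribute order, and a missing attribute silently yields garbage because find() returns -1. A minimal regex-based variant, with a hypothetical helper name, matches the exact local name after a namespace prefix or whitespace and returns None when it is absent. It assumes values are double-quoted attributes, so element-style values are not handled.

```python
import re


def xmp_attribute(xmp: bytes, local_name: str):
    """Return the first XMP attribute value whose local name is exactly local_name, else None."""
    # Require ':' (prefixed name) or whitespace (unprefixed name) right before
    # the local name, so 'Latitude' does not also match e.g. 'GpsLatitude'
    pattern = rb'[\s:]' + re.escape(local_name.encode("ascii")) + rb'\s*=\s*"([^"]*)"'
    m = re.search(pattern, xmp)
    return m.group(1).decode("utf-8") if m else None
```

With the xmp_str from the snippet above, xmp_attribute(xmp_str, 'GpsLatitude') would return the latitude as a string, or None if the file has no such attribute.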
Iodous answered 18/5, 2022 at 7:6

Based on the answers from @dirac, @Rich, @user1911091 and a note from @hippietrail, I came up with this solution. It's not quite elegant, but it gets the data in case it is scattered:

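from bs4 import BeautifulSoup

with open(self.filename, "rb") as f:
    d = f.read()

xmp_str = b""
pos = 0
while True:
    # Stop as soon as no further complete <x:xmpmeta> block is found
    xmp_start = d.find(b"<x:xmpmeta", pos)
    if xmp_start == -1:
        break
    xmp_end = d.find(b"</x:xmpmeta", xmp_start)
    if xmp_end == -1:
        break
    xmp_str += d[xmp_start : xmp_end + 12]
    pos = xmp_end + 12

# html.parser tolerates several concatenated top-level blocks
xmpAsXML = BeautifulSoup(xmp_str, "html.parser")
print(xmpAsXML.prettify())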
Sporophyll answered 5/12, 2022 at 2:1

As of Pillow 8.2.0, this can be achieved with the getxmp() Image method. It does require defusedxml to be installed, though.

Illuminati answered 16/6, 2023 at 18:2

© 2022 - 2024 — McMap. All rights reserved.