How can I read XMP data from a JPG with PHP?

10

16

PHP has built-in support for reading EXIF and IPTC metadata, but I can't find any way to read XMP. Is there one?

Firsthand answered 16/10, 2009 at 13:50 Comment(0)
25

XMP data is literally embedded in the image file, so you can extract it from the file itself with PHP's string functions.

The following demonstrates this procedure (I'm using SimpleXML, but any other XML API, or even simple and clever string parsing, may give you equal results):

$content        = file_get_contents($image);
$xmp_data_start = strpos($content, '<x:xmpmeta');
$xmp_data_end   = strpos($content, '</x:xmpmeta>');
$xmp            = false;
if ($xmp_data_start !== false && $xmp_data_end !== false) {
    // +12 is strlen('</x:xmpmeta>'), so the closing tag is included
    $xmp_length = $xmp_data_end - $xmp_data_start + 12;
    $xmp_data   = substr($content, $xmp_data_start, $xmp_length);
    $xmp        = simplexml_load_string($xmp_data);
}

Just two remarks:

  • XMP makes heavy use of XML namespaces, so you'll have to keep an eye on that when parsing the XMP data with some XML tools.
  • Considering the possible size of image files, you may not be able to use file_get_contents(), as this function loads the whole image into memory. Using fopen() to open a file stream resource and checking chunks of data for the key sequences <x:xmpmeta and </x:xmpmeta> will significantly reduce the memory footprint.
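To illustrate the namespace remark, here is a minimal sketch of reading a Dublin Core title with SimpleXML and XPath. The XMP packet is a small hand-written sample, not taken from a real image, and the choice of dc:title is only an assumption about what you might want to read; real packets usually carry many more namespaces.

```php
<?php
// A small hand-written XMP packet used only for illustration.
$xmp_data = <<<XML
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">Sunset</rdf:li>
        </rdf:Alt>
      </dc:title>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
XML;

$xmp = simplexml_load_string($xmp_data);

// XPath only resolves prefixes that have been registered on the element.
$xmp->registerXPathNamespace('rdf', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#');
$xmp->registerXPathNamespace('dc', 'http://purl.org/dc/elements/1.1/');

$titles = $xmp->xpath('//dc:title/rdf:Alt/rdf:li');
$title  = $titles ? (string) $titles[0] : null;

echo $title, PHP_EOL;
```

Without the registerXPathNamespace() calls the query cannot resolve the dc: and rdf: prefixes, which is the usual stumbling block with XMP.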
Forster answered 16/10, 2009 at 14:15 Comment(3)
That would explain why there are no XMP-specific functions in PHP. - Firsthand
This might not be reliable anymore. There can be multiple chunks of XMP in JPEG files these days. - Narcoanalysis
@Narcoanalysis True. In these cases the logic that determines the start and end of a literal XMP packet needs adjusting, e.g. find <x:xmpmeta, find the next </x:xmpmeta>, and repeat until no more <x:xmpmeta can be found. - Forster
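The loop described in the last comment could be sketched roughly as follows. findXmpPackets() is a hypothetical helper name, not part of the original answer, and it works on a string that is already in memory. It only collects literal <x:xmpmeta ... </x:xmpmeta> spans and does not attempt to reassemble XMP that a writer has split across several JPEG segments.

```php
<?php
// Hypothetical helper: collect every literal XMP packet found in a string.
function findXmpPackets(string $content): array
{
    $startTag = '<x:xmpmeta';
    $endTag   = '</x:xmpmeta>';
    $packets  = [];
    $offset   = 0;

    while (($start = strpos($content, $startTag, $offset)) !== false) {
        $end = strpos($content, $endTag, $start);
        if ($end === false) {
            break; // unterminated packet: stop rather than guess
        }
        $end      += strlen($endTag);            // include the closing tag
        $packets[] = substr($content, $start, $end - $start);
        $offset    = $end;                       // continue after this packet
    }

    return $packets;
}

// Hand-written sample data standing in for file contents.
$sample  = "junk<x:xmpmeta a='1'>A</x:xmpmeta>more<x:xmpmeta>B</x:xmpmeta>tail";
$packets = findXmpPackets($sample);
print_r($packets);
```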
13

I'm only replying to this after so much time because this seems to be the best result when searching Google for how to parse XMP data. I've seen this nearly identical snippet used in code a few times and it's a terrible waste of memory. Here is an example of the fopen() method Stefan mentions after his example.

<?php

function getXmpData($filename, $chunkSize)
{
    if (!is_int($chunkSize)) {
        throw new RuntimeException('Expected integer value for argument #2 (chunkSize)');
    }

    if ($chunkSize < 12) {
        throw new RuntimeException('Chunk size cannot be less than 12 argument #2 (chunkSize)');
    }

    if (($file_pointer = fopen($filename, 'rb')) === FALSE) { // 'b': read the image as binary on every platform
        throw new RuntimeException('Could not open file for reading');
    }

    $startTag = '<x:xmpmeta';
    $endTag = '</x:xmpmeta>';
    $buffer = NULL;
    $hasXmp = FALSE;

    while (($chunk = fread($file_pointer, $chunkSize)) !== FALSE) {

        if ($chunk === "") {
            break;
        }

        $buffer .= $chunk;
        $startPosition = strpos($buffer, $startTag);
        $endPosition = strpos($buffer, $endTag);

        if ($startPosition !== FALSE && $endPosition !== FALSE) {
            $buffer = substr($buffer, $startPosition, $endPosition - $startPosition + 12);
            $hasXmp = TRUE;
            break;
        } elseif ($startPosition !== FALSE) {
            $buffer = substr($buffer, $startPosition);
            $hasXmp = TRUE;
        } elseif (strlen($buffer) > (strlen($startTag) * 2)) {
            $buffer = substr($buffer, strlen($startTag));
        }
    }

    fclose($file_pointer);
    return ($hasXmp) ? $buffer : NULL;
}
Plainclothesman answered 14/5, 2010 at 20:13 Comment(6)
It is worth noting that this hangs when the image contains no XMP data, although I'm sure this can be easily solved by someone who knows how. - Isologous
I added an else/break condition to the while loop which kills the loop if no XMP elements exist in the file. - Plainclothesman
I refactored this function to copy the chunk first and then perform detection/modification against the buffer rather than trying to do so against the chunks. - Plainclothesman
Isn't it possible that one chunk contains "<x:xmp" and the next one "meta....", causing the PHP script to miss the XMP fragment? - Agronomy
You are absolutely correct. This function will not work properly with a chunkSize < 12. That's an easy fix but thanks for pointing it out! - Plainclothesman
I have added a check to ensure that a chunkSize argument < 12 results in an exception. - Plainclothesman
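The split-tag concern raised in these comments is usually handled by carrying the tail of each chunk over into the next search. The following is a minimal sketch of that idea, not the code from the answer: streamFind() is a hypothetical helper, and the php://memory stream with a deliberately tiny chunk size only stands in for a real image file.

```php
<?php
// Hypothetical helper: return the byte offset of $needle in a stream, or null.
function streamFind($handle, string $needle, int $chunkSize = 8192): ?int
{
    $overlap = strlen($needle) - 1; // longest prefix of the needle a chunk can end with
    $carry   = '';
    $base    = 0;                   // stream offset of the first byte of $carry

    while (($chunk = fread($handle, $chunkSize)) !== false && $chunk !== '') {
        $window = $carry . $chunk;
        $pos    = strpos($window, $needle);
        if ($pos !== false) {
            return $base + $pos;
        }
        // Keep only the tail that could still be the start of a split needle.
        $keep  = min($overlap, strlen($window));
        $carry = $keep > 0 ? substr($window, -$keep) : '';
        $base += strlen($window) - $keep;
    }

    return null;
}

// Demo: with 16-byte chunks the start tag straddles the first chunk boundary.
$h = fopen('php://memory', 'w+b');
fwrite($h, str_repeat('x', 13) . '<x:xmpmeta>');
rewind($h);
$found = streamFind($h, '<x:xmpmeta', 16);
fclose($h);

var_dump($found);
```

Because at most strlen($needle) - 1 bytes are carried over, memory use stays close to one chunk regardless of file size.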
4

A simple way on Linux is to call the exiv2 program, available in an eponymous package on Debian.

$ exiv2 -e X extract image.jpg

will produce image.xmp containing embedded XMP which is now yours to parse.

Premillenarian answered 26/1, 2011 at 9:40 Comment(0)
3

I know... this is kind of an old thread, but it was helpful to me when I was looking for a way to do this, so I figured this might be helpful to someone else.

I took this basic solution and modified it so it handles the case where the tag is split between chunks. This allows the chunk size to be as large or small as you want.

<?php
function getXmpData($filename, $chunk_size = 1024)
{
	// the parameter is $chunk_size; checking an undefined $chunkSize would always throw
	if (!is_int($chunk_size)) {
		throw new RuntimeException('Expected integer value for argument #2 (chunk_size)');
	}

	if ($chunk_size < 12) {
		throw new RuntimeException('Chunk size cannot be less than 12 for argument #2 (chunk_size)');
	}

	if (($file_pointer = fopen($filename, 'rb')) === FALSE) {
		throw new RuntimeException('Could not open file for reading');
	}

	$tag = '<x:xmpmeta';
	$buffer = false;

	// find open tag
	while ($buffer === false && ($chunk = fread($file_pointer, $chunk_size)) !== false) {
		if(strlen($chunk) <= 10) {
			break;
		}
		if(($position = strpos($chunk, $tag)) === false) {
			// if open tag not found, back up just in case the open tag is on the split.
			fseek($file_pointer, -10, SEEK_CUR);
		} else {
			$buffer = substr($chunk, $position);
		}
	}

	if($buffer === false) {
		fclose($file_pointer);
		return false;
	}

	$tag = '</x:xmpmeta>';
	$offset = 0;
	while (($position = strpos($buffer, $tag, $offset)) === false && ($chunk = fread($file_pointer, $chunk_size)) !== FALSE && !empty($chunk)) {
		$offset = strlen($buffer) - 12; // subtract the tag size just in case it's split between chunks.
		$buffer .= $chunk;
	}

	fclose($file_pointer);

	if($position === false) {
		// this would mean the open tag was found, but the close tag was not.  Maybe file corruption?
		throw new RuntimeException('No close tag found.  Possibly corrupted file.');
	} else {
		$buffer = substr($buffer, 0, $position + 12);
	}

	return $buffer;
}
?>
Selfcontained answered 14/11, 2014 at 15:50 Comment(0)
2

Bryan's solution was the best one so far, but it had a few issues so I modified it to simplify it, and remove some functionality.

There were three issues I found with his solution:

A) If the chunk extracted falls right in between one of the strings we're searching for, it won't find it. Small chunk sizes are more likely to cause this issue.

B) If the chunk contains both the start AND the end, it won't find it. This is an easy one to fix with an extra if statement to recheck the chunk that the start is found in to see if the end is also found.

C) The else statement added at the end to break the while loop if it doesn't find the XMP data has a side effect: if the start element isn't found on the first pass, it will not check any more chunks. This is likely easy to fix too, but with the first issue it's not worth it.

My solution below isn't as powerful, but it's more robust. It checks only one chunk and extracts the data from that, so it works only if both the start and the end are in that chunk; the chunk size needs to be large enough to always capture the data. In my experience with files exported from Adobe Photoshop/Lightroom, the XMP data typically starts at around 20 kB and ends at around 45 kB. My chunk size of 50 kB works nicely for my images. It could be much smaller if you strip some of that data on export, such as the CRS block, which holds a lot of develop settings.

function getXmpData($filename)
{
    $chunk_size = 50000;
    $buffer = NULL;

    if (($file_pointer = fopen($filename, 'rb')) === FALSE) {
        throw new RuntimeException('Could not open file for reading');
    }

    $chunk = fread($file_pointer, $chunk_size);
    fclose($file_pointer);

    if ($chunk !== FALSE && ($posStart = strpos($chunk, '<x:xmpmeta')) !== FALSE) {
        $posEnd = strpos($chunk, '</x:xmpmeta>', $posStart);
        if ($posEnd !== FALSE) {
            // +12 is strlen('</x:xmpmeta>'), so the closing tag is kept
            $buffer = substr($chunk, $posStart, $posEnd - $posStart + 12);
        }
    }
    return $buffer; // NULL when no complete packet lies within the first chunk
}
Scabby answered 15/5, 2012 at 16:21 Comment(4)
I updated my function with fixes for the logic issues it had :) - Plainclothesman
Ahh, thanks Bryan! I never noticed you replied until now. I'll review your revised code and see if it works for me (I don't fully understand it yet, I'm not a programmer...) - Scabby
Oooh, I get it now. You're building the buffer one chunk at a time and always checking the buffer. This prevents all the problems I listed. Smart! Thanks. - Scabby
I reviewed the code, and the last elseif statement, if I'm reading it correctly, is meant to wipe the buffer (save for the last bit, in case the start tag is hanging out there). But from what I understand of the substr function, shouldn't it be $buffer = substr($buffer, -strlen($startTag)); (note the minus, to start from the end of the string)? As it is now, without the minus, the new $buffer value will be mostly the same as before, without being wiped. It will work, but not as efficiently as intended. Correct me if I'm wrong (and sorry for the million comments). - Scabby
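The difference between the two substr() calls discussed in the last comment can be checked in isolation. This is only a small illustration with a made-up buffer (40 filler bytes followed by the first half of a split start tag); the variable names are assumptions.

```php
<?php
$startTag = '<x:xmpmeta';                       // 10 bytes
$buffer   = str_repeat('.', 40) . '<x:xmp';     // 46 bytes, tag split at the end

// Positive offset: drops only the first 10 bytes, so the buffer keeps growing.
$withoutMinus = substr($buffer, strlen($startTag));

// Negative offset: keeps only the last 10 bytes, enough to complete a split tag.
$withMinus = substr($buffer, -strlen($startTag));

echo strlen($withoutMinus), ' bytes kept without the minus', PHP_EOL;
echo strlen($withMinus), ' bytes kept with the minus', PHP_EOL;
```

Both variants still contain the partial tag, so both find the packet once the next chunk is appended; only the amount of data retained differs.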
2

Thank you Sebastien B. for that shortened version :). To avoid the problem of chunk_size being too small for some files, just add recursion.

function getXmpData($filename, $chunk_size = 50000){
  if (($file_pointer = fopen($filename, 'rb')) === FALSE) {
    throw new RuntimeException('Could not open file for reading');
  }

  $chunk = fread($file_pointer, $chunk_size);
  fclose($file_pointer);

  if ($chunk !== FALSE && ($posStart = strpos($chunk, '<x:xmpmeta')) !== FALSE) {
      $posEnd = strpos($chunk, '</x:xmpmeta>', $posStart);
      if ($posEnd !== FALSE) {
          // complete packet found inside this chunk
          return substr($chunk, $posStart, $posEnd - $posStart + 12);
      }
  }

  // stop once the whole file has been read; otherwise a file without XMP would recurse forever
  if ($chunk === FALSE || strlen($chunk) < $chunk_size) {
    return NULL;
  }

  // recursion here: retry with a chunk twice as large
  return getXmpData($filename, $chunk_size * 2);
}
Dissidence answered 28/1, 2014 at 8:51 Comment(0)
2

If you have ExifTool available (a very useful tool) and can run external commands, you can use its option to extract XMP data (-xmp:all) and output it in JSON format (-json), which you can then easily convert to a PHP object:

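// escapeshellarg() quotes the path so spaces or shell metacharacters cannot break the command
$command = 'exiftool -g -json -struct -xmp:all ' . escapeshellarg($image_path);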
exec($command, $output, $return_var);
$metadata = implode('', $output);
$metadata = json_decode($metadata);
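As a rough sketch of what to do with the decoded result: ExifTool's JSON output is an array with one entry per input file, so the first element is the one to inspect. The JSON string below is a hand-written stand-in, not captured from a real run, and the "XMP" group key and the Title/Creator fields are assumptions about what -g grouping produces for a particular file.

```php
<?php
// Hand-written sample standing in for ExifTool's output; real field names depend on the file.
$sample = '[{"SourceFile":"image.jpg","XMP":{"Title":"Sunset","Creator":["Jane Doe"]}}]';

$decoded = json_decode($sample);
if (!is_array($decoded) || !isset($decoded[0])) {
    throw new RuntimeException('Unexpected or empty JSON');
}

$first    = $decoded[0];                    // one entry per input file
$title    = $first->XMP->Title ?? null;     // null when the tag is absent
$creators = $first->XMP->Creator ?? [];

echo $title, PHP_EOL;
```

Checking the result of json_decode() before indexing into it also covers the case where the command failed and produced no output.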
Firsthand answered 17/11, 2014 at 16:9 Comment(0)
1

I've developed the Xmp Php Toolkit extension: it's a PHP 5 extension based on the Adobe XMP Toolkit, which provides the main classes and methods to read/write/parse XMP metadata from JPEG, PSD, PDF, video, audio... This extension is under the GPL licence. A new release will be available soon for PHP 5.3 (it is currently only compatible with PHP 5.2.x), and should be available on Windows and Mac OS X (currently only FreeBSD and Linux systems are supported). http://xmpphptoolkit.sourceforge.net/

Knob answered 8/6, 2010 at 16:3 Comment(1)
I tried your toolkit, but I couldn't get it to compile :( It complains about a missing printf: "xmp_toolkit/common/XMP_LibUtils.hpp:179:62: error: ‘printf’ was not declared in this scope" - Economist
0

There is now also a GitHub repository, installable via Composer, that can read XMP data:

https://github.com/jeroendesloovere/xmp-metadata-extractor

composer require jeroendesloovere/xmp-metadata-extractor

Deafening answered 23/8, 2021 at 10:37 Comment(0)
0

If you are able to install exiv2 in your environment:

sudo apt install exiv2

then, building on fluxine's answer, it is possible to use exiv2 to extract all the image metadata (EXIF, IPTC and XMP) into an associative array:

function image_meta_data($image_path) {
    $meta_data = [];

    // execute exiv2 via the command line
    // exec() fills $output by reference, so it must be passed a plain variable
    exec('exiv2 -Pkt ' . escapeshellarg($image_path), $output, $retval);

    // process output into associative array
    foreach ($output as $line) {
        $key = trim(substr($line, 0, 46));
        $value = str_replace('lang="x-default" ', '', trim(substr($line, 46))); // remove in-line language tag
        $meta_data[$key] = $value;
    }

    return $meta_data;
}

Usage:

$meta = image_meta_data($image_path);
print_r($meta);
// Examples:
echo $meta['Xmp.dc.title'] ?? '';
echo $meta['Iptc.Application2.DateCreated'] ?? '';
echo $meta['Exif.Image.ImageDescription'] ?? '';
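If the fixed column width of 46 does not match the output of your exiv2 version, one alternative is to split each line at the first run of whitespace instead. The sketch below assumes the "key, padding, value" line layout used above; the two sample lines are hand-written for illustration, not captured from a real run.

```php
<?php
// Hand-written lines imitating "key, padding, value" output.
$output = [
    'Exif.Image.Make                              Canon',
    'Xmp.dc.title                                 lang="x-default" Sunset over the bay',
];

$meta_data = [];
foreach ($output as $line) {
    // Split into at most two parts: the key, then everything after the padding.
    $parts = preg_split('/\s+/', trim($line), 2);
    $key   = $parts[0];
    if ($key === '') {
        continue; // skip blank lines
    }
    $value = $parts[1] ?? '';
    $meta_data[$key] = str_replace('lang="x-default" ', '', $value);
}

print_r($meta_data);
```

This works because the keys themselves contain no whitespace, while values may contain any number of spaces.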
Budgerigar answered 22/3, 2023 at 11:53 Comment(0)
