Read image IPTC data

I'm having trouble reading the IPTC data of some images. I want to do this because my client already has all the keywords in the IPTC data and doesn't want to re-enter them on the site.

So I created this simple script to read them out:

$size = getimagesize($image, $info);

if(isset($info['APP13'])) {
    $iptc = iptcparse($info['APP13']);

    print '<pre>';
        var_dump($iptc['2#025']);
    print '</pre>';
}

This works perfectly in most cases, but it's having trouble with some images.

Notice: Undefined index: 2#025

Yet I can clearly see the keywords in Photoshop.

Are there any decent small libraries that could read the keywords in every image? Or am I doing something wrong here?

Bullate answered 8/1, 2012 at 14:55 Comment(1)
I did notice this only happens to images saved in Photoshop CS3. (Bullate)

I've seen a lot of weird IPTC problems. It could be that you have two APP13 segments. I have noticed that, for some reason, some JPEGs have multiple IPTC blocks. This may come from using several photo-editing programs or from some manual file manipulation.

It could be that PHP is trying to read an empty APP13 segment, or even embedded "thumbnail metadata".

It could also be a problem with segment lengths: APP13 and 8BIM have length marker bytes that might hold wrong values.

Try a hex editor and check the file manually.
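
If a hex editor is not handy, the same check can be roughed out in PHP. This is only a sketch under a few assumptions: find_app13_segments() is a made-up helper name, it assumes every marker before the scan data carries a two-byte length, and it ignores fill bytes between segments. It walks the segments after the start-of-image marker and collects the payload of every APP13 (0xFFED) segment, stopping if a length marker points outside the file.

```php
<?php
// Hypothetical helper: return the payload of every APP13 (0xFFED) segment
// found before the start-of-scan marker in a JPEG byte string.
function find_app13_segments(string $jpeg): array
{
    $segments = [];
    $length = strlen($jpeg);

    // A JPEG starts with the SOI marker 0xFFD8.
    if ($length < 4 || substr($jpeg, 0, 2) !== "\xFF\xD8") {
        return $segments;
    }

    $pos = 2;
    while ($pos + 4 <= $length) {
        if ($jpeg[$pos] !== "\xFF") {
            break; // not at a marker: damaged file or lost sync
        }
        $marker = ord($jpeg[$pos + 1]);
        if ($marker === 0xDA || $marker === 0xD9) {
            break; // SOS or EOI: no more metadata segments
        }
        // Two-byte big-endian length; it counts the length bytes themselves.
        $segLen = unpack('n', substr($jpeg, $pos + 2, 2))[1];
        if ($segLen < 2 || $pos + 2 + $segLen > $length) {
            break; // length marker points outside the file
        }
        if ($marker === 0xED) {
            $segments[] = substr($jpeg, $pos + 4, $segLen - 2);
        }
        $pos += 2 + $segLen;
    }
    return $segments;
}

// Usage sketch ($image is the path from the question):
// foreach (find_app13_segments(file_get_contents($image)) as $i => $payload) {
//     $iptc = iptcparse($payload);
//     var_dump($i, strlen($payload), isset($iptc['2#025']) ? $iptc['2#025'] : null);
// }
```

If this reports more than one segment, or one whose payload is nearly empty, that would fit the explanations above.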

Excursive answered 20/4, 2012 at 8:5 Comment(0)

I have found that IPTC data is almost always embedded as XML using the XMP format, and is often not in the APP13 slot. You can sometimes get the IPTC info by using iptcparse($info['APP1']), but the most reliable way to get it without a third-party library is to simply search through the image file for the relevant XML string (I got this from another answer, but I haven't been able to find it again, otherwise I would link it!):

The xml for the keywords always has the form "<dc:subject>...<rdf:Seq><rdf:li>Keyword 1</rdf:li><rdf:li>Keyword 2</rdf:li>...<rdf:li>Keyword N</rdf:li></rdf:Seq>...</dc:subject>"

So you can just get the file as a string using file_get_contents(get_attached_file($attachment_id)), use strpos() to find each opening (<rdf:li>) and closing (</rdf:li>) XML tag, and grab the keyword between them using substr().

The following snippet works for all JPEGs I have tested it on. It fills the array $keys with the keywords taken from an image in WordPress with ID $attachment_id:

$content = file_get_contents(get_attached_file($attachment_id));

// Initialize empty array to store keywords
$keys = array();

// Look for xmp data: xml tag "dc:subject" is where keywords are stored.
// strpos() returns false when nothing is found, so test with !== false
// before adding the tag length (false + 12 would silently become 12).
$xmp_tag_pos = strpos($content, '<dc:subject>');

// Only proceed if able to find dc:subject tag
if ($xmp_tag_pos !== false) {
    $xmp_data_start  = $xmp_tag_pos + 12;
    $xmp_data_end    = strpos($content, '</dc:subject>', $xmp_data_start);
    $xmp_data_length = ($xmp_data_end === false ? 0 : $xmp_data_end - $xmp_data_start);
    $xmp_data        = substr($content, $xmp_data_start, $xmp_data_length);

    // Look for tag "rdf:Seq" where individual keywords are listed
    $seq_tag_pos = strpos($xmp_data, '<rdf:Seq>');

    // Only proceed if able to find rdf:Seq tag
    if ($seq_tag_pos !== false) {
        $key_data_start  = $seq_tag_pos + 9;
        $key_data_end    = strpos($xmp_data, '</rdf:Seq>', $key_data_start);
        $key_data_length = ($key_data_end === false ? 0 : $key_data_end - $key_data_start);
        $key_data        = substr($xmp_data, $key_data_start, $key_data_length);

        // $ctr will track position of each <rdf:li> tag, starting with first
        $ctr = strpos($key_data, '<rdf:li>');

        // While loop stores each keyword and searches for next xml keyword tag.
        // Position 0 is a valid match, so compare against false strictly.
        while ($ctr !== false && $ctr < $key_data_length) {
            // Skip past the tag to get the keyword itself
            $key_begin = $ctr + 8;

            // Keyword ends where closing tag begins
            $key_end = strpos($key_data, '</rdf:li>', $key_begin);

            // Make sure keyword has a closing tag
            if ($key_end === false) break;

            // Make sure keyword is not too long (not sure what WP can handle)
            $key_length = min(100, $key_end - $key_begin);

            // Add keyword to keyword array
            $keys[] = substr($key_data, $key_begin, $key_length);

            // Find next keyword open tag
            $ctr = strpos($key_data, '<rdf:li>', $key_end);
        }
    }
}
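
For comparison, the same string search can probably be written more compactly with regular expressions. Treat this as a sketch rather than a drop-in replacement: extract_xmp_keywords() is a made-up name, and it assumes the XMP packet is stored as plain, uncompressed text in the file. It does not care which container wraps the items, so it should also cope with files that list the keywords inside rdf:Bag instead of rdf:Seq, and it decodes XML entities such as &amp;amp; in the keyword text.

```php
<?php
// Hypothetical helper (not part of the snippet above): pull dc:subject
// keywords out of a raw file string with regular expressions.
function extract_xmp_keywords(string $content): array
{
    // Isolate the dc:subject element first so other rdf:li lists are ignored.
    if (!preg_match('~<dc:subject>(.*?)</dc:subject>~s', $content, $subject)) {
        return [];
    }

    // Capture the text of every rdf:li item, allowing attributes on the tag.
    preg_match_all('~<rdf:li[^>]*>(.*?)</rdf:li>~s', $subject[1], $items);

    // Decode XML entities and trim surrounding whitespace.
    return array_map(
        static function (string $keyword): string {
            return trim(html_entity_decode($keyword, ENT_QUOTES | ENT_XML1, 'UTF-8'));
        },
        $items[1]
    );
}

// Usage sketch inside WordPress:
// $keys = extract_xmp_keywords(file_get_contents(get_attached_file($attachment_id)));
```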

I have this implemented in a plugin to put IPTC keywords into WP's "Description" field, which you can find here.

Gardol answered 25/7, 2019 at 20:44 Comment(0)

ExifTool is very robust, if you can shell out to it (from PHP, it looks like?).
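
A rough sketch of what that could look like, assuming exiftool is installed and on the PATH and that the host allows shell_exec(). The helper name parse_exiftool_keywords() is made up for illustration. As far as I know, ExifTool's JSON output gives a single keyword as a plain value and several keywords as an array, so the helper normalises both to an array.

```php
<?php
// Hypothetical helper: decode the JSON printed by `exiftool -json` and
// return the Keywords of the first file as a flat array of strings.
function parse_exiftool_keywords(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data[0]['Keywords'])) {
        return [];
    }
    // Cast so a lone scalar keyword becomes a one-element array.
    return array_map('strval', (array) $data[0]['Keywords']);
}

// Usage sketch ($image is the path from the question):
// $json = shell_exec('exiftool -json -IPTC:Keywords ' . escapeshellarg($image));
// $keywords = parse_exiftool_keywords((string) $json);
```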

Finfoot answered 19/1, 2012 at 3:0 Comment(1)
Thanks, but unfortunately I cannot run shell commands on the current hosting. (Bullate)
