SimpleXML and XPath: reading a sibling node
I have the following XML file:

<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
<channel>
    <item>
        [...]
        <wp:postmeta>
            <wp:meta_key>_wp_old_slug</wp:meta_key>
            <wp:meta_value><![CDATA[item-1-slug]]></wp:meta_value>
        </wp:postmeta>
        <wp:postmeta>
            <wp:meta_key>_yoast_wpseo_title</wp:meta_key>
            <wp:meta_value><![CDATA[item-1-title]]></wp:meta_value>
        </wp:postmeta>
        [...]
    </item>
    <item>
        [...]
        <wp:postmeta>
            <wp:meta_key>_wp_old_slug</wp:meta_key>
            <wp:meta_value><![CDATA[item-2-slug]]></wp:meta_value>
        </wp:postmeta>
        <wp:postmeta>
            <wp:meta_key>_yoast_wpseo_title</wp:meta_key>
            <wp:meta_value><![CDATA[item-2-title]]></wp:meta_value>
        </wp:postmeta>
        [...]
    </item>
</channel>
</rss>

I'm looping through my items with:

$xmlurl = file_get_contents($xmlFile);
$xml = simplexml_load_string($xmlurl, null, LIBXML_NOCDATA);
$items = $xml->channel->item;
foreach( $items as $item ) {

}

Inside this loop, I'd like to read the value of the sibling of the <wp:meta_key>_yoast_wpseo_title</wp:meta_key> node. For example, for item 1, I'd like to get "item-1-title". I probably have to use XPath, but I really don't know how to proceed.

How can I do this?

Actual answered 7/1, 2015 at 9:21 Comment(0)
$xpath = './/wp:meta_key[text()="_yoast_wpseo_title"]/following-sibling::wp:meta_value[1]/text()';
$items = $xml->channel->item;
foreach( $items as $item ) {
  // xpath() returns an array of matches; it is empty when the item has no such key
  $result = $item->xpath($xpath);
  if ($result) {
    print "$result[0]\n";
  }
}

// => item-1-title
// => item-2-title

Explanation of the XPath expression:

.                               - from the current node...
//wp:meta_key                   - get all descendant wp:meta_key nodes
[text()="_yoast_wpseo_title"]   - whose text content is _yoast_wpseo_title
/following-sibling::            - then get the siblings that come after this
wp:meta_value[1]                - with tag wp:meta_value; only take the first
/text()                         - and read its text
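
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes the wp prefix is declared on the rss element; the namespace URI is borrowed from the other answer in this thread and may differ in your export file. The sample data is trimmed from the question, and $xmlString and $titles are just illustrative names.

```php
<?php
// Assumption: xmlns:wp is declared on <rss>; the URI may differ in a real export.
$xmlString = <<<XML
<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.0/">
<channel>
    <item>
        <wp:postmeta>
            <wp:meta_key>_wp_old_slug</wp:meta_key>
            <wp:meta_value><![CDATA[item-1-slug]]></wp:meta_value>
        </wp:postmeta>
        <wp:postmeta>
            <wp:meta_key>_yoast_wpseo_title</wp:meta_key>
            <wp:meta_value><![CDATA[item-1-title]]></wp:meta_value>
        </wp:postmeta>
    </item>
    <item>
        <wp:postmeta>
            <wp:meta_key>_yoast_wpseo_title</wp:meta_key>
            <wp:meta_value><![CDATA[item-2-title]]></wp:meta_value>
        </wp:postmeta>
    </item>
</channel>
</rss>
XML;

$xml = simplexml_load_string($xmlString, 'SimpleXMLElement', LIBXML_NOCDATA);
$xpath = './/wp:meta_key[text()="_yoast_wpseo_title"]/following-sibling::wp:meta_value[1]/text()';

// Collect one title per item, or null when the item has no matching key.
$titles = [];
foreach ($xml->channel->item as $item) {
    $result = $item->xpath($xpath);
    $titles[] = $result ? (string) $result[0] : null;
}
print_r($titles);
```

Collecting into an array instead of printing keeps the item order, so the n-th title belongs to the n-th item.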
Harass answered 7/1, 2015 at 9:45 Comment(9)
@Tomalak: Of course not, nor should it. I can't see the full XML though, so I just threw one on the root (rss). If it is not there, something will need to be changed. (I believe using $item->registerXPathNamespace). – Harass
If it does not work, "of course", then you should not add it as a solution, in my opinion. Even if you "throw" a namespace declaration on the rss element in the input XML, it will not work (except if SimpleXML were namespace-unaware). – Burnisher
@MathiasMüller: It is unreasonable to expect a complete solution from an incomplete question. Even Tomalak's answer might be wrong if the OP simply has invalid XML (with an undeclared namespace), or happens to assign something other than the WordPress namespace to the wp prefix. (It also does not keep the OP's loop, thus losing the association between items and their corresponding values.) And in my testing, I did throw a namespace declaration on the rss element, and it worked. Why wouldn't it? – Harass
Because SimpleXML should not allow you to use path expressions with prefixes that you have not registered beforehand. (So, I'm talking about the PHP, not about the input XML.) It should return an error saying that wp: is not associated with a namespace URI. I might be wrong about this - if so, please tell me where I am mistaken. – Burnisher
@MathiasMüller: If it is declared on the root element, it will be recognised. – Harass
So, technically speaking, will the SimpleXML PHP library redeclare namespace declarations that it finds on the outermost element of the input XML, and use the same prefix? – Burnisher
@MathiasMüller: Actually, I oversimplified, it's not outermost: the XPath query will take into account any declarations that are in scope on the element you are invoking the query on (from my experience). Thus, if the declaration is on each $item, or on their parent <channel>, it would still work. You only need to explicitly declare namespaces that would be declared somewhere between the node you query on and the node that you use the prefix with: print simplexml_load_string("<a><b xmlns:foo='bar'><c><foo:d>FOO:D</foo:d></c></b></a>", null, LIBXML_NOCDATA)->b->c->xpath('.//foo:d')[0]."\n"; – Harass
@MathiasMüller: But note that this does not work, since the meaning of the prefix is different between the node we run the query on and the node that we intend to find: print simplexml_load_string("<a><b xmlns:foo='bar'><c><foo:d xmlns:foo='notbar'>FOO:D</foo:d></c></b></a>", null, LIBXML_NOCDATA)->b->c->xpath('.//foo:d')[0]."\n"; – Harass
That's extraordinary, all XPath implementations I know would not allow that. Thanks, your explanations were really insightful for me. – Burnisher

This solution registers the WordPress XML namespace explicitly:

// $xml is the XML source as a string here
$doc = new SimpleXMLElement($xml);
$doc->registerXPathNamespace('wp', 'http://wordpress.org/export/1.0/');

$wp_meta_title = $doc->xpath("//wp:postmeta[wp:meta_key = '_yoast_wpseo_title']/wp:meta_value");

foreach ($wp_meta_title as $title) {
    echo (string)$title . "\n";
}

Result:

item-1-title
item-2-title

See http://ideone.com/qjOfIW

The path //wp:postmeta[wp:meta_key = '_yoast_wpseo_title']/wp:meta_value is pretty straightforward; I don't think it needs special explanation.
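
If you need to keep the per-item loop from the question, the same path can be run relative to each item. This is only a sketch: metaValue() and WP_NS are hypothetical names made up for the example, the namespace URI has to match whatever your file declares, and as far as I know the prefix registration has to be made on the element you call xpath() on, which is why it sits inside the helper.

```php
<?php
// Assumption: this URI must match the xmlns:wp declaration in your export file.
const WP_NS = 'http://wordpress.org/export/1.0/';

// Hypothetical helper: returns the wp:meta_value for $key inside one <item>, or null.
function metaValue(SimpleXMLElement $item, string $key): ?string
{
    // Register the prefix on the element that is actually queried.
    $item->registerXPathNamespace('wp', WP_NS);
    // Assumes $key contains no single quote; it is pasted into the expression as-is.
    $found = $item->xpath("wp:postmeta[wp:meta_key = '$key']/wp:meta_value");
    return $found ? (string) $found[0] : null;
}

// Sample data trimmed from the question, with the namespace declared on <rss>.
$xmlString = <<<XML
<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.0/">
<channel>
    <item>
        <wp:postmeta>
            <wp:meta_key>_wp_old_slug</wp:meta_key>
            <wp:meta_value><![CDATA[item-1-slug]]></wp:meta_value>
        </wp:postmeta>
        <wp:postmeta>
            <wp:meta_key>_yoast_wpseo_title</wp:meta_key>
            <wp:meta_value><![CDATA[item-1-title]]></wp:meta_value>
        </wp:postmeta>
    </item>
    <item>
        <wp:postmeta>
            <wp:meta_key>_wp_old_slug</wp:meta_key>
            <wp:meta_value><![CDATA[item-2-slug]]></wp:meta_value>
        </wp:postmeta>
        <wp:postmeta>
            <wp:meta_key>_yoast_wpseo_title</wp:meta_key>
            <wp:meta_value><![CDATA[item-2-title]]></wp:meta_value>
        </wp:postmeta>
    </item>
</channel>
</rss>
XML;

$doc = new SimpleXMLElement($xmlString);

// Map each item's old slug to its SEO title, so the values stay associated.
$titlesBySlug = [];
foreach ($doc->channel->item as $item) {
    $titlesBySlug[metaValue($item, '_wp_old_slug')] = metaValue($item, '_yoast_wpseo_title');
}
print_r($titlesBySlug);
```

The path here starts at wp:postmeta rather than //wp:postmeta, so it only looks at the children of the current item instead of the whole document.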

Frazil answered 7/1, 2015 at 10:11 Comment(0)
