xPath finds nothing but *

This is starting to piss me off real bad. I have this XML code:

Updated with correct namespaces

<?xml version="1.0" encoding="utf-8"?>

<Infringement xsi:schemaLocation="http://www.movielabs.com/ACNS http://www.movielabs.com/ACNS/ACNS2v1.xsd" xmlns="http://www.movielabs.com/ACNS" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Case>
    <ID>...</ID>
    <Status>Open</Status>
  </Case>
  <Complainant>
    <Entity>...</Entity>
    <Contact>...</Contact>
    <Address>...</Address>
    <Phone>...</Phone>
    <Email>...</Email>
  </Complainant>
  <Service_Provider>
    <Entity>...</Entity>
    <Address></Address>
    <Email>...</Email>
  </Service_Provider>
  <Source>
    <TimeStamp>...</TimeStamp>
    <IP_Address>...</IP_Address>
    <Port>...</Port>
    <DNS_Name></DNS_Name>
    <Type>...</Type>
    <UserName></UserName>
    <Number_Files>1</Number_Files>
    <Deja_Vu>No</Deja_Vu>
  </Source>
  <Content>
    <Item>
      <TimeStamp>...</TimeStamp>
      <Title>...</Title>
      <FileName>...</FileName>
      <FileSize>...</FileSize>
      <URL></URL>
    </Item>
  </Content>
</Infringement>

And this PHP code:

<?php 
    $data = urldecode($_POST["xml"]);
    $newXML = simplexml_load_string($data);

    var_dump($newXML->xpath("//ID"));
?>

I've dumped only $newXML and gotten tons of data, but the only xPath I've run that returned anything but an empty array was "*"

Isn't "//ID" supposed to find all ID nodes in the document? Why isn't it working?

Thanks

Slipover answered 15/9, 2010 at 16:13 Comment(8)
The problem is most probably the namespace (xmlns:xsi). Not sure how to help further though, insufficient skills :P - Isocline
Not sure, but try replacing xmlns by ns (inspired by this comment on PHP.net). - Ratio
I probably sound like a fanatic but IMHO removing namespaces to make XPath queries work is just an XML-breaking hack that is used to overcome either the defects of programming tools or the incompetence of the programmer. Namespaces are a fundamental concept in XML. Anyone who is going to use XML should learn to understand them. - Favoritism
@Lekensteyn, Blindly messing with namespaces is like trying to deliver porcupine babies in the dark. The xmlns:xsi namespace declaration is required in order for the xsi:schemaLocation attribute to be parsed. If you change xmlns to ns, the XML document will no longer be well-formed. The comment you're referring to was replacing a default namespace declaration, xmlns=... (no colon or prefix), which determines the namespace of elements with no namespace prefix. @Codemonkey doesn't have a default namespace declaration, so that comment does not apply. - Grigson
OK, the document has been edited so now it has a default namespace declaration. @Codemonkey, don't ask us to diagnose a document that misrepresents its content. - Grigson
I had no idea the namespaces were an issue. - Slipover
@Codemonkey, I updated my answer to use the default namespace of your document. Please try my sample code and comment if there is a problem. - Favoritism
I have already tried, and it worked, but not all my XML files will have that default namespace, so I needed a solution that ignores namespaces. - Slipover

> I've dumped only $newXML and gotten tons of data, but the only xPath I've run that returned anything but an empty array was "*"

So what was returned from var_dump($newXML->xpath("*"));? <Infringement>?

If the problem is namespaces, try this:

var_dump($newXML->xpath("//*[local-name() = 'ID']"));

This will match any element in the document whose name is 'ID', regardless of namespace.
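As a rough illustration of the difference, here is a minimal sketch that runs both expressions against two made-up documents, one with the default namespace and one without. The sample XML strings and the helper name countIdMatches are assumptions for the example, not part of the question.

```php
<?php
// Two hypothetical documents: one uses a default namespace, the other none.
$withNs    = '<Infringement xmlns="http://www.movielabs.com/ACNS"><Case><ID>abc</ID></Case></Infringement>';
$withoutNs = '<Infringement><Case><ID>xyz</ID></Case></Infringement>';

// Hypothetical helper: counts ID matches for both XPath expressions.
function countIdMatches(string $xml): array
{
    $doc = simplexml_load_string($xml);
    return [
        // "//ID" only matches ID elements that are in no namespace.
        'plain' => count($doc->xpath('//ID')),
        // local-name() compares the element name and ignores the namespace.
        'local' => count($doc->xpath("//*[local-name() = 'ID']")),
    ];
}

print_r(countIdMatches($withNs));
print_r(countIdMatches($withoutNs));
```

Only the local-name() test finds the ID element in both documents; the plain "//ID" finds it only when no namespace is in effect.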

> My stuff works if i replace all "xmlns" with "ns"

Wait, what? Are you sure you showed us all the xmlns-related attributes in the document?

Update: The question was edited to show that the XML really does have a default namespace declaration. That explains the original problem: your XPath expression selects ID elements that are in no namespace, but the elements in your document are in the movielabs ACNS namespace, thanks to the default namespace declaration.

The declaration xmlns="http://www.movielabs.com/ACNS" on an element means "this element and all descendants that don't have a namespace prefix (like ID) are in the namespace represented by the namespace URI 'http://www.movielabs.com/ACNS'." (Unless an intervening descendant has a different default namespace declaration, which would shadow this one.)
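To see the inheritance and the shadowing in action, something like the following sketch could be used. The Extra element and the urn:example:other namespace are invented for the example; the real ACNS documents may never contain a nested default namespace.

```php
<?php
// Hypothetical document: <Extra> redeclares the default namespace,
// which shadows the ACNS declaration for its own descendants.
$xml = '<Infringement xmlns="http://www.movielabs.com/ACNS">'
     . '<Case><ID>case-id</ID></Case>'
     . '<Extra xmlns="urn:example:other"><ID>extra-id</ID></Extra>'
     . '</Infringement>';

$dom = new DOMDocument();
$dom->loadXML($xml);

// Map each ID's text content to the namespace URI the parser resolved for it.
$namespaces = [];
foreach ($dom->getElementsByTagName('ID') as $id) {
    $namespaces[$id->textContent] = $id->namespaceURI;
}

print_r($namespaces);
```

The first ID inherits the ACNS namespace from the root element, while the second one ends up in the namespace declared on its nearer ancestor.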

So use my local-name() answer above to ignore namespaces, or use jasso's technique to register the movielabs ACNS namespace and use namespaces as intended.

Grigson answered 15/9, 2010 at 16:53 Comment(2)
local-name() it is then. My script will get tons of XML documents in and I can't be sure they will all have the same default namespace. - Slipover
@Codemonkey that's a fine solution. If you don't know about their default namespace but they're all in the same namespace (possibly using a namespace prefix), you could still use jasso's method, because the prefix in your script doesn't have to match the prefix in the XML document. Only the namespace URI has to match. Or you can ignore namespaces altogether. - Grigson

Your XML document's root element seems to have a default namespace with the URI "http://www.movielabs.com/ACNS". This means that all unprefixed elements in your document belong to that namespace. The problem is that XPath expressions without a namespace prefix search for elements that don't belong to any namespace. To search for elements (or attributes...) from a certain namespace, you need to register the namespace URI under some prefix and then use this prefix in your XPath expression.

With PHP's SimpleXML, it is done something like this:

$newXML = simplexml_load_string($data);
$newXML->registerXPathNamespace('prefix', 'http://www.movielabs.com/ACNS');
var_dump($newXML->xpath("//prefix:ID"));

The prefix can be practically any text, but the namespace URI must match exactly the one used in your XML document.
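A small sketch of that point, assuming two made-up documents that use the same namespace URI but write it differently (one as a default namespace, one with an "m" prefix). The prefix "acns" and the helper name findIds are arbitrary choices for the example.

```php
<?php
// Same namespace URI, once as a default namespace and once with a prefix.
$defaultNs = '<Infringement xmlns="http://www.movielabs.com/ACNS"><Case><ID>A1</ID></Case></Infringement>';
$prefixed  = '<m:Infringement xmlns:m="http://www.movielabs.com/ACNS"><m:Case><m:ID>B2</m:ID></m:Case></m:Infringement>';

// Hypothetical helper: returns the text of every ID element in the ACNS namespace.
function findIds(string $xml): array
{
    $doc = simplexml_load_string($xml);
    // "acns" exists only in this query; it need not appear in the document.
    $doc->registerXPathNamespace('acns', 'http://www.movielabs.com/ACNS');
    return array_map('strval', $doc->xpath('//acns:ID'));
}

var_dump(findIds($defaultNs));
var_dump(findIds($prefixed));
```

Both calls find their ID element, because the match is made on the namespace URI and not on whatever prefix the document happens to use.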

Favoritism answered 15/9, 2010 at 16:34 Comment(0)

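Use this for any namespace. Note that the *:ID wildcard is XPath 2.0 syntax; PHP's SimpleXML and DOMXPath are built on libxml2, which implements XPath 1.0 only, so in PHP you need the local-name() test instead: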

var_dump($newXML->xpath("//*:ID"));
Alyosha answered 15/9, 2010 at 18:9 Comment(0)

I'm not well-versed in PHP's XML API, but I suspect the problem lies in the namespaces. Depending on how that xpath method works, it may be searching for ID elements with an empty namespace. Your ID elements inherit their namespace from the root element.

Counterscarp answered 15/9, 2010 at 16:18 Comment(4)
I don't even slightly understand, sorry. - Slipover
Did I misread it or did there used to be an xmlns attribute on the Infringement element? - Counterscarp
There was, yes. Two of them in fact. My stuff works if i replace all "xmlns" with "ns", but is there no way around changing the XML? - Slipover
@Codemonkey: There used to be two xmlns attributes? I see one right now, xmlns:xsi. Is the current question showing the actual attributes of the <Infringement> element? This is critical to diagnosing the problem. - Grigson

You have an XML namespace defined on the document element (the xmlns="http://www.movielabs.com/ACNS" attribute). The namespace is the URL http://www.movielabs.com/ACNS. This has to be a globally unique string (a URI). Because of that, URLs are often used: the chance that someone else uses your domain for a namespace is very low, and you can put some documentation at the URL.

The XML parser resolves the namespaces. Each node gets four properties.

For <Infringement xmlns="http://www.movielabs.com/ACNS"/>:

$namespaceURI => http://www.movielabs.com/ACNS
$localName => Infringement
$prefix => 
$nodeName => Infringement

For <movie:Infringement xmlns:movie="http://www.movielabs.com/ACNS"/>:

$namespaceURI => http://www.movielabs.com/ACNS
$localName => Infringement
$prefix => movie
$nodeName => movie:Infringement

$namespaceURI and $localName are stable. The other two depend on the prefix. The prefix is an alias for the namespace. The namespace URI is long and complex; it would make the XML a lot more difficult to read and write if it were repeated on each element/attribute. But you can interpret the element nodes like:

{http://www.movielabs.com/ACNS}:Infringement

So the namespace is the one thing that defines what the nodes mean, not the prefix/alias. Prefixes, and the default namespace, can be redefined on a child element:

<foo xmlns="urn:foo"><bar xmlns="urn:bar"/></foo>

XPath uses the same concept with its own resolver. You register your own prefixes for a namespace. So it doesn't matter how the prefixes are used in the XML; only the namespace URI has to match.

In DOM you do this on the DOMXPath instance:

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
$xpath->registerNamespace('movie', 'http://www.movielabs.com/ACNS');

var_dump(
  $xpath->evaluate('string(/movie:Infringement/movie:Case/movie:ID)')
);

In SimpleXML, you can register the namespace on the SimpleXMLElement.

$element = simplexml_load_string($xml);
$element->registerXPathNamespace('movie', 'http://www.movielabs.com/ACNS');
var_dump(
  (string)$element->xpath('/movie:Infringement/movie:Case/movie:ID')[0]
);

HINT: The default namespace is only used for elements, attributes are in the "no/empty namespace" unless they have a prefix.
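The hint can be checked with a short sketch like the one below. The status attribute on Case is invented for the example (the document in the question has no attributes on that element), and the variable names are arbitrary.

```php
<?php
// Hypothetical document: "status" has no prefix, so it is in no namespace,
// even though its parent element is in the default namespace.
$xml = '<Infringement xmlns="http://www.movielabs.com/ACNS">'
     . '<Case status="Open"><ID>1</ID></Case>'
     . '</Infringement>';

$element = simplexml_load_string($xml);
$element->registerXPathNamespace('movie', 'http://www.movielabs.com/ACNS');

// The element step needs the prefix; the attribute step does not.
$unprefixed = $element->xpath('//movie:Case/@status');
// Prefixing the attribute looks for it in the ACNS namespace instead.
$prefixedAttr = $element->xpath('//movie:Case/@movie:status');

var_dump(count($unprefixed), count($prefixedAttr));
```

Only the unprefixed attribute test matches, which is the opposite of what is needed for the elements around it.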

Messene answered 25/8, 2014 at 8:54 Comment(0)
