How to query XML using namespaces in Java with XPath?

When my XML looks like this (no xmlns), I can easily query it with an XPath like /workbook/sheets/sheet[1]:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook>
  <sheets>
    <sheet name="Sheet1" sheetId="1" r:id="rId1"/>
  </sheets>
</workbook>

But when it looks like this, I can't:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <sheets>
    <sheet name="Sheet1" sheetId="1" r:id="rId1"/>
  </sheets>
</workbook>

Any ideas?

Efficient answered 17/6, 2011 at 18:45 Comment(2)
How are you accessing it in the second example? – Seychelles
Please post the Java source you have so far. – Joule

In the second example XML file, the elements are bound to a namespace. Your XPath addresses elements that are in no namespace at all, so they don't match.

The preferred method is to register the namespace with a namespace-prefix. It makes your XPath much easier to develop, read, and maintain.

However, it is not mandatory that you register the namespace and use the namespace-prefix in your XPath.

You can formulate an XPath expression that uses a generic match for an element and a predicate filter that restricts the match for the desired local-name() and the namespace-uri(). For example:

/*[local-name()='workbook'
    and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main']
  /*[local-name()='sheets'
      and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main']
  /*[local-name()='sheet'
      and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main'][1]

As you can see, it produces an extremely long and verbose XPath statement that is very difficult to read (and maintain).
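To show that this verbose form really works without registering any prefix, here is a minimal sketch that evaluates it against the second document from the question. It assumes the JDK's built-in JAXP XPath; the class name NamespacePredicateDemo and the step() helper are illustrative only. Note that namespace-uri() only has something to compare against when the DOM is parsed namespace-aware.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NamespacePredicateDemo {

    static final String NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // The second document from the question.
    static final String XML =
        "<workbook xmlns=\"" + NS + "\""
        + " xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\">"
        + "<sheets><sheet name=\"Sheet1\" sheetId=\"1\" r:id=\"rId1\"/></sheets>"
        + "</workbook>";

    // Hypothetical helper: builds one "/*[local-name()=... and namespace-uri()=...]" step.
    static String step(String localName) {
        return "/*[local-name()='" + localName + "' and namespace-uri()='" + NS + "']";
    }

    static String firstSheetName() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // namespace-uri() is only populated in a namespace-aware DOM
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));

        XPath xpath = XPathFactory.newInstance().newXPath(); // no NamespaceContext registered
        String expr = step("workbook") + step("sheets") + step("sheet") + "[1]/@name";
        return xpath.evaluate(expr, doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstSheetName()); // Sheet1
    }
}
```

The helper only hides the repetition; the expression handed to the XPath engine is exactly the long form above.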

You could also just match on the local-name() of the element and ignore the namespace. For example:

/*[local-name()='workbook']/*[local-name()='sheets']/*[local-name()='sheet'][1]

However, you run the risk of matching the wrong elements. If your XML mixes vocabularies that use the same local-name() (which may not be an issue in this instance), your XPath could match the wrong elements and select the wrong content.
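A small sketch of that failure mode, assuming a made-up second vocabulary (urn:example:other) that also defines a sheet element; neither that namespace nor the class name LocalNameRiskDemo comes from the question. The local-name()-only expression picks the foreign element, while adding a namespace-uri() check on the last step picks the SpreadsheetML one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class LocalNameRiskDemo {

    // "urn:example:other" is a hypothetical vocabulary that also has a <sheet> element.
    static final String XML =
        "<workbook xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\""
        + " xmlns:other=\"urn:example:other\">"
        + "<sheets>"
        + "<other:sheet name=\"NotASpreadsheetSheet\"/>"
        + "<sheet name=\"Sheet1\" sheetId=\"1\"/>"
        + "</sheets>"
        + "</workbook>";

    // Matches on local-name() only, so both <sheet> children qualify and the first one wins.
    static final String LOCAL_NAME_ONLY =
        "/*[local-name()='workbook']/*[local-name()='sheets']/*[local-name()='sheet'][1]/@name";

    // Also checks namespace-uri() on the last step, so only the SpreadsheetML <sheet> qualifies.
    static final String WITH_NAMESPACE =
        "/*[local-name()='workbook']/*[local-name()='sheets']"
        + "/*[local-name()='sheet'"
        + " and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main'][1]/@name";

    static String evaluate(String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(evaluate(LOCAL_NAME_ONLY)); // NotASpreadsheetSheet (the wrong element)
        System.out.println(evaluate(WITH_NAMESPACE));  // Sheet1
    }
}
```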

Shortie answered 18/6, 2011 at 16:34 Comment(3)
I don't get why I need to associate the namespace URI with a namespace prefix in my XPath anyway. In the XML document there is already such an association, like xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" in the original question. There, the prefix r is bound to the namespace URI. The way I read it, I'd be forced to re-establish this connection in my XPath (or programmatically). – Vetchling
I would advise against this practice. If at all possible, do not match by local name and namespace; it will clutter your code, and the fast hash-speed lookup will not work. @nokul: that's because an XPath can operate on any document, and the namespace prefix can differ while the namespace stays the same. If you bind xmlns:xx to namespace aaa, and the document has <yy:foo> in the same namespace, the XPath expression xx:foo will select that node. – Arlenearles
The following XPath did not work in our case: /NotifyShipment/DataArea/Shipment/ShipmentHeader/Status/Code/text(), and this XPath, based on the above answer, appears to help: (/*[local-name()='NotifyShipment']/*[local-name()='DataArea']/*[local-name()='Shipment']/*[local-name()='ShipmentHeader']/*[local-name()='Status']/*[local-name()='Code']/text()). We might come up with another approach, but thank you for a very good note! – Ober

Your problem is the default namespace. Check out this article for how to deal with namespaces in your XPath: http://www.edankert.com/defaultnamespaces.html

One of the conclusions they draw is:

So, to be able to use XPath expressions on XML content defined in a (default) namespace, we need to specify a namespace prefix mapping

Note that this doesn't mean that you have to change your source document in any way (though you're free to put the namespace prefixes in there if you so desire). Sounds strange, right? What you will do is create a namespace prefix mapping in your java code and use said prefix in your XPath expression. Here, we'll create a mapping from spreadsheet to your default namespace.

import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;

XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();

// there's no public default implementation of NamespaceContext in the JDK...seems kind of silly, no?
xpath.setNamespaceContext(new NamespaceContext() {
    @Override
    public String getNamespaceURI(String prefix) {
        // the NamespaceContext contract asks for IllegalArgumentException on a null prefix
        if (prefix == null) throw new IllegalArgumentException("Null prefix");
        else if ("spreadsheet".equals(prefix)) return "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
        else if ("xml".equals(prefix)) return XMLConstants.XML_NS_URI;
        return XMLConstants.NULL_NS_URI;
    }

    // This method isn't necessary for XPath processing.
    @Override
    public String getPrefix(String uri) {
        throw new UnsupportedOperationException();
    }

    // This method isn't necessary for XPath processing either.
    @Override
    public Iterator<String> getPrefixes(String uri) {
        throw new UnsupportedOperationException();
    }
});

// note that all the elements in the expression are prefixed with our namespace mapping!
XPathExpression expr = xpath.compile("/spreadsheet:workbook/spreadsheet:sheets/spreadsheet:sheet[1]");

// assuming you've got your XML document in a variable named doc...
Node result = (Node) expr.evaluate(doc, XPathConstants.NODE);

And voilà: now you've got your element saved in the result variable.

Caveat: if you're parsing your XML as a DOM with the standard JAXP classes, be sure to call setNamespaceAware(true) on your DocumentBuilderFactory. Otherwise, this code won't work!
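To see the caveat in action, here is a self-contained sketch that parses the question's second document twice, once namespace-aware and once not, and runs the same prefixed expression both times. It assumes the JDK's built-in JAXP implementation; the class name SpreadsheetXPathDemo is made up for this example.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class SpreadsheetXPathDemo {

    static final String MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    static final String XML =
        "<workbook xmlns=\"" + MAIN_NS + "\""
        + " xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\">"
        + "<sheets><sheet name=\"Sheet1\" sheetId=\"1\" r:id=\"rId1\"/></sheets>"
        + "</workbook>";

    // Binds our own prefix "spreadsheet" to the document's default namespace.
    static final NamespaceContext CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            if (prefix == null) throw new IllegalArgumentException("Null prefix");
            if ("spreadsheet".equals(prefix)) return MAIN_NS;
            if (XMLConstants.XML_NS_PREFIX.equals(prefix)) return XMLConstants.XML_NS_URI;
            return XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String uri) {
            throw new UnsupportedOperationException();
        }

        @Override
        public Iterator<String> getPrefixes(String uri) {
            throw new UnsupportedOperationException();
        }
    };

    static Element firstSheet(boolean namespaceAware) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(namespaceAware); // the JAXP default is false
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(CONTEXT);
        return (Element) xpath.evaluate(
            "/spreadsheet:workbook/spreadsheet:sheets/spreadsheet:sheet[1]", doc, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstSheet(true).getAttribute("name")); // Sheet1
        // Without namespace awareness the DOM nodes carry no namespace, so nothing matches.
        System.out.println(firstSheet(false)); // null
    }
}
```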

Uphemia answered 17/6, 2011 at 18:58 Comment(6)
How do I do it with just the Java SDK? I don't have SimpleNamespaceContext and don't want to use external libs. – Efficient
@lnez Check it out: I updated my answer to show how you can do it with standard JDK classes. – Uphemia
+1 for setNamespaceAware(true). XPath was driving me crazy before I found that the issue was not in registering the namespace or in the XPath statement itself, but much earlier on! – Araucania
re: "if you're parsing your XML as a DOM with the standard JAXP classes, be sure to call setNamespaceAware(true) on your DocumentBuilderFactory." OMG Java is sooo dumb. 2 hours on this. – Sadiras
If you have a default namespace (xmlns="http://www.default.com/...") as well as prefixed ones (xmlns:foo="http://www.foo.com/..."), then you also need to provide a mapping for the default namespace so your XPath expressions can target the elements in it (e.g. the ones without a prefix). For the example above, simply add another condition to getNamespaceURI, e.g. else if ("default".equals(prefix)) return "http://www.default.com/...";. Took me a bit to figure this out; hopefully this can save someone else some engineering hours. – Delibes
@markdsievers: But the answer does exactly that (using "spreadsheet" as the prefix for the default namespace). – Interpleader

All namespaces that you intend to select from in the source XML must be associated with a prefix in the host language. In Java/JAXP this is done by specifying the URI for each namespace prefix using an instance of javax.xml.namespace.NamespaceContext. Unfortunately, there is no implementation of NamespaceContext provided in the SDK.

Fortunately, it's very easy to write your own:

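import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

public class SimpleNamespaceContext implements NamespaceContext {

    // prefix -> namespace URI
    private final Map<String, String> prefMap = new HashMap<String, String>();

    public SimpleNamespaceContext(final Map<String, String> prefMap) {
        this.prefMap.putAll(prefMap);
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) throw new IllegalArgumentException("Null prefix");
        // unbound prefixes map to "" (XMLConstants.NULL_NS_URI), as the NamespaceContext contract asks
        String uri = prefMap.get(prefix);
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    // Not needed for XPath evaluation.
    @Override
    public String getPrefix(String uri) {
        throw new UnsupportedOperationException();
    }

    @Override
    public Iterator<String> getPrefixes(String uri) {
        throw new UnsupportedOperationException();
    }

}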

Use it like this:

XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
Map<String, String> prefMap = new HashMap<String, String>();
prefMap.put("main", "http://schemas.openxmlformats.org/spreadsheetml/2006/main");
prefMap.put("r", "http://schemas.openxmlformats.org/officeDocument/2006/relationships");
SimpleNamespaceContext namespaces = new SimpleNamespaceContext(prefMap);
xpath.setNamespaceContext(namespaces);
XPathExpression expr = xpath
        .compile("/main:workbook/main:sheets/main:sheet[1]");
// doc must have been parsed by a DocumentBuilderFactory with setNamespaceAware(true)
NodeList result = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);

Note that even though the first namespace does not specify a prefix in the source document (i.e. it is the default namespace) you must associate it with a prefix anyway. Your expression should then reference nodes in that namespace using the prefix you've chosen, like this:

/main:workbook/main:sheets/main:sheet[1]

The prefix names you choose to associate with each namespace are arbitrary; they do not need to match what appears in the source XML. This mapping is just a way to tell the XPath engine that a given prefix name in an expression correlates with a specific namespace in the source document.
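A quick sketch of that point: the prefixes "x" and "rel" below are deliberately different from anything in the document (which uses no prefix and "r"), yet the expression still finds the sheet and its r:id attribute. The class name ArbitraryPrefixDemo and the map-backed anonymous NamespaceContext are illustrative, not part of the answer above.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ArbitraryPrefixDemo {

    static final String MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
    static final String REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    static final String XML =
        "<workbook xmlns=\"" + MAIN_NS + "\" xmlns:r=\"" + REL_NS + "\">"
        + "<sheets><sheet name=\"Sheet1\" sheetId=\"1\" r:id=\"rId1\"/></sheets>"
        + "</workbook>";

    static String evaluate(String expression, Map<String, String> prefixes) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                if (prefix == null) throw new IllegalArgumentException("Null prefix");
                String uri = prefixes.get(prefix);
                return uri != null ? uri : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String uri) {
                throw new UnsupportedOperationException();
            }

            @Override
            public Iterator<String> getPrefixes(String uri) {
                throw new UnsupportedOperationException();
            }
        });
        return xpath.evaluate(expression, doc);
    }

    static Map<String, String> arbitraryPrefixes() {
        Map<String, String> m = new HashMap<>();
        m.put("x", MAIN_NS);  // the document itself uses no prefix for this namespace
        m.put("rel", REL_NS); // the document itself uses "r" for this namespace
        return m;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(evaluate("/x:workbook/x:sheets/x:sheet[1]/@rel:id", arbitraryPrefixes())); // rId1
    }
}
```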

Joule answered 17/6, 2011 at 23:11 Comment(8)
I found another way to use the namespaces, but you gave me the hint, so thank you. – Goldagoldarina
@Goldagoldarina Can you post your "another way"? – Cony
Apologies @Stephan, I can't remember exactly what I did there, but this put me on the right track. – Goldagoldarina
+1 for the neat NamespaceContext implementation. You should stress that setNamespaceAware(true) must be set on the DocumentBuilderFactory, as @Uphemia did. Otherwise, this code won't work! It is not that easy to figure out. Basically, if you have XML with namespaces and don't make the DBF namespace-aware, XPath is silently rendered useless and only searching using local-name() works. – Araucania
If you have a default namespace (xmlns="http://www.default.com/...") as well as prefixed ones (xmlns:foo="http://www.foo.com/..."), then you also need to provide a mapping for the default namespace so your XPath expressions can target the elements in it (e.g. the ones without a prefix). For the example above, simply add another condition to getNamespaceURI, e.g. else if ("default".equals(prefix)) return "http://www.default.com/...";. Took me a bit to figure this out; hopefully this can save someone else some engineering hours. – Delibes
Excellent answer. I would like to add that you should make sure your XML document has first been parsed using setNamespaceAware(true). – Wilkerson
@markdsievers: But the answer does exactly that (using "spreadsheet" as the prefix for the default namespace). – Interpleader
Yeah, I called that out explicitly :) – Joule

If you are using Spring, it already contains org.springframework.util.xml.SimpleNamespaceContext.

import org.springframework.util.xml.SimpleNamespaceContext;
...

XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
SimpleNamespaceContext nsc = new SimpleNamespaceContext();

nsc.bindNamespaceUri("a", "http://some.namespace.com/nsContext");
xpath.setNamespaceContext(nsc);

XPathExpression xpathExpr = xpath.compile("//a:first/a:second");

// object: the context item to evaluate against, e.g. a namespace-aware DOM Document
String result = (String) xpathExpr.evaluate(object, XPathConstants.STRING);
Rapper answered 30/1, 2018 at 13:45 Comment(0)

I've written a simple NamespaceContext implementation (here), that takes a Map<String, String> as input, where the key is a prefix, and the value is a namespace.

It follows the NamespaceContext specification, and you can see how it works in the unit tests.

Map<String, String> mappings = new HashMap<>();
mappings.put("foo", "http://foo");
mappings.put("foo2", "http://foo");
mappings.put("bar", "http://bar");

NamespaceContext context = new SimpleNamespaceContext(mappings);

context.getNamespaceURI("foo");    // "http://foo"
context.getPrefix("http://foo");   // "foo" or "foo2"
context.getPrefixes("http://foo"); // ["foo", "foo2"]

Note that it has a dependency on Google Guava.
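If you'd rather avoid the Guava dependency, a similar map-backed context can be sketched with the JDK alone. This is not the linked implementation; the class name MapNamespaceContext is hypothetical, and the fixed xml/xmlns bindings and the unmodifiable iterator follow the NamespaceContext javadoc as I understand it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

final class MapNamespaceContext implements NamespaceContext {

    private final Map<String, String> prefixToUri;

    MapNamespaceContext(Map<String, String> prefixToUri) {
        this.prefixToUri = new HashMap<>(prefixToUri);
        // Fixed bindings that the NamespaceContext contract always reports.
        this.prefixToUri.put(XMLConstants.XML_NS_PREFIX, XMLConstants.XML_NS_URI);
        this.prefixToUri.put(XMLConstants.XMLNS_ATTRIBUTE, XMLConstants.XMLNS_ATTRIBUTE_NS_URI);
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) throw new IllegalArgumentException("prefix must not be null");
        String uri = prefixToUri.get(prefix);
        return uri != null ? uri : XMLConstants.NULL_NS_URI; // unbound prefix -> ""
    }

    @Override
    public String getPrefix(String namespaceURI) {
        Iterator<String> prefixes = getPrefixes(namespaceURI);
        return prefixes.hasNext() ? prefixes.next() : null; // any one of the bound prefixes
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        if (namespaceURI == null) throw new IllegalArgumentException("namespaceURI must not be null");
        List<String> prefixes = new ArrayList<>();
        for (Map.Entry<String, String> e : prefixToUri.entrySet()) {
            if (e.getValue().equals(namespaceURI)) prefixes.add(e.getKey());
        }
        return Collections.unmodifiableList(prefixes).iterator(); // remove() is unsupported
    }

    public static void main(String[] args) {
        Map<String, String> mappings = new HashMap<>();
        mappings.put("foo", "http://foo");
        mappings.put("foo2", "http://foo");
        mappings.put("bar", "http://bar");
        NamespaceContext context = new MapNamespaceContext(mappings);
        System.out.println(context.getNamespaceURI("foo")); // http://foo
        System.out.println(context.getPrefix("http://foo"));  // foo or foo2
    }
}
```

Because the reverse lookup scans the map, getPrefix is linear in the number of bindings, which is normally fine for the handful of prefixes an XPath setup uses.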

Twilley answered 28/9, 2015 at 10:43 Comment(0)

Two things to add to the existing answers:

  • I don't know whether this was the case when you asked the question: with Java 10, your XPath actually works for the second document if you don't use setNamespaceAware(true) on the document builder factory (false is the default).

  • If you do want to use setNamespaceAware(true), other answers have already shown how to do this using a namespace context. However, you don't need to provide the mapping of prefixes to namespaces yourself, as these answers do: It's already there in the document element, and you can use that for your namespace context:

import java.util.Iterator;

import javax.xml.namespace.NamespaceContext;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DocumentNamespaceContext implements NamespaceContext {
    Element documentElement;

    public DocumentNamespaceContext (Document document) {
        documentElement = document.getDocumentElement();
    }

    public String getNamespaceURI(String prefix) {
        return documentElement.getAttribute(prefix.isEmpty() ? "xmlns" : "xmlns:" + prefix);
    }

    public String getPrefix(String namespaceURI) {
        throw new UnsupportedOperationException();
    }

    public Iterator<String> getPrefixes(String namespaceURI) {
        throw new UnsupportedOperationException();
    }
}

The rest of the code is as in the other answers. Then the XPath /:workbook/:sheets/:sheet[1] yields the sheet element. (You could also use a non-empty prefix for the default namespace, as the other answers do, by replacing prefix.isEmpty() by e.g. prefix.equals("spreadsheet") and using the XPath /spreadsheet:workbook/spreadsheet:sheets/spreadsheet:sheet[1].)

P.S.: I just found here that there's actually a method Node.lookupNamespaceURI(String prefix), so you could use that instead of the attribute lookup:

    public String getNamespaceURI(String prefix) {
        return documentElement.lookupNamespaceURI(prefix.isEmpty() ? null : prefix);
    }

Also, note that namespaces can be declared on elements other than the document element, and those wouldn't be recognized (by either version).
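Here is a runnable sketch of the lookupNamespaceURI variant, using a non-empty prefix for the default namespace so the XPath stays in standard syntax. The class name DocumentNamespaceDemo, the contextFor helper and the choice of "spreadsheet" as the default-namespace prefix are illustrative assumptions; like both versions above, it only sees declarations in scope on the document element.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class DocumentNamespaceDemo {

    static final String XML =
        "<workbook xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\""
        + " xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\">"
        + "<sheets><sheet name=\"Sheet1\" sheetId=\"1\" r:id=\"rId1\"/></sheets>"
        + "</workbook>";

    // Resolves prefixes from the declarations on the document element;
    // defaultPrefix is our own name for the document's default namespace.
    static NamespaceContext contextFor(Document document, String defaultPrefix) {
        Element root = document.getDocumentElement();
        return new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                if (prefix == null) throw new IllegalArgumentException("Null prefix");
                // lookupNamespaceURI(null) asks for the default namespace in scope
                String uri = root.lookupNamespaceURI(prefix.equals(defaultPrefix) ? null : prefix);
                return uri != null ? uri : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String namespaceURI) {
                throw new UnsupportedOperationException();
            }

            @Override
            public Iterator<String> getPrefixes(String namespaceURI) {
                throw new UnsupportedOperationException();
            }
        };
    }

    static String evaluate(String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // lookupNamespaceURI needs a namespace-aware DOM
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(contextFor(doc, "spreadsheet"));
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        // "r" is resolved from the document's own xmlns:r declaration.
        System.out.println(evaluate(
            "/spreadsheet:workbook/spreadsheet:sheets/spreadsheet:sheet[1]/@r:id")); // rId1
    }
}
```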

Interpleader answered 20/7, 2019 at 12:42 Comment(0)

Make sure that you are referencing the namespace in your XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
             xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
             xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"       >
Liturgy answered 17/6, 2011 at 19:20 Comment(0)

Startlingly, if I don't call factory.setNamespaceAware(true), then the XPath you mentioned works both with and without namespaces in play. You just can't select things "with a namespace specified"; only generic XPaths work. Go figure. So this may be an option:

 DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
 factory.setNamespaceAware(false);
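A small sketch to check this on your own JDK: it runs the plain, unprefixed XPath from the question against the namespaced document, once with and once without namespace awareness. The class name NonNamespaceAwareDemo is made up, and the non-namespace-aware result reflects how the JDK's built-in XPath treats DOM nodes without namespace information, not something the XPath specification promises.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NonNamespaceAwareDemo {

    static final String XML =
        "<workbook xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\""
        + " xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\">"
        + "<sheets><sheet name=\"Sheet1\" sheetId=\"1\" r:id=\"rId1\"/></sheets>"
        + "</workbook>";

    static String firstSheetName(boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        // The unprefixed path from the question: every step asks for an element in no namespace.
        return XPathFactory.newInstance().newXPath().evaluate("/workbook/sheets/sheet[1]/@name", doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstSheetName(false)); // Sheet1: the DOM nodes carry no namespace
        System.out.println(firstSheetName(true));  // empty string: the elements are in a namespace now
    }
}
```

The trade-off is the one described above: with namespace awareness off, prefixed steps can never match, so you give up the ability to distinguish vocabularies.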
Eft answered 6/2, 2019 at 23:40 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.