xmllint failing to properly query with xpath without namespaces
A

3

64

I'm trying to query an XML file generated by Adium. xmlwf says that it's well formed. Using xmllint's debug option, I get the following:

$ xmllint --debug doc.xml
DOCUMENT
version=1.0
encoding=UTF-8
URL=doc.xml
standalone=true
  ELEMENT chat
    default namespace href=http://purl.org/net/ulf/ns/0.4-02
    ATTRIBUTE account
      TEXT
        [email protected]
    ATTRIBUTE service
      TEXT compact
        content=MSN
    TEXT compact
      content= 
    ELEMENT event
      ATTRIBUTE type

Everything seems to parse just fine. However, when I try to query even the simplest things, I don't get anything:

$ xmllint --xpath '/chat' doc.xml 
XPath set is empty

What's happening? Running that exact same query using xpath returns the correct results (however with no newline between results). Am I doing something wrong or is xmllint just not working properly?

Here's a shorter, anonymized version of the xml that shows the same behavior:

<?xml version="1.0" encoding="UTF-8" ?>
<chat xmlns="http://purl.org/net/ulf/ns/0.4-02" account="[email protected]" service="MSN">
<event type="windowOpened" sender="[email protected]" time="2011-11-22T00:34:43-03:00"></event>
<message sender="[email protected]" time="2011-11-22T00:34:43-03:00" alias="foo"><div><span style="color: #000000; font-family: Helvetica; font-size: 12pt;">hi</span></div></message>
</chat>
Agonizing answered 25/11, 2011 at 1:57 Comment(1)
can you please share doc.xml filePejsach
H
108

I don't use xmllint, but I think the reason your XPath isn't working is because your doc.xml file is using a default namespace (http://purl.org/net/ulf/ns/0.4-02).

From what I can see, you have 2 options.

A. Use xmllint in shell mode and declare the namespace with a prefix. You can then use that prefix in your XPath.

    xmllint --shell doc.xml
    / > setns x=http://purl.org/net/ulf/ns/0.4-02
    / > xpath /x:chat

B. Use local-name() to match element names.

    xmllint --xpath "/*[local-name()='chat']" doc.xml

You may also want to use namespace-uri()='http://purl.org/net/ulf/ns/0.4-02' along with local-name() so you are sure to return exactly what you are intending to return.
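
Both options can also be run non-interactively, which helps in scripts. The following is a minimal sketch, assuming xmllint from libxml2 is on the PATH; the cut-down doc.xml, the scratch directory, the step helper, and the variable names are made up for illustration. It pipes the shell-mode commands in on stdin for option A, and applies local-name() together with namespace-uri() to every step of the path for option B.

```shell
#!/bin/sh
# Cut-down copy of the sample document, written to a scratch directory (illustrative).
tmp=$(mktemp -d)
cat > "$tmp/doc.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8" ?>
<chat xmlns="http://purl.org/net/ulf/ns/0.4-02" account="a" service="MSN">
<event type="windowOpened" sender="a"></event>
<message sender="a" alias="foo"><div><span>hi</span></div></message>
</chat>
EOF

ns='http://purl.org/net/ulf/ns/0.4-02'

# Option A, scripted: send the shell-mode commands on stdin instead of typing them.
shell_out=$(printf 'setns x=%s\ncat /x:chat/x:message\n' "$ns" | xmllint --shell "$tmp/doc.xml")
printf '%s\n' "$shell_out"

# Option B: every step of the path needs its own local-name()/namespace-uri() test.
# step is a hypothetical helper that builds one such step.
step() { printf "*[local-name()='%s' and namespace-uri()='%s']" "$1" "$ns"; }
alias_value=$(xmllint --xpath "string(/$(step chat)/$(step message)/@alias)" "$tmp/doc.xml")
printf '%s\n' "$alias_value"

rm -rf "$tmp"
```

Wrapping the expression in string() makes --xpath print only the attribute value rather than the serialized node.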

Hadrian answered 25/11, 2011 at 7:48 Comment(14)
Note example A. and B. will fail if you're not accessing a root path, in which case you need a double-slash, eg xmllint --xpath "//*[local-name()='chat']". See #27311814Subscription
@Avt'W - This question/answer is specifically about namespaces in xmllint; not any other XPath topics. What / and // match are totally unrelated.Hadrian
Hey, it was a comment for the reader who might have a slightly different use case, not a criticism of your answer, which answers the problem accurately. People having problems with namespaces are likely newbies, so I thought it was worth pointing that out.Subscription
C. cat foo.xml | sed '2 s/xmlns=".*"//g' | xmllint --xpath ...Inalterable
@Avt'W observation was very helpful hint for us newbies. @daniel-haley Thanks for shell hint. Here is what I think full line would look like. xmllint --xpath "//*[local-name()='chat' and namespace-uri()='http://purl.org/net/ulf/ns/0.4-02']"Drought
NB. This can get confusing and lengthy very quickly. This article has a good tutorial on the subject; namespace-uri() must be added to every portion of the path that needs it, for example.Overgrowth
I wonder why they made shell option setrootns to register all namespaces from root node declaration but not in CLI mode :(Temperate
Not that parsing XML with sed is the best idea in the world, but that regex might be too greedy. To remove namespace declarations without taking out more than you meant, use sed 's/xmlns="[^"]*"//g'.Undis
How did you know of the setns option in the shell. The man page has some entries for the shell commands and that is not one of them. Any method for doing something similar without the shell besides the comment with namespace-uri()... everywhere?Ent
@Ent - I don't remember how I knew about setns or where it's documented. A possible alternative to xmllint would be xmlstarlet. You can bind the namespace to a prefix on the command line or use "_" to match any namespace.Hadrian
Hmm, funny you mentioned xmlstarlet. I have been trying that as well. I tried something like this: xmlstarlet sel -N i="someuri" -t -m //xyz -v "@moduleName" -n foo.xml where xyz would be something like some_ns:some_tag. Now that I write this, I bet it should be: //i:xyz.Ent
Nope that did not work. xmlstarlet sel -N i="some_uri" -t -m /i:foo/i:goo -v "@some_attribute" -n foo.xmlEnt
ahh, this works for xmlstarlet: xmlstarlet sel -N i="some_uri" -t -m /i:foo/i:goo -v "name()" -n foo.xml Without the -v part its just matching and not printing the matching portion.Ent
And the thing which worked for me: xmlstarlet sel -N x="some uri" -t -m "/x:foo/x:goo[@some_attr='some value']" -v '@some_attr' -n foo.xml It appears the " and ' are critical. Without them or in reverse order, i.e. outer is " and inner is ', it will not work.Ent
E
14

I realize this question is very old now, but in case it helps someone...

Had the same problem and it was due to the XML having a namespace (and sometimes it was duplicated in various places in the XML). Found it easiest to just remove the namespace before using xmllint:

sed -e 's/xmlns="[^"]*"//g' file.xml | xmllint --xpath "..." -

In my case the XML was UTF-16 so I had to convert to UTF-8 first (for sed):

iconv -f utf16 -t utf8 file.xml | sed -e 's/encoding="UTF-16"?>/encoding="UTF-8"?>/' | sed -e 's/xmlns="[^"]*"//g' | xmllint --xpath "..." -
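
To see what the sed step actually does, here is a small sketch run against a cut-down copy of the question's sample. The file contents and variable names are illustrative, and the last command assumes xmllint is installed. The pattern only removes default-namespace declarations (xmlns="..."), not prefixed ones (xmlns:x="..."), and it is a plain text substitution, so it would also rewrite matching text anywhere else in the file.

```shell
#!/bin/sh
# Cut-down copy of the sample document in a scratch directory (illustrative).
tmp=$(mktemp -d)
cat > "$tmp/file.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8" ?>
<chat xmlns="http://purl.org/net/ulf/ns/0.4-02" account="a" service="MSN">
<event type="windowOpened" sender="a"></event>
<message sender="a" alias="foo"><div><span>hi</span></div></message>
</chat>
EOF

# Drop default-namespace declarations so unprefixed XPath steps can match.
stripped=$(sed -e 's/xmlns="[^"]*"//g' "$tmp/file.xml")
printf '%s\n' "$stripped"

# The plain path from the question now selects nodes ("-" makes xmllint read stdin).
alias_value=$(printf '%s\n' "$stripped" | xmllint --xpath 'string(/chat/message/@alias)' -)
printf '%s\n' "$alias_value"

rm -rf "$tmp"
```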
Eri answered 10/4, 2019 at 2:50 Comment(2)
This will clobber data in XML files. The point of tools like xmllint is to parse the XML properly.Incoordination
One can rebind the default namespace to a prefix like x directly in the file: sed -e 's/xmlns=/xmlns:x=/'. The unprefixed elements then no longer belong to any namespace, so you can use your command with plain XPath expressions like //itemSack
P
0

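If you're allowed to install PowerShell in your environment (it's also available for Linux), you can do it like this, after binding the namespace URI to a prefix (ns here) in a hashtable:

$Namespace = @{ ns = 'http://purl.org/net/ulf/ns/0.4-02' }
Select-Xml -XPath '/ns:chat' -Namespace $Namespace .\doc.xml | foreach { $_.Node }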
   xmlns   : http://purl.org/net/ulf/ns/0.4-02
   account : [email protected]
   service : MSN
   event   : event
   message : message

Of course, all the same rules for XPath apply here. To access the inner XML of a node:

Select-Xml -XPath '/ns:chat/ns:message' -Namespace $Namespace .\doc.xml |foreach {$_.Node.InnerXML }
<div xmlns="http://purl.org/net/ulf/ns/0.4-02"><span style="color: #000000; font-family: Helvetica; font-size: 12pt;">hi</span></div>

Or the content of the sender attribute:

Select-Xml -XPath '/ns:chat/ns:message/@sender' -Namespace $Namespace .\doc.xml |foreach {$_.Node }

#text
-----
[email protected]
Phlegmy answered 29/4, 2021 at 16:31 Comment(0)
