Nokogiri/Xpath namespace query
L

3

42

I'm trying to pull out the dc:title element using an XPath expression. I can pull out the metadata element using the following code.

require 'nokogiri'

doc = <<END
<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="URI">
    <dc:title>title text</dc:title>
  </metadata>
</package>
END

doc = Nokogiri::XML(doc)

# Awesome this works!
puts '//xmlns:metadata'
puts doc.xpath('//xmlns:metadata')
# => <metadata xmlns:dc="URI"><dc:title>title text</dc:title></metadata>

As you can see, the above works. However, I can't get the title out of this node tree; all of the following fail:

puts doc.xpath('//xmlns:metadata/title')
# => (empty NodeSet, no match)

puts doc.xpath('//xmlns:metadata/dc:title')
# => ERROR: `evaluate': Undefined namespace prefix

puts doc.xpath('//xmlns:dc:title')
# => ERROR: 'evaluate': Invalid expression: //xmlns:dc:title

Could someone please explain how namespaces should be used in an XPath expression with the above XML document?

Looby answered 14/1, 2011 at 11:52 Comment(0)
D
78

Every namespace prefix you use in an XPath expression has to be registered for the query. Nokogiri automatically registers the namespaces declared on the root node. Any namespace declared elsewhere in the document you have to register yourself. This should work:

puts doc.xpath('//dc:title', 'dc' => "URI")

Alternatively, you can remove namespaces altogether. Only do this if you are certain there will be no conflicting node names.

doc.remove_namespaces!
puts doc.xpath('//title')
Dyspnea answered 14/1, 2011 at 12:27 Comment(1)
Using remove_namespaces! is the most sensible thing to try first. But beware: if you're modifying this XML and submitting it to an external API, the API will (often) reject it without the namespaces. - Blubber
A
1

With the prefix opf properly registered for the namespace URI 'http://www.idpf.org/2007/opf', and dc for 'URI', you need:

/*/opf:metadata/dc:title

Note: xmlns and xml are reserved prefixes that can't be bound to any namespace URI other than the built-in 'http://www.w3.org/2000/xmlns/' and 'http://www.w3.org/XML/1998/namespace', respectively.

Auriculate answered 14/1, 2011 at 12:22 Comment(4)
Didn't seem to work: doc.xpath('/*/opf:metadata/dc:title') # => "`evaluate': Undefined namespace prefix" - Looby
@Jamie: Did you actually read the answer? First sentence starts "With properly registered prefix"... - Auriculate
@Alejandro apologies, I don't entirely understand. Is there a way to do it without the prefix for opf (other than the way described in @mark-thomas's answer)? It'd be nice to do it in one XPath query. - Looby
@Jamie: No problem. But it's important that you understand XML namespaces. This is one XPath expression. You need to know how your XPath engine registers namespaces. From @Mark Thomas's answer, this seems to be accomplished by passing a second parameter to the xpath() function... - Auriculate
P
0

As an alternative to explicitly constructing a hash of namespace URIs, you can retrieve the namespace definitions from the XML element where they're defined.

Using your example:

# First grab the metadata node, because that's where "dc" is defined.
metadata = doc.at_xpath('//xmlns:metadata')

# Pass metadata's namespaces as the resolver.
metadata.at_xpath('dc:title', metadata.namespaces)

Note that the second XPath query could also have been:

doc.at_xpath('//dc:title', metadata.namespaces).to_s

But why search from the root when you have a nearer ancestor? Also, you should consider the namespace-defining element plus its children as the "scope" of the namespace. Searching a limited scope is less confusing and avoids subtle bugs.

Pileum answered 28/7, 2016 at 16:14 Comment(0)
