Python's built-in xml.etree package supports parsing XML files with namespaces, but namespace prefixes get expanded to the full URI enclosed in curly braces. So in the example file in the official documentation:
<actors xmlns:fictional="http://characters.example.com"
xmlns="http://people.example.com">
<actor>
<name>John Cleese</name>
<fictional:character>Lancelot</fictional:character>
<fictional:character>Archie Leach</fictional:character>
</actor>
...
The actor tag gets expanded to {http://people.example.com}actor and fictional:character to {http://characters.example.com}character.
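To make the expansion concrete, here is a small sketch that parses a trimmed copy of the sample above and collects the tags as ElementTree reports them. The closing </actors> tag is added here only so the fragment parses; the rest is the sample as quoted.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document, closed so that it parses.
XML = """\
<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
  <actor>
    <name>John Cleese</name>
    <fictional:character>Lancelot</fictional:character>
    <fictional:character>Archie Leach</fictional:character>
  </actor>
</actors>
"""

root = ET.fromstring(XML)
actor = root[0]

# Every tag comes back in "{uri}local" form; the prefixes themselves are gone.
tags = [child.tag for child in actor]

print(root.tag)
print(actor.tag)
for tag in tags:
    print(tag)
```

Note that the unprefixed elements are expanded too, because they live in the default namespace.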
I can see how this makes everything very explicit and reduces ambiguity (the file could declare the same namespace under a different prefix, etc.), but it is very cumbersome to work with. The Element.find() method and others accept a dict mapping prefixes to namespace URIs, so I can still do element.find('fictional:character', nsmap), but to my knowledge there is nothing similar for attributes. This leads to annoying stuff like element.attrib['{{{}}}attrname'.format(nsmap['prefix'])].
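For illustration, this is roughly what the two cases look like side by side with the standard library. The clark() helper, the "people" prefix in NSMAP and the fictional:source attribute are all made up for this sketch; none of them come from the documentation sample.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping defined in code; the "people" prefix is an assumption.
NSMAP = {
    "people": "http://people.example.com",
    "fictional": "http://characters.example.com",
}


def clark(name, nsmap):
    """Expand 'prefix:local' to '{uri}local' (hypothetical helper)."""
    if name.startswith("{") or ":" not in name:
        return name  # already expanded, or no prefix at all
    prefix, local = name.split(":", 1)
    return "{%s}%s" % (nsmap[prefix], local)


# The fictional:source attribute is invented for this example.
XML = """\
<actors xmlns="http://people.example.com"
        xmlns:fictional="http://characters.example.com">
  <actor fictional:source="Monty Python">
    <fictional:character>Lancelot</fictional:character>
  </actor>
</actors>
"""

root = ET.fromstring(XML)

# Element lookups accept the prefix mapping directly...
actor = root.find("people:actor", NSMAP)
character = actor.find("fictional:character", NSMAP)

# ...but attribute keys have to be expanded by hand.
source = actor.attrib[clark("fictional:source", NSMAP)]

print(character.text, source)
```

A helper like this hides the format-string noise, but it still has to be called at every attribute access.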
The popular lxml package provides the same API with a few extensions, one of which is an nsmap property on the elements that they inherit from the document. However, none of the methods seem to actually make use of it, so I still have to do element.find('fictional:character', element.nsmap), which is just unnecessarily repetitive to type out every time. It also still doesn't work with attributes.
Luckily lxml supports subclassing ElementBase, so I just made a subclass with a p (for prefix) property that has the same API but automatically resolves namespace prefixes using the element's nsmap (Edit: likely best to assign a custom nsmap defined in code). So I just do element.p.find('fictional:character') or element.p.attrib['prefix:attrname'], which is much less repetitive and I think way more readable.
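The general shape of that idea can be sketched without lxml. The version below is not the lxml subclass described above: it is a plain wrapper around a standard-library Element, with a prefix map defined in code. The Prefixed class, its method names, the "people" prefix and the fictional:source attribute are all hypothetical.

```python
import xml.etree.ElementTree as ET


class Prefixed:
    """Wrap an Element so paths and attribute names accept prefixes (sketch)."""

    def __init__(self, element, nsmap):
        self._element = element
        self._nsmap = nsmap

    def _expand(self, name):
        # Turn 'prefix:local' into '{uri}local'; leave other names untouched.
        if name.startswith("{") or ":" not in name:
            return name
        prefix, local = name.split(":", 1)
        return "{%s}%s" % (self._nsmap[prefix], local)

    def find(self, path):
        found = self._element.find(path, self._nsmap)
        return None if found is None else Prefixed(found, self._nsmap)

    def findall(self, path):
        matches = self._element.findall(path, self._nsmap)
        return [Prefixed(match, self._nsmap) for match in matches]

    def get(self, name, default=None):
        return self._element.get(self._expand(name), default)

    @property
    def text(self):
        return self._element.text


NSMAP = {
    "people": "http://people.example.com",
    "fictional": "http://characters.example.com",
}

# The fictional:source attribute is invented for this example.
XML = """\
<actors xmlns="http://people.example.com"
        xmlns:fictional="http://characters.example.com">
  <actor fictional:source="Monty Python">
    <fictional:character>Lancelot</fictional:character>
    <fictional:character>Archie Leach</fictional:character>
  </actor>
</actors>
"""

root = Prefixed(ET.fromstring(XML), NSMAP)
actor = root.find("people:actor")
names = [character.text for character in actor.findall("fictional:character")]
source = actor.get("fictional:source")

print(names, source)
```

The mapping is passed once at construction and carried along to every element returned by find() and findall(), so call sites only ever mention prefixes.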
I feel like I'm missing something, though: it seems like this should already be a feature of lxml, if not the built-in etree package. Am I somehow doing this wrong?
nsmap
I would still need to pass it each time because applying it to an lxml.etree element object still doesn't actually do anything. Passing an nsmap to find() each time isn't too bad, but element.attrib['{{{}}}attrname'.format(nsmap['prefix'])] all over the place is pretty awful regardless of whether I have defined a static nsmap in code or not. – Foudroyant