Given an XML file that looks like this:
<?xml version="1.0" encoding="windows-1252"?>
<Message xmlns="http://example.com/ns" xmlns:myns="urn:us:gov:dot:faa:aim:saa">
<foo id="stuffid"/>
<myns:bar/>
</Message>
When I parse it with ElementTree, the element tags look like:
{http://example.com/ns}Message
{http://example.com/ns}foo
{urn:us:gov:dot:faa:aim:saa}bar
But I'd rather just have
Message
foo
bar
and more importantly, I'd rather just pass "Message", "foo", and "bar" into the find() and findall() methods.
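To make the behaviour concrete, here is a minimal sketch (assuming Python 3 and the standard-library xml.etree.ElementTree; the XML declaration is left out for brevity and the variable names are just for illustration). It shows the qualified tags and that a bare name passed to find() does not match:

```python
import xml.etree.ElementTree as ET

# Same document as above, minus the XML declaration (omitted for brevity).
XML = """<Message xmlns="http://example.com/ns" xmlns:myns="urn:us:gov:dot:faa:aim:saa">
<foo id="stuffid"/>
<myns:bar/>
</Message>"""

root = ET.fromstring(XML)

# Every tag comes back in "{namespace-uri}localname" form.
tags = [el.tag for el in root.iter()]
print(tags)

# A bare local name does not match a namespaced element...
plain = root.find("foo")
# ...but the fully qualified name does.
qualified = root.find("{http://example.com/ns}foo")
print(plain, qualified.get("id"))
```

The second lookup is what I would like to avoid having to write.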
I've tried using substitutions to censor all xmlns: attributes, as suggested in https://mcmap.net/q/136201/-python-elementtree-module-how-to-ignore-the-namespace-of-xml-files-to-locate-matching-element-when-using-the-method-quot-find-quot-quot-findall-quot (and this is probably what I'll have to do if I can't find something more elegant), and I've tried calling ElementTree.register_namespace('', "http://example.com/ns"), but that seems to only help with ElementTree.tostring(), which isn't what I wanted.
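A small sketch of what I mean about register_namespace() (assuming Python 3; the one-namespace document here is a cut-down version of the example, used only for illustration). The registration changes how the tree is serialized, while the parsed tags and lookups stay the same:

```python
import xml.etree.ElementTree as ET

# Note: register_namespace() modifies a module-wide prefix map.
ET.register_namespace("", "http://example.com/ns")

root = ET.fromstring(
    '<Message xmlns="http://example.com/ns"><foo id="stuffid"/></Message>'
)

# Serialization now uses a default namespace instead of an "ns0:" prefix.
serialized = ET.tostring(root, encoding="unicode")
print(serialized)

# The in-memory tags are still fully qualified...
print(root.tag)
# ...so a bare name still finds nothing.
print(root.find("foo"))
```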
Isn't there just some way to get ElementTree to pretend it never heard of xmlns?
Let's assume that my element tags are globally unique even without the namespace qualifiers. In this case, the namespaces just get in the way.
Addressing some of the comments in detail:
Joe linked to Python ElementTree module: How to ignore the namespace of XML files to locate matching element when using the method "find", "findall" which is close enough to my question that I guess mine is a duplicate. However, that question was not answered either. The suggestions given there were:
- Use tree.findall("xmlns:DEAL_LEVEL/xmlns:PAID_OFF", namespaces={'xmlns': 'http://www.test.com'}).
  - I couldn't find the documentation for that call with those arguments in https://docs.python.org/2/library/xml.etree.elementtree.html#xml.etree.ElementTree.Element.findall, and at any rate it requires that I know all of the namespaces.
- Pre-process the input XML and strip the xmlns attributes from the input as mentioned above.
- Post-process the parsed document and strip all the namespaces from the tags.
  - Frankly, I like this approach the best. I will post the code as an answer.
- Use register_namespace("", "http://example.com/ns").
  - This suppresses the namespace when using ElementTree.tostring(el) but not in el.tag. I expect it doesn't help find() or findall() either.
  - Again, this doesn't solve the problem where I need to know all the namespaces in advance (or extract them from the document somehow).
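For the post-processing option, a rough sketch of the idea might look like the following (assuming Python 3; strip_namespaces is a hypothetical helper name, not part of ElementTree, and this is only one way to do it). It rewrites each element's tag in place to its local name, which is only safe under the assumption that local names are unique across namespaces:

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root):
    """Hypothetical helper: rewrite each element tag to its local name, in place."""
    for el in root.iter():
        # Comments and processing instructions have non-string tags; skip them.
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
    return root


# Same document as in the question, minus the XML declaration.
XML = """<Message xmlns="http://example.com/ns" xmlns:myns="urn:us:gov:dot:faa:aim:saa">
<foo id="stuffid"/>
<myns:bar/>
</Message>"""

root = strip_namespaces(ET.fromstring(XML))

tags = [el.tag for el in root.iter()]
print(tags)

# Bare names now work with find() and findall().
found = root.find("foo")
bars = root.findall("bar")
print(found.get("id"), len(bars))
```

This sketch leaves attribute names alone, so a namespace-qualified attribute would keep its {uri} prefix, and the namespace information is lost if the tree is written back out.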