Suppress namespace in ElementTree

Given an XML file that looks like this:

<?xml version="1.0" encoding="windows-1252"?>
<Message xmlns="http://example.com/ns" xmlns:myns="urn:us:gov:dot:faa:aim:saa">
  <foo id="stuffid"/>
  <myns:bar/>
</Message>

When I parse it with ElementTree, the element tags look like:

{http://example.com/ns}Message
  {http://example.com/ns}foo
  {urn:us:gov:dot:faa:aim:saa}bar

But I'd rather just have

Message
  foo
  bar

More importantly, I'd like to pass just "Message", "foo", and "bar" to the find() and findall() methods.
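
For reference, here is a minimal sketch (Python 3, standard library only) that reproduces the behaviour described above with the sample document. The XML declaration is left out so the string can be passed straight to ET.fromstring():

```python
import xml.etree.ElementTree as ET

# The sample document from the question, without the XML declaration.
XML = """<Message xmlns="http://example.com/ns" xmlns:myns="urn:us:gov:dot:faa:aim:saa">
  <foo id="stuffid"/>
  <myns:bar/>
</Message>"""

root = ET.fromstring(XML)

# Every tag carries its namespace URI in "{uri}local" form.
for el in root.iter():
    print(el.tag)

# A bare local name does not match a namespaced element...
print(root.find("foo"))  # None
# ...but the fully qualified name does.
print(root.find("{http://example.com/ns}foo").get("id"))  # stuffid
```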

I've tried using substitutions to strip all xmlns: attributes, as suggested in https://mcmap.net/q/136201/-python-elementtree-module-how-to-ignore-the-namespace-of-xml-files-to-locate-matching-element-when-using-the-method-quot-find-quot-quot-findall-quot (and this is probably what I'll have to do if I can't find something more elegant). I've also tried calling ElementTree.register_namespace('', "http://example.com/ns"), but that seems to help only with ElementTree.tostring(), which isn't what I want.
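
A quick check of the register_namespace() behaviour, sketched against Python 3's xml.etree.ElementTree with a cut-down version of the sample: registering "" for the default namespace only changes which prefix is chosen when serializing. The parsed tag and find() lookups are unaffected.

```python
import xml.etree.ElementTree as ET

# Map the default (empty) prefix to the document's namespace URI.
# This is global state that only the serializer consults.
ET.register_namespace("", "http://example.com/ns")

root = ET.fromstring(
    '<Message xmlns="http://example.com/ns"><foo id="stuffid"/></Message>'
)

print(root.tag)          # {http://example.com/ns}Message (unchanged)
print(root.find("foo"))  # None (lookups still need the namespace)

# Serialization now uses xmlns="..." instead of an "ns0:" prefix.
print(ET.tostring(root, encoding="unicode"))
```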

Isn't there just some way to get ElementTree to pretend it never heard of xmlns?

Let's assume that my element tags are globally unique even without the namespace qualifiers. In this case, the namespaces just get in the way.


Addressing some of the comments in detail:

Joe linked to Python ElementTree module: How to ignore the namespace of XML files to locate matching element when using the method "find", "findall", which is close enough to my question that I guess mine is a duplicate. However, that question was not answered either. The suggestions given there were:

  • Use tree.findall("xmlns:DEAL_LEVEL/xmlns:PAID_OFF", namespaces={'xmlns': 'http://www.test.com'}).
  • Pre-process the input XML and strip the xmlns attributes from the input as mentioned above.
  • Post-process the parsed document and strip all the namespaces from the tags.
    • Frankly, I like this approach the best. I will post the code as an answer.
  • Use register_namespace("", "http://example.com/ns")
    • This suppresses the namespace when using ElementTree.tostring(el) but not in el.tag. I expect it doesn't help find() or findall() either.
    • Again, this doesn't solve the problem where I need to know all the namespaces in advance (or extract them from the document somehow).
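
For comparison, here is a sketch of the first suggestion applied to the sample document (the prefix "ex" in the dictionary is an arbitrary local choice; only the URIs have to match the document). It also shows the "{*}" wildcard that find()/findall() accept from Python 3.8 onward, which matches a local name in any namespace (or none) without listing the URIs up front. Whether that helps depends on the Python version you target.

```python
import xml.etree.ElementTree as ET

XML = """<Message xmlns="http://example.com/ns" xmlns:myns="urn:us:gov:dot:faa:aim:saa">
  <foo id="stuffid"/>
  <myns:bar/>
</Message>"""
root = ET.fromstring(XML)

# Option 1: a prefix map passed to find()/findall().
# The namespaces must be known in advance.
ns = {"ex": "http://example.com/ns", "myns": "urn:us:gov:dot:faa:aim:saa"}
foo = root.find("ex:foo", namespaces=ns)
bar = root.find("myns:bar", namespaces=ns)

# Option 2 (Python 3.8+): "{*}" matches the local name in any namespace.
foo_any = root.find("{*}foo")
bar_any = root.find("{*}bar")

print(foo.get("id"), foo_any.get("id"))  # stuffid stuffid
print(bar is bar_any)                    # True
```

The tags themselves are unchanged either way. These options only change how lookups are written.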
Posted by Declarer, 13/9/2015 at 5:27. Comments:
Possible duplicate of Python ElementTree module: How to ignore the namespace of XML files to locate matching element when using the method "find", "findall" (Upholstery)
Yes, it's a dup. Just set the value of the xmlns attribute to the empty string, as shown at the URL in the comment above. (Seow)
I've edited my post and addressed these comments. Thanks for the links. (Declarer)

OK, thanks for the links to the other question. I've decided to borrow (and improve on) one of the solutions given there:

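def stripNs(el):
    '''Recursively search this element tree, removing namespaces.'''
    # Tags look like "{uri}local"; keep only the local name.
    # (Comments and processing instructions, if kept, have non-string tags.)
    if isinstance(el.tag, str) and el.tag.startswith("{"):
        el.tag = el.tag.split('}', 1)[1]  # strip namespace
    # Iterate over a copy of the keys, because el.attrib is modified in the loop.
    for k in list(el.attrib):
        if k.startswith("{"):
            k2 = k.split('}', 1)[1]
            el.attrib[k2] = el.attrib.pop(k)
    for child in el:
        stripNs(child)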
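
A non-recursive variant of the same idea is sketched below, under the same assumption that local names are unique. Element.iter() already walks the whole subtree, including the element itself, so no explicit recursion is needed, which also sidesteps Python's recursion limit on very deep documents. The name strip_ns_iter is hypothetical. The usage lines apply it to the sample document, with an added namespaced attribute (myns:kind) to exercise the attribute branch.

```python
import xml.etree.ElementTree as ET


def strip_ns_iter(root):
    """Remove "{uri}" prefixes from all tag and attribute names under root.

    Hypothetical helper; Element.iter() visits root and every descendant.
    """
    for el in root.iter():
        # Skip non-string tags (comments/PIs, if the parser kept them).
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
        # Snapshot the keys, because el.attrib is modified inside the loop.
        for k in list(el.attrib):
            if k.startswith("{"):
                el.attrib[k.split("}", 1)[1]] = el.attrib.pop(k)
    return root


XML = """<Message xmlns="http://example.com/ns" xmlns:myns="urn:us:gov:dot:faa:aim:saa">
  <foo id="stuffid"/>
  <myns:bar myns:kind="x"/>
</Message>"""

root = strip_ns_iter(ET.fromstring(XML))
print([el.tag for el in root.iter()])  # ['Message', 'foo', 'bar']
print(root.find("foo").get("id"))      # stuffid
print(root.find("bar").attrib)         # {'kind': 'x'}
```

One caveat applies to either version: if an element carries both id and a namespaced {uri}id, stripping the namespace lets one value overwrite the other. That is only harmless under the uniqueness assumption stated in the question.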
Answered by Declarer, 13/9/2015 at 17:56. Comments:
for k in el.attrib.keys(): should be keys = list(el.attrib.keys()); for k in keys: or something similar, because you are deleting one of the keys and could get a "dictionary keys changed during iteration" runtime error. (Denesedengue)
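
To illustrate that comment, here is a sketch with a plain dict standing in for el.attrib. In recent CPython versions, adding and deleting a key inside a loop over .keys() raises RuntimeError even though the dict's size ends up unchanged. The exact message is a CPython implementation detail. Iterating over a list() snapshot avoids the problem.

```python
# A plain dict standing in for el.attrib: one namespaced key, one plain key.
attrib = {"{urn:x}kind": "x", "id": "stuffid"}

# Buggy pattern: mutate the dict while iterating over a live keys() view.
error = None
try:
    for k in attrib.keys():
        if k.startswith("{"):
            attrib[k.split("}", 1)[1]] = attrib[k]
            del attrib[k]
except RuntimeError as exc:
    error = exc
print(type(error).__name__)  # RuntimeError

# Fixed pattern: iterate over a snapshot of the keys instead.
attrib = {"{urn:x}kind": "x", "id": "stuffid"}
for k in list(attrib):
    if k.startswith("{"):
        attrib[k.split("}", 1)[1]] = attrib.pop(k)
print(attrib)  # {'id': 'stuffid', 'kind': 'x'}
```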
