How can I strip namespaces out of an lxml tree?

This follows on from my earlier question, "Removing child elements in XML using python".

Thanks to @Tichodroma, I have this code:

If you can use lxml, try this:

 import lxml.etree

 tree = lxml.etree.parse("leg.xml")
 for dog in tree.xpath("//Leg1:Dog",
                       namespaces={"Leg1": "http://what.not"}):
     parent = dog.xpath("..")[0]
     parent.remove(dog)
     parent.text = None
 tree.write("leg.out.xml")
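For reference, here is a self-contained sketch of the code above that runs without leg.xml. The sample document is a hypothetical stand-in (the original leg.xml isn't shown in the question), and `getparent()` is used in place of the `..` XPath step; both give the parent element.

```python
import lxml.etree

# Hypothetical sample input: leg.xml is not shown in the question, so this
# stand-in simply puts one Leg1:Dog element inside a Leg1:Group.
SAMPLE = b"""<Leg1:MOR xmlns:Leg1="http://what.not" oCount="7">
  <Leg1:Order>
    <Leg1:CTemp id="FO">
      <Leg1:Group bNum="001" cCount="4"><Leg1:Dog>Rex</Leg1:Dog></Leg1:Group>
    </Leg1:CTemp>
  </Leg1:Order>
</Leg1:MOR>"""

NS = {"Leg1": "http://what.not"}

root = lxml.etree.fromstring(SAMPLE)
for dog in root.xpath("//Leg1:Dog", namespaces=NS):
    parent = dog.getparent()  # same element as dog.xpath("..")[0]
    parent.remove(dog)
    parent.text = None        # drop any text left in the parent before the removed child

print(lxml.etree.tostring(root).decode())
```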

Now leg.out.xml looks like this:

 <?xml version="1.0"?>
 <Leg1:MOR xmlns:Leg1="http://what.not" oCount="7">
   <Leg1:Order>
     <Leg1:CTemp id="FO">
       <Leg1:Group bNum="001" cCount="4"/>
       <Leg1:Group bNum="002" cCount="4"/>
     </Leg1:CTemp>
     <Leg1:CTemp id="GO">
       <Leg1:Group bNum="001" cCount="4"/>
       <Leg1:Group bNum="002" cCount="4"/>
     </Leg1:CTemp>
   </Leg1:Order>
 </Leg1:MOR>

How do I modify my code to remove the Leg1: namespace prefix from all of the elements' tag names?

Asked by Antony on 14 May 2015 at 7:47. Comments:
I had a look and couldn't get it to work. (Antony)
Possible duplicate of "Remove namespace and prefix from xml in python using lxml". (Remsen)

One possible way to remove the namespace prefix from each element:

from lxml import etree

def strip_ns_prefix(tree):
    # iterate over element nodes only (comments, text nodes, etc. are skipped)
    for element in tree.xpath('descendant-or-self::*'):
        # if the element has a namespace prefix...
        if element.prefix:
            # ...replace its tag with its local name
            element.tag = etree.QName(element).localname
    return tree
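A quick usage sketch of the first version on (part of) the leg.out.xml content from the question, parsed from a string instead of a file. The function is repeated so the snippet runs on its own.

```python
from lxml import etree


def strip_ns_prefix(tree):
    # First version from the answer: rename every prefixed element to its local name.
    for element in tree.xpath("descendant-or-self::*"):
        if element.prefix:
            element.tag = etree.QName(element).localname
    return tree


# Part of the leg.out.xml content from the question.
XML = b"""<Leg1:MOR xmlns:Leg1="http://what.not" oCount="7">
  <Leg1:Order>
    <Leg1:CTemp id="FO">
      <Leg1:Group bNum="001" cCount="4"/>
      <Leg1:Group bNum="002" cCount="4"/>
    </Leg1:CTemp>
  </Leg1:Order>
</Leg1:MOR>"""

root = strip_ns_prefix(etree.fromstring(XML))
print([el.tag for el in root.iter()])  # -> ['MOR', 'Order', 'CTemp', 'Group', 'Group']
```

Only the tag names change here: the xmlns:Leg1 declaration itself is still present on the root element when the tree is serialized.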

Another version, which does the namespace check in the XPath expression instead of in an if statement:

from lxml import etree

def strip_ns_prefix(tree):
    # XPath query selecting every element node that is in a namespace
    query = "descendant-or-self::*[namespace-uri()!='']"
    # for each element returned by the query...
    for element in tree.xpath(query):
        # ...replace its tag with its local name
        element.tag = etree.QName(element).localname
    return tree
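The practical difference between the two versions shows up with a default namespace, where elements are namespaced but have no prefix. A small comparison sketch; the function names and the sample document are hypothetical, chosen only to tell the two versions apart.

```python
from lxml import etree


def strip_by_prefix(tree):
    # First version: only elements that carry a prefix are renamed.
    for element in tree.xpath("descendant-or-self::*"):
        if element.prefix:
            element.tag = etree.QName(element).localname
    return tree


def strip_by_namespace_uri(tree):
    # Second version: every element with a non-empty namespace URI is renamed.
    for element in tree.xpath("descendant-or-self::*[namespace-uri()!='']"):
        element.tag = etree.QName(element).localname
    return tree


# Hypothetical document using a default namespace, so element.prefix is None.
XML = b'<MOR xmlns="http://what.not"><Order><CTemp id="FO"/></Order></MOR>'

by_prefix = strip_by_prefix(etree.fromstring(XML))
by_uri = strip_by_namespace_uri(etree.fromstring(XML))

print(by_prefix.tag)  # still '{http://what.not}MOR'
print(by_uri.tag)     # 'MOR'
```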

Answered by Organic on 14 May 2015 at 9:21. Comments:
Thanks, this works perfectly. It also fits neatly into my methods. (Antony)
You may also need to call etree.cleanup_namespaces(tree); otherwise etree.tostring(tree) will still show the namespace declarations. (Averyaveryl)
The second method is more robust: it also handles elements in a default namespace, which have no prefix. (Dollar)
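Putting the two comments together, a sketch of a helper (the name strip_namespaces is hypothetical) that uses the namespace-uri() query from the second version and then calls etree.cleanup_namespaces() so the serialized output carries no leftover xmlns declarations:

```python
from lxml import etree


def strip_namespaces(tree):
    """Hypothetical helper: second version from the answer plus the
    cleanup_namespaces() call suggested in the comments."""
    for element in tree.xpath("descendant-or-self::*[namespace-uri()!='']"):
        element.tag = etree.QName(element).localname
    # Remove xmlns declarations that no element or attribute uses any more.
    etree.cleanup_namespaces(tree)
    return tree


XML = b"""<Leg1:MOR xmlns:Leg1="http://what.not" oCount="7">
  <Leg1:Order>
    <Leg1:CTemp id="FO">
      <Leg1:Group bNum="001" cCount="4"/>
    </Leg1:CTemp>
  </Leg1:Order>
</Leg1:MOR>"""

root = strip_namespaces(etree.fromstring(XML))
print(etree.tostring(root).decode())
```

The cleanup step matters most for a default namespace: if the unused xmlns="..." declaration were left in place, the renamed elements would fall back into that namespace as soon as the serialized output is parsed again.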
