Working with namespace while parsing XML using ElementTree

This is a follow-up question to "Modify a XML using ElementTree".

I now have namespaces in my XML. I tried to understand the answer at "Parsing XML with namespace in Python via 'ElementTree'" and came up with the following.

XML file.

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
 <grandParent>
  <parent>
   <child>Sam/Astronaut</child>
  </parent>
 </grandParent>
</project>

My Python code, after looking at "Parsing XML with namespace in Python via 'ElementTree'":

import xml.etree.ElementTree as ET

spaces = {'xmlns': 'http://maven.apache.org/POM/4.0.0', 'schemaLocation': 'http://maven.apache.org/xsd/maven-4.0.0.xsd'}

tree = ET.parse("test.xml")
a=tree.find('parent')          
for b in a.findall('child', namespaces=spaces):
 if b.text.strip()=='Jay/Doctor':
    print "child exists"
    break
else:
    ET.SubElement(a,'child').text="Jay/Doctor"

tree.write("test.xml")

I get the error: AttributeError: 'NoneType' object has no attribute 'findall'

Pinebrook answered 31/7, 2014 at 22:47 Comment(3)
Neither of the code snippets you posted is valid Python. There are stray bits of XML in the first, messed-up indentation in both, and missing brackets in the second. - Ioyal
Yes, my bad. I tried to correct it now. - Pinebrook
Aside: the indentation of else is incorrect. It wants to line up with for, not with if. - Inadvertence

There are two problems on this line:

a=tree.find('parent')          

First, <parent> is not an immediate child of the root element; it is a grandchild. The path to it looks like /project/grandParent/parent. To search for <parent>, try the XPath expression */parent or possibly .//parent.

Second, <parent> exists in the default namespace, so you won't be able to .find() it with just its simple name. You'll need to add the namespace.

Here are two equally valid calls to tree.find(), each of which should find the <parent> node:

a=tree.find('*/{http://maven.apache.org/POM/4.0.0}parent')
a=tree.find('*/xmlns:parent', namespaces=spaces)
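
To see both lookups side by side, here is a minimal sketch that parses the XML from a string rather than from test.xml (an assumption made so the snippet is self-contained; the xsi attributes are left out for brevity). The constant POM_NS is a name introduced here for illustration only.

```python
import xml.etree.ElementTree as ET

# Illustrative constant, not part of the original code.
POM_NS = "http://maven.apache.org/POM/4.0.0"

XML = """<project xmlns="http://maven.apache.org/POM/4.0.0">
 <grandParent>
  <parent>
   <child>Sam/Astronaut</child>
  </parent>
 </grandParent>
</project>"""

root = ET.fromstring(XML)

# The dictionary key is only a prefix used inside the path expression;
# it does not have to match anything in the XML file itself.
spaces = {"xmlns": POM_NS}

# Long form: the namespace URI in braces, directly in the path.
by_uri = root.find("*/{%s}parent" % POM_NS)

# Short form: a prefix that is resolved through the namespaces mapping.
by_prefix = root.find("*/xmlns:parent", namespaces=spaces)

# No namespace at all: nothing matches, so find() returns None,
# which is what leads to the AttributeError on .findall().
unqualified = root.find("*/parent")
```

Both qualified calls return the same Element object, while the unqualified one returns None.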

Next, the call to findall() needs a namespace qualifier:

for b in a.findall('xmlns:child', namespaces=spaces):

Fourth, the call to create the new child element needs a namespace qualifier. There may be a way to use the shortcut name, but I couldn't find it. I had to use the long form of the name.

ET.SubElement(a,'{http://maven.apache.org/POM/4.0.0}child').text="Jay/Doctor"
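
Putting the namespaced search and the namespaced SubElement() call together, a rough sketch might look like the following. It works on an in-memory document instead of test.xml, and ensure_child is a hypothetical helper name made up for this example. It uses an early return rather than for/else, which is just one possible way to write the same check.

```python
import xml.etree.ElementTree as ET

# Illustrative constant, not part of the original code.
POM_NS = "http://maven.apache.org/POM/4.0.0"
spaces = {"xmlns": POM_NS}

root = ET.fromstring(
    '<project xmlns="%s"><grandParent><parent>'
    "<child>Sam/Astronaut</child>"
    "</parent></grandParent></project>" % POM_NS
)


def ensure_child(parent, text):
    """Hypothetical helper: add a namespaced <child> unless one with this text exists.

    Returns True if a new element was appended, False otherwise.
    """
    for child in parent.findall("xmlns:child", namespaces=spaces):
        # child.text can be None for an empty element, hence the "or".
        if (child.text or "").strip() == text:
            return False
    # The new element needs the fully qualified {uri}name form.
    ET.SubElement(parent, "{%s}child" % POM_NS).text = text
    return True


parent = root.find("*/xmlns:parent", namespaces=spaces)
added_first = ensure_child(parent, "Jay/Doctor")   # appends the element
added_again = ensure_child(parent, "Jay/Doctor")   # finds it, adds nothing
child_texts = [c.text for c in parent.findall("xmlns:child", namespaces=spaces)]
```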

Finally, unless you provide a default namespace, your XML output will look ugly: every element gets written with a generated prefix such as ns0:.

tree.write('test.xml', default_namespace=spaces['xmlns'])
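
To compare the outputs without touching test.xml, here is a small sketch that writes to an in-memory buffer instead. The serialize helper and the POM_NS constant are names invented for this example. It also shows ET.register_namespace() with an empty prefix, which is another way to get unprefixed output; note that it changes module-wide state for every later serialization.

```python
import io
import xml.etree.ElementTree as ET

# Illustrative constant, not part of the original code.
POM_NS = "http://maven.apache.org/POM/4.0.0"

root = ET.fromstring(
    '<project xmlns="%s"><grandParent><parent>'
    "<child>Sam/Astronaut</child>"
    "</parent></grandParent></project>" % POM_NS
)
tree = ET.ElementTree(root)


def serialize(tree, **write_kwargs):
    """Hypothetical helper: run tree.write() into a buffer and return the bytes."""
    buf = io.BytesIO()
    tree.write(buf, **write_kwargs)
    return buf.getvalue()


# No default namespace: tags come out with a generated prefix (ns0:...).
plain = serialize(tree)

# With default_namespace: tags are unprefixed and xmlns="..." sits on the root.
with_default = serialize(tree, default_namespace=POM_NS)

# Alternative: register the URI under the empty prefix, then write normally.
# This is global to the ElementTree module, so it is done last here.
ET.register_namespace("", POM_NS)
registered = serialize(tree)
```

One caveat: default_namespace raises a ValueError if any element in the tree has an unqualified tag, so every element added later also needs the {uri}name form.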

Unrelated to the XML aspects, you copied my answer from the previous question incorrectly. The else lines up with the for, not with the if:

for ...
  if ...
else ...
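
In case the for/else layout is unfamiliar, here is a tiny sketch that has nothing to do with XML. The function name check and the sample lists are made up for illustration. The else branch belongs to the for loop and runs only when the loop ends without hitting break.

```python
def check(names, wanted):
    """Return a message saying whether wanted was found in names."""
    for name in names:
        if name == wanted:
            outcome = "child exists"
            break
    else:
        # Reached only when the loop finished without a break,
        # including the case where names is empty.
        outcome = "child added"
    return outcome


hit = check(["Sam/Astronaut", "Jay/Doctor"], "Jay/Doctor")
miss = check(["Sam/Astronaut"], "Jay/Doctor")
empty = check([], "Jay/Doctor")
```

If the else were lined up with the if instead, it would run once for every non-matching item, which is not what the original loop intends.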
Windowpane answered 31/7, 2014 at 23:16 Comment(7)
I don't mind removing the project tag and adding the namespace to the grandParent tag: <grandParent xmlns="maven.apache.org/POM/4.0.0" xmlns:xsi="w3.org/2001/XMLSchema-instance" xsi:schemaLocation="maven.apache.org/POM/4.0.0 maven.apache.org/xsd/maven-4.0.0.xsd"> <parent> <child>Sam/Astronaut</child> </parent> </grandParent> - Pinebrook
Works. You saved my day. Working with namespaces was difficult. BTW, how do I add a newline character after the sub-element has been written/added? - Pinebrook
newkid=ET.SubElement(...) ; newkid.text="Jay/Dr" ; newkid.tail="\n" - Inadvertence
tree.write('test.xml', default_namespace=spaces['xmlns']): what if I can't provide the default_namespace argument? Is there any other way to make sure the output isn't ugly? - Pinebrook
It won't be terribly ugly. No, I don't know of any other way. - Inadvertence
Add this anywhere before tree.write(): ET.register_namespace('', 'http://maven.apache.org/POM/4.0.0') - Inadvertence
