How should I parse this xml string in python?

My XML string is -

xmlData = """<SMSResponse xmlns="http://example.com" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
             <Cancelled>false</Cancelled>
             <MessageID>00000000-0000-0000-0000-000000000000</MessageID>  
             <Queued>false</Queued>
             <SMSError>NoError</SMSError>
             <SMSIncomingMessages i:nil="true"/>
             <Sent>false</Sent>
             <SentDateTime>0001-01-01T00:00:00</SentDateTime>
             </SMSResponse>"""

I am trying to parse it and get the values of the tags (Cancelled, MessageId, SMSError, etc.). I am using Python's ElementTree library. So far, I have tried things like -

import xml.etree.ElementTree as ET

root = ET.fromstring(xmlData)
print(root.find('Sent'))  # gives None
for child in root:
    print(child.find('MessageId'))  # also gives None

Although, I am able to print the tags with -

for child in root:
    print(child.tag)
    # child.tag for the tag Cancelled is - {http://example.com}Cancelled

and their respective values with -

for child in root:
    print(child.text)

How do I get something like -

print(child.Queued)  # will print false

In PHP, we can access them directly from the root -

$xml = simplexml_load_string($data);
$status = $xml->SMSError;
Cantara answered 4/1, 2013 at 9:0 Comment(0)

Your document declares a default namespace, so you need to include the namespace when searching:

root = ET.fromstring(xmlData)
print(root.find('{http://example.com}Sent'))
print(root.find('{http://example.com}MessageID'))

output:

<Element '{http://example.com}Sent' at 0x1043e0690>
<Element '{http://example.com}MessageID' at 0x1043e0350>

The find() and findall() methods also take a namespace map. To save typing, you can search with an arbitrary prefix, and the prefix will be looked up in that map:

nsmap = {'n': 'http://example.com'}
print(root.find('n:Sent', namespaces=nsmap))
print(root.find('n:MessageID', namespaces=nsmap))
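
Note that find() returns an Element (or None when nothing matches), not the text inside it. As a minimal sketch, assuming Python 3 and a trimmed copy of the sample XML from the question, the text can be read either through .text or with findtext(). The prefix 'n' and the variable names are arbitrary choices for illustration:

```python
import xml.etree.ElementTree as ET

xml_data = """<SMSResponse xmlns="http://example.com" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
  <Cancelled>false</Cancelled>
  <MessageID>00000000-0000-0000-0000-000000000000</MessageID>
  <Queued>false</Queued>
  <SMSError>NoError</SMSError>
  <SMSIncomingMessages i:nil="true"/>
  <Sent>false</Sent>
</SMSResponse>"""

root = ET.fromstring(xml_data)
nsmap = {"n": "http://example.com"}

# find() returns an Element; its .text attribute holds the content.
sent = root.find("n:Sent", namespaces=nsmap).text

# findtext() returns the text directly, or the default if the tag is missing.
message_id = root.findtext("n:MessageID", namespaces=nsmap)
missing = root.findtext("n:MessageId", default="<not found>", namespaces=nsmap)

print(sent)
print(message_id)
print(missing)
```

Tag names are case-sensitive, so the misspelled 'n:MessageId' falls through to the default instead of raising an error.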
Bashan answered 4/1, 2013 at 9:14 Comment(9)
So basically I am going to have to specify "{http://example.com}" every time I want to access the text of a tag? - Cantara
@HussainTamboli: There is also a namespaces=mapping argument to find and findall, but that appears to be useless when there is a default namespace. lxml handles this all a lot better. - Bashan
See @eclaird's answer. I think you were trying to do the same. +1 - Cantara
It still prints None with nsmap. I think there is something wrong with nsmap. - Cantara
@HussainTamboli: I get output for your sample XML. Make sure you spelled the tag name correctly (MessageID and not MessageId). - Bashan
I was using xml.etree.ElementTree from the link. I think you are using lxml.etree, like @root is using. - Cantara
@HussainTamboli: No, I am using xml.etree, Python 2.7. lxml supports the same API (albeit with some improvements, but this is the same in both). - Bashan
@HussainTamboli: Martijn's code is correct; you must have made a mistake somewhere. I wrote down the full code that gives me a result. Try copy-pasting it and see if it works. - Agbogla
Hi, root.find('n:Sent', namespaces=nsmap) prints the Element object. Append .text to it to get the value. - Cantara

If you're set on Python standard XML libraries, you could use something like this:

root = ET.fromstring(xmlData)
namespace = 'http://example.com'

def query(tree, nodename):
    return tree.find('{{{ex}}}{nodename}'.format(ex=namespace, nodename=nodename))

queued = query(root, 'Queued')
print(queued.text)
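
If you are on Python 3.8 or newer, the standard library can also match a tag regardless of its namespace with the "{*}" wildcard, which avoids the helper function. This is only a sketch under that version assumption, using a trimmed copy of the question's XML:

```python
import xml.etree.ElementTree as ET

xml_data = """<SMSResponse xmlns="http://example.com">
  <Queued>false</Queued>
  <SMSError>NoError</SMSError>
</SMSResponse>"""

root = ET.fromstring(xml_data)

# "{*}" matches the tag in any namespace (or none); requires Python 3.8+.
queued = root.findtext("{*}Queued")
sms_error = root.findtext("{*}SMSError")

print(queued)
print(sms_error)
```

The trade-off is that the wildcard ignores which namespace a tag belongs to, so it is best kept for documents where local names are unambiguous.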
Hydrobomb answered 4/1, 2013 at 9:22 Comment(0)

You can create a dictionary and directly get values out of it...

tree = ET.fromstring(xmlData)

root = {}

for child in tree:
    root[child.tag.split("}")[1]] = child.text

print(root["Queued"])
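
A slightly more defensive variant of the same idea, as a sketch assuming Python 3: the hypothetical helper local_name() also copes with tags that carry no namespace, and types.SimpleNamespace gives attribute-style access close to the PHP example in the question. The names values and response are made up for illustration:

```python
import xml.etree.ElementTree as ET
from types import SimpleNamespace

xml_data = """<SMSResponse xmlns="http://example.com" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
  <Queued>false</Queued>
  <SMSError>NoError</SMSError>
  <SMSIncomingMessages i:nil="true"/>
</SMSResponse>"""


def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree tag, if present."""
    return tag.rpartition("}")[2]


tree = ET.fromstring(xml_data)

# Map each direct child's local tag name to its text (None for empty elements).
values = {local_name(child.tag): child.text for child in tree}

# Optional: expose the same values as attributes.
response = SimpleNamespace(**values)

print(values["Queued"])
print(response.SMSError)
```

This only looks at direct children, and a repeated tag name would overwrite the earlier value, so it suits flat responses like this one.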
Tortuous answered 4/1, 2013 at 9:5 Comment(5)
Hi, see my edit. "# child.tag for the tag Cancelled is - {http://example.com}Cancelled", so it is difficult to match it with "Cancelled". Is there any better way? - Cantara
Hey, it works, but this is just a workaround. How do I access the text of the tags in a way where the tag is a key and the text is the value? - Cantara
Also, you might want to change the return null to return None or return ''. With null, it says - NameError: global name 'null' is not defined - Cantara
This may be an alternate solution too. +1 - Cantara
Updated answer with a neater one. - Tortuous

With lxml.etree:

In [8]: import lxml.etree as et

In [9]: doc=et.fromstring(xmlData)

In [10]: ns={'n':'http://example.com'}

In [11]: doc.xpath('n:Queued/text()',namespaces=ns)
Out[11]: ['false']

With ElementTree you can do:

import xml.etree.ElementTree as ET
root = ET.fromstring(xmlData)
ns = {'n': 'http://example.com'}
root.find('n:Queued', namespaces=ns).text
Out[13]: 'false'
Agbogla answered 4/1, 2013 at 9:35 Comment(1)
Thanks. I was hoping to find something similar in ElementTree. +1 - Cantara
