XML walking in python [closed]

2

7

I am new to Python and would like to understand how to parse XML. I have not been able to find good examples or explanations of how to write a generic program that walks an XML node set.

I want to be able to categorize and identify all elements and attributes by name and value, without having any information about the XML schema. I don't want to rely on referring to elements and attributes specifically by tag name or text.

Could someone please point me in the right direction?

Thanks

UPDATE:

The specific question being asked was: "How do I recurse through all nodes, starting from the root node of an XML document, without any intimate knowledge of the schema?"

At the time, being new to Python but knowing how to perform that operation in many other languages, I was perplexed that the real-world examples I found all relied on named nodes to traverse the DOM, which isn't what I wanted at all.

Hope this clarifies the question, as the information in this thread is indeed useful.

Jonathonjonati answered 20/11, 2012 at 2:36 Comment(3)
Have you tried anything? Take a look at lxml. - Arbor
See also: diveintopython.net/xml_processing/index.html#kgp.divein - Loch
Hi, I am not sure why this was closed as "not a real question". I asked a very specific question and was fairly precise about the concept that I was attempting to understand. What's the issue with my question, so that I don't make the same mistake again? - Jonathonjonati
C
6

Check out the ElementTree documentation in the Python standard library docs.

A basic stub of code from that page is:

    import xml.etree.ElementTree as ET

    tree = ET.parse(filename)  # filename: path to your XML file
    root = tree.getroot()
    for child in root:
        # tag is the element name; attrib is a dict of its attributes
        print(child.tag, child.attrib)

You can repeat the same loop on each child in turn (for grandchild in child: and so on), recursing downward until there are no more children.
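The recursion described above could be sketched roughly as follows. This is only an illustration, not code from the ElementTree documentation: the walk function, its depth and out parameters, and the SAMPLE document are all made up for the example.

```python
import xml.etree.ElementTree as ET


def walk(elem, depth=0, out=None):
    """Visit elem and all of its descendants, depth first.

    Records one (depth, tag, attributes, text) tuple per element.
    The function name and the tuple layout are hypothetical.
    """
    if out is None:
        out = []
    text = (elem.text or "").strip()  # elem.text is None when there is no text
    out.append((depth, elem.tag, dict(elem.attrib), text))
    for child in elem:  # iterating an element yields its direct children
        walk(child, depth + 1, out)
    return out


# Made-up sample document; no schema knowledge is used in walk().
SAMPLE = '<library name="demo"><book id="1"><title>Dune</title></book><book id="2"/></library>'

root = ET.fromstring(SAMPLE)
nodes = walk(root)
for depth, tag, attrib, text in nodes:
    print("  " * depth + tag, attrib, repr(text))
```

For a file on disk, ET.parse(path).getroot() would take the place of ET.fromstring(SAMPLE).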

Crankle answered 20/11, 2012 at 3:4 Comment(0)
S
9

Use cElementTree; it is 15-20 times faster than the pure-Python version of ElementTree and uses 2-5 times less memory. See http://effbot.org/zone/celementtree.htm

    # Python 2 syntax (print statements)
    import xml.etree.cElementTree as ET

    tree = ET.parse('test.xml')
    for elem in tree.getiterator():  # every element, in document order
        if elem.tag:
            print 'my name:'
            print '\t' + elem.tag
        if elem.text:
            print 'my text:'
            print '\t' + elem.text.strip()
        if elem.attrib.items():
            print 'my attributes:'
            for key, value in elem.attrib.items():
                print '\t\t' + key + ' : ' + value
        if list(elem):  # use elem.getchildren() for Python 2.6 or before
            print 'my no of child: %d' % len(list(elem))
        else:
            print 'No child'
        if elem.tail:  # text that follows the element's closing tag
            print 'my tail:'
            print '\t' + elem.tail.strip()
        print '$$$$$$$$$$'
Sessler answered 20/11, 2012 at 7:14 Comment(1)
For Python 3, use import xml.etree.ElementTree as ET, iterate with for elem in tree.iter():, and add parentheses to the print() calls. - Isidraisidro
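Following the comment above, here is a rough Python 3 sketch of the same flat iteration with tree.iter(). Instead of printing each element, it groups text by tag name and attribute values by attribute name, which is one possible reading of "categorize by name and value" in the question. The summarize function and the SAMPLE document are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def summarize(tree):
    """Group element text by tag and attribute values by attribute name.

    Hypothetical helper; it uses no knowledge of the document's schema.
    """
    texts = defaultdict(list)
    attrs = defaultdict(list)
    for elem in tree.iter():  # root and all descendants, in document order
        text = (elem.text or "").strip()
        if text:
            texts[elem.tag].append(text)
        for key, value in elem.attrib.items():
            attrs[key].append(value)
    return dict(texts), dict(attrs)


# Made-up sample; ET.parse also accepts a file name such as 'test.xml'.
SAMPLE = '<root><item id="a" kind="x">one</item><item id="b">two</item></root>'

tree = ET.parse(io.StringIO(SAMPLE))
texts, attrs = summarize(tree)
print(texts)
print(attrs)
```

Because iter() already visits every element, no explicit recursion is needed here; the trade-off is that the nesting depth of each element is not tracked.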