Get list of XML attribute values in Python
Asked Answered
W

7

15

I need to get a list of attribute values from child elements in Python.

It's easiest to explain with an example.

Given some XML like this:

<elements>
    <parent name="CategoryA">
        <child value="a1"/>
        <child value="a2"/>
        <child value="a3"/>
    </parent>
    <parent name="CategoryB">
        <child value="b1"/>
        <child value="b2"/>
        <child value="b3"/>
    </parent>
</elements>

I want to be able to do something like:

>>> getValues("CategoryA")
['a1', 'a2', 'a3']
>>> getValues("CategoryB")
['b1', 'b2', 'b3']

It looks like a job for XPath but I'm open to all recommendations. I'd also like to hear about your favourite Python XML libraries.

Wobbling answered 17/9, 2008 at 20:32 Comment(0)
D
7

I'm not really an old hand at Python, but here's an XPath solution using libxml2.

import libxml2

DOC = """<elements>
    <parent name="CategoryA">
        <child value="a1"/>
        <child value="a2"/>
        <child value="a3"/>
    </parent>
    <parent name="CategoryB">
        <child value="b1"/>
        <child value="b2"/>
        <child value="b3"/>
    </parent>
</elements>"""

doc = libxml2.parseDoc(DOC)

def getValues(cat):
    return [attr.content for attr in doc.xpathEval("/elements/parent[@name='%s']/child/@value" % (cat))]

print getValues("CategoryA")

With result...

['a1', 'a2', 'a3']
Diabolism answered 17/9, 2008 at 20:32 Comment(3)
Accepted because this is what I ended up using. It's a simple one-liner and I didn't need to install any extra modules. Check out the other answers too - there's some good stuff there.Wobbling
python test.py Traceback (most recent call last): File "test.py", line 1, in <module> import libxml2 ImportError: No module named libxml2Shellashellac
@SR query: You'll probably need libxml2 to use this libxml2 example.Diabolism
D
7

ElementTree 1.3 (unfortunately not 1.2 which is the one included with Python) supports XPath like this:

import elementtree.ElementTree as xml

def getValues(tree, category):
    parent = tree.find(".//parent[@name='%s']" % category)
    return [child.get('value') for child in parent]

Then you can do

>>> tree = xml.parse('data.xml')
>>> getValues(tree, 'CategoryA')
['a1', 'a2', 'a3']
>>> getValues(tree, 'CategoryB')
['b1', 'b2', 'b3']

lxml.etree (which also provides the ElementTree interface) will also work in the same way.

Delp answered 17/9, 2008 at 21:5 Comment(0)
P
4

You can do this with BeautifulSoup

>>> from BeautifulSoup import BeautifulStoneSoup
>>> soup = BeautifulStoneSoup(xml)
>>> def getValues(name):
. . .      return [child['value'] for child in soup.find('parent', attrs={'name': name}).findAll('child')]

If you're doing work with HTML/XML I would recommend you take a look at BeautifulSoup. It's similar to the DOM tree but contains more functionality.

Palliasse answered 17/9, 2008 at 20:50 Comment(0)
P
4

Using a standard W3 DOM such as the stdlib's minidom, or pxdom:

def getValues(category):
    for parent in document.getElementsByTagName('parent'):
        if parent.getAttribute('name')==category:
            return [
                el.getAttribute('value')
                for el in parent.getElementsByTagName('child')
            ]
    raise ValueError('parent not found')
Prophets answered 17/9, 2008 at 21:4 Comment(0)
A
4

My preferred python xml library is lxml , which wraps libxml2.
Xpath does seem the way to go here, so I'd write this as something like:

from lxml import etree

def getValues(xml, category):
    return [x.attrib['value'] for x in 
            xml.findall('/parent[@name="%s"]/*' % category)]

xml = etree.parse(open('filename.xml'))

>>> print getValues(xml, 'CategoryA')
['a1', 'a2', 'a3']
>>> print getValues(xml, 'CategoryB')
['b1', 'b2', 'b3]
Astatic answered 17/9, 2008 at 21:12 Comment(0)
R
4

In Python 3.x, fetching a list of attributes is a simple task of using the member items()

Using the ElementTree, below snippet shows a way to get the list of attributes. NOTE that this example doesn't consider namespaces, which if present, will need to be accounted for.

    import xml.etree.ElementTree as ET

    flName = 'test.xml'
    tree = ET.parse(flName)
    root = tree.getroot()
    for element in root.findall('<child-node-of-root>'):
        attrList = element.items()
        print(len(attrList), " : [", attrList, "]" )

REFERENCE:

Element.items()
Returns the element attributes as a sequence of (name, value) pairs.
The attributes are returned in an arbitrary order.

Python manual

Ritter answered 29/12, 2016 at 10:56 Comment(0)
D
2

I must admit I'm a fan of xmltramp due to its ease of use.

Accessing the above becomes:

  import xmltramp

  values = xmltramp.parse('''...''')

  def getValues( values, category ):
    cat = [ parent for parent in values['parent':] if parent(name) == category ]
    cat_values = [ child(value) for child in parent['child':] for parent in cat ]
    return cat_values

  getValues( values, "CategoryA" )
  getValues( values, "CategoryB" )
Dunne answered 17/9, 2008 at 20:47 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.