Access nested children in xml file parsed with ElementTree

Asked 11/5, 2017 at 16:40 Answered 11/5, 2017 at 17:5

Solved python xml tree xml-parsing elementtree

I am new to xml parsing. This xml file has the following tree:

FHRSEstablishment
 |--> Header
 |    |--> ...
 |--> EstablishmentCollection
 |    |--> EstablishmentDetail
 |    |    |-->...
 |    |--> Scores
 |    |    |-->...
 |--> EstablishmentCollection
 |    |--> EstablishmentDetail
 |    |    |-->...
 |    |--> Scores
 |    |    |-->...

but when I access it with ElementTree and look for the child tags and attributes,

import xml.etree.ElementTree as ET
import urllib2
tree = ET.parse(
   urllib2.urlopen('http://ratings.food.gov.uk/OpenDataFiles/FHRS408en-GB.xml'))
root = tree.getroot()
for child in root:
   print child.tag, child.attrib

I only get:

Header {}
EstablishmentCollection {}

which I assume means that their attributes are empty. Why is it so, and how can I access the children nested inside EstablishmentDetail and Scores?

EDIT

Thanks to the answers below I can get inside the tree, but if I want to retrieve values such as those in Scores, this fails:

for node in root.find('.//EstablishmentDetail/Scores'):
    rating = node.attrib.get('Hygiene')
    print rating

and produces

None
None
None

Why is that?

Hyposensitize asked 11/5, 2017 at 16:40 Comment(0)

You have to iterate over your root with iter(). Looping over root directly only visits its immediate children.

That is, root.iter() would do the trick!

import xml.etree.ElementTree as ET
import urllib2
tree = ET.parse(urllib2.urlopen('http://ratings.food.gov.uk/OpenDataFiles/FHRS408en-GB.xml'))
root = tree.getroot()
for child in root.iter():
   print child.tag, child.attrib

Output:

FHRSEstablishment {}
Header {}
ExtractDate {}
ItemCount {}
ReturnCode {}
EstablishmentCollection {}
EstablishmentDetail {}
FHRSID {}
LocalAuthorityBusinessID {}
...
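
For anyone who wants to try this without downloading the feed, here is a minimal offline sketch of the difference between looping over root and calling root.iter(). It is written for Python 3, unlike the Python 2 snippets in this thread, and the inline XML is a made-up miniature that only borrows element names from the output above.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the feed: element names follow the output above,
# the values are invented for illustration.
SAMPLE = """
<FHRSEstablishment>
  <Header>
    <ItemCount>1</ItemCount>
  </Header>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <FHRSID>1</FHRSID>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""

root = ET.fromstring(SAMPLE)

# Iterating an element directly yields only its immediate children.
direct = [child.tag for child in root]

# iter() yields the element itself and then every descendant, in document order.
everything = [elem.tag for elem in root.iter()]

print(direct)      # ['Header', 'EstablishmentCollection']
print(everything)
```

This is why the loop in the question stops at Header and EstablishmentCollection: those are the only two direct children of the root.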

To get all the tags inside EstablishmentDetail, you need to find that tag and then loop through its children, for example:

for child in root.find('.//EstablishmentDetail'):
    print child.tag, child.attrib

Output:

FHRSID {}
LocalAuthorityBusinessID {}
BusinessName {}
BusinessType {}
BusinessTypeID {}
RatingValue {}
RatingKey {}
RatingDate {}
LocalAuthorityCode {}
LocalAuthorityName {}
LocalAuthorityWebSite {}
LocalAuthorityEmailAddress {}
Scores {}
SchemeType {}
NewRatingPending {}
Geocode {}
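
Two things are worth spelling out here. The './/' prefix is part of ElementTree's limited XPath support and means "search at every level below this element", and every attrib is {} because this file stores its values as element text rather than as XML attributes. The following is a rough Python 3 sketch of both points, again on invented sample data rather than the real feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: tag names are taken from the listing above,
# business names and ratings are invented.
SAMPLE = """
<FHRSEstablishment>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <BusinessName>Cafe A</BusinessName>
      <RatingValue>5</RatingValue>
    </EstablishmentDetail>
    <EstablishmentDetail>
      <BusinessName>Cafe B</BusinessName>
      <RatingValue>3</RatingValue>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""

root = ET.fromstring(SAMPLE)

# find() returns only the first match; './/' searches at every depth below root.
first = root.find('.//EstablishmentDetail')
first_fields = {child.tag: child.text for child in first}

# findall() returns every match, so this builds one dict per establishment.
all_fields = [
    {child.tag: child.text for child in detail}
    for detail in root.findall('.//EstablishmentDetail')
]

print(first.attrib)   # {} because the element carries no XML attributes
print(first_fields)   # the values live in .text instead
print(all_fields)
```

Note that find() stops at the first EstablishmentDetail; use findall() when you want every establishment in the file.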

To get the score for Hygiene, as you mentioned in the comment:

Your loop, for child in root.find('.//Scores'): rating = child.get('Hygiene'), takes only the first Scores tag and iterates over its children, the Hygiene, ConfidenceInManagement and Structural tags. child.get('Hygiene') then looks up an attribute named Hygiene on each of those children. None of the three has such an attribute, so you get None three times.

You need to:
- first, find all Scores tags;
- then, find the Hygiene tag inside every Scores tag found and read its text.

for each in root.findall('.//Scores'):
    rating = each.find('.//Hygiene')
    print '' if rating is None else rating.text

Output:
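
The two steps can also be checked offline. This is only a sketch under assumptions: it targets Python 3, and the sample document and its scores are invented, with an empty Scores element added to show the missing-value case.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the structure mirrors the tags discussed above,
# the score values are invented.
SAMPLE = """
<FHRSEstablishment>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <Scores>
        <Hygiene>5</Hygiene>
        <Structural>10</Structural>
        <ConfidenceInManagement>0</ConfidenceInManagement>
      </Scores>
    </EstablishmentDetail>
    <EstablishmentDetail>
      <Scores />
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""

root = ET.fromstring(SAMPLE)

# The approach from the question: asks each child of the FIRST Scores element
# for an XML attribute called Hygiene, which none of them has.
attempt = [child.get('Hygiene') for child in root.find('.//Scores')]

# The two-step approach: every Scores element, then its Hygiene child's text.
ratings = []
for scores in root.findall('.//Scores'):
    hygiene = scores.find('Hygiene')
    # Compare with None explicitly; an element without children is falsy.
    ratings.append('' if hygiene is None else hygiene.text)

print(attempt)  # [None, None, None]
print(ratings)  # ['5', '']
```

The values come back as strings, so convert with int() if you need numbers.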

Conservation answered 11/5, 2017 at 17:1 Comment(3)

Wow, this was good, but I still struggle to get the ultimate values, such as the scores. If I do for child in root.find('.//Scores'): rating = child.get('Hygiene'); print rating; I get None as a result. – Hyposensitize 11/5, 2017 at 18:26

What does .// do? Is this a regular expression? – Perpetua 8/4, 2019 at 16:40

So many posts about how to do this and none of them worked. This is the first one that told me to use root.iter() the only thing that worked. Nice job! – Cystocele 4/11, 2022 at 17:57

Hope this is useful:

import xml.etree.ElementTree as etree
with open('filename.xml') as tmpfile:
    doc = etree.iterparse(tmpfile, events=("start", "end"))
    doc = iter(doc)
    event, root = doc.next()
    num = 0
    for event, elem in doc:
        print event, elem

Philips answered 11/5, 2017 at 17:5 Comment(2)

event, root = doc.next() AttributeError: 'IterParseIterator' object has no attribute 'next' – Bellyband 13/10, 2019 at 1:34

My script works on python2, for python3 use: event, root = doc.__next__() – Philips 18/10, 2019 at 9:51
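
As a possible alternative to calling doc.__next__() directly, the built-in next() works on both Python 2.6+ and Python 3. Below is a small Python 3 sketch of the same iterparse loop. It reads from an in-memory document (an invented two-item sample) instead of 'filename.xml', so it runs on its own.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical in-memory document standing in for 'filename.xml'.
SAMPLE = b"<root><item>a</item><item>b</item></root>"

events = ET.iterparse(io.BytesIO(SAMPLE), events=("start", "end"))

# next() is the portable spelling of .next() / .__next__().
event, root = next(events)

# Record (event, tag) pairs, starting with the root's 'start' event.
seen = [(event, root.tag)]
for event, elem in events:
    seen.append((event, elem.tag))

print(seen)
```

The first pair is the 'start' event for the root element, which is how the original snippet gets hold of root before processing the rest of the stream.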
