Python version 2.7: XML ElementTree: How to iterate through certain elements of a child element in order to find a match

I'm a programming novice and only rarely use python so please bear with me as I try to explain what I am trying to do :)

I have the following XML:

<?xml version = "1.0" encoding = "utf-8"?>
<Patients>
    <Patient>
               <PatientCharacteristics>
                   <patientCode>3</patientCode>
               </PatientCharacteristics>
               <Visits>
                   <Visit>
                          <DAS>
                               <CRP>14</CRP>
                               <ESR/>
                               <Joints>
                                       <DAS_PROFILE>28/28</DAS_PROFILE>
                                       <SWOL28>20</SWOL28>
                                       <TEN28>20</TEN28>
                               </Joints>
                          </DAS>
                          <VisitDate>2010-02-17</VisitDate>
                   </Visit>
                   <Visit>
                          <DAS>
                               <CRP>10</CRP>
                               <ESR/>
                               <Joints>
                                       <DAS_PROFILE>28/28</DAS_PROFILE>
                                       <SWOL28>15</SWOL28>
                                       <TEN28>20</TEN28>
                               </Joints>
                          </DAS>
                          <VisitDate>2010-02-10</VisitDate>
                   </Visit>
               </Visits>
    </Patient>
    <Patient>
        <PatientCharacteristics>
                   <patientCode>3</patientCode>
        </PatientCharacteristics>
               <Visits>
                   <Visit>
                          <DAS>
                               <CRP>14</CRP>
                               <ESR/>
                               <Joints>
                                       <DAS_PROFILE>28/28</DAS_PROFILE>
                                       <SWOL28>34</SWOL28>
                                       <TEN28>0</TEN28>
                               </Joints>
                          </DAS>
                          <VisitDate>2010-08-17</VisitDate>
                   </Visit>
                   <Visit>
                          <DAS>
                               <CRP>10</CRP>
                               <ESR/>
                               <Joints>
                                       <DAS_PROFILE>28/28</DAS_PROFILE>
                                       <SWOL28></SWOL28>
                                       <TEN28>2</TEN28>
                               </Joints>
                          </DAS>
                          <VisitDate>2010-07-10</VisitDate>
                   </Visit>
                   <Visit>
                          <DAS>
                               <CRP>9</CRP>
                               <ESR/>
                               <Joints>
                                       <DAS_PROFILE>28/28</DAS_PROFILE>
                                       <SWOL28>56</SWOL28>
                                       <TEN28>6</TEN28>
                               </Joints>
                          </DAS>
                          <VisitDate>2009-07-10</VisitDate>
                   </Visit>
               </Visits>

    </Patient>
</Patients>

All I want to do here is update certain 'SWOL28' values if they match the patientCode and VisitDate that I have stored in a text file. As I understand it, ElementTree does not include a parent reference; if it did, I could just use findall() from the root and work backwards from there. As it stands, here is my pseudocode:

  1. For each line in the text file:
  2. Put Visit_Date Patient_Code New_SWOL28 into variables
  3. For each patient element:
  4. If patientCode = Patient_Code
  5. For each Visit element:
  6. If VisitDate = Visit_Date
  7. If SWOL28 element exists for this visit
  8. Update SWOL28 to New_SWOL28

But I am stuck at step number 5. How do I get a list of visits to iterate through? Apologies if this is a very dumb question, but I have searched high and low for an answer, I assure you! I have stripped down my code to the bare example of the part I need to fix below:

import xml.etree.ElementTree as ET
tree = ET.parse('DB3.xml')
root = tree.getroot()
for child in root: # THIS GETS ME ALL THE PATIENT ATTRIBUTES
    print child.tag 
    for x in child/Visit: # THIS IS WHAT I CANNOT FIND THE CORRECT SYNTAX FOR
        # I WOULD THEN PERFORM STEPS 6, 7 AND 8 HERE

I would be deeply appreciative of any ideas any of you may have on this. I am not a programming natural that's for sure!

Thanks in advance, Sarah

Edit 1:

On the advice of SVK below I tried the following:

import xml.etree.ElementTree as ET
tree = ET.parse('Untitled.xml')
root = tree.getroot()
for child in root:
    print child.tag 
    child.find( "visits" )
    for x in child.iter("visit"):
        print x.tag, x.text

But the only output I get is: Patient Patient and none of the lower tags. Any ideas?

Highchair answered 26/3, 2013 at 17:1 Comment(3)
You don't appear to have a top level tag, e.g. <Patients>. Did you edit that out, or is this your document as is? - Outbid
Sorry yes, just added it in there now. Thanks! - Highchair
I'd use lxml here (API-compatible library) and make use of XPath expressions. With the right XPath expression, selecting the correct visits is easy enough. - Herv
S
6

This is untested but it should be fairly close to what you want.

for patient in root:
    patient_code = patient.find('PatientCharacteristics').find('patientCode')
    if patient_code.text == code:
        for visit in patient.find('Visits'):
            visit_date = visit.find('VisitDate')
            if visit_date.text == date:
                swol28 = visit.find('DAS').find('Joints').find('SWOL28')
                # find() returns None when there is no SWOL28 element for this visit
                if swol28 is not None:
                    # Assign to .text; set() would add an attribute to <Joints> instead
                    swol28.text = new_swol28
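
For an end-to-end picture, here is a minimal sketch of the same loop driven by the change lines. It assumes each line of the text file holds Visit_Date Patient_Code New_SWOL28 separated by whitespace (the field order from the question's pseudocode; the real file layout is not shown), and it uses a small inline XML string in place of DB3.xml so that it runs on its own. The function name apply_changes is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for DB3.xml (assumption: same structure as the question's file).
SAMPLE_XML = """<Patients>
  <Patient>
    <PatientCharacteristics><patientCode>3</patientCode></PatientCharacteristics>
    <Visits>
      <Visit>
        <DAS><Joints><SWOL28>20</SWOL28></Joints></DAS>
        <VisitDate>2010-02-17</VisitDate>
      </Visit>
      <Visit>
        <DAS><Joints><SWOL28>15</SWOL28></Joints></DAS>
        <VisitDate>2010-02-10</VisitDate>
      </Visit>
    </Visits>
  </Patient>
</Patients>"""

# Assumed text-file layout: Visit_Date Patient_Code New_SWOL28, whitespace separated.
CHANGE_LINES = ["2010-02-10 3 99"]


def apply_changes(root, lines):
    """Update SWOL28 for every visit matching a (date, code) line; return the count."""
    updated = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        date, code, new_swol28 = line.split()
        for patient in root.findall('Patient'):
            # findtext() accepts a path and returns None if the element is missing
            if patient.findtext('PatientCharacteristics/patientCode') != code:
                continue
            for visit in patient.findall('Visits/Visit'):
                if visit.findtext('VisitDate') != date:
                    continue
                swol28 = visit.find('DAS/Joints/SWOL28')
                if swol28 is not None:  # element exists, even if currently empty
                    swol28.text = new_swol28
                    updated += 1
    return updated


root = ET.fromstring(SAMPLE_XML)
count = apply_changes(root, CHANGE_LINES)
print(count)
```

With a real file you would presumably call ET.parse('DB3.xml'), pass tree.getroot() to the function, and then save with tree.write(...) under whatever output filename you choose.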
Saleem answered 26/3, 2013 at 17:28 Comment(1)
That works! Many many many thanks, I can't believe it was that easy, kicking myself!!! BTW you are a genius :) - Highchair
T
7

You can iterate over all the "Visit" elements beneath an element "element" like this (iter() walks every descendant, not only direct children, and tag names are case-sensitive):

for x in element.iter("Visit"):

You can find the first direct child of element matching a certain tag with:

element.find( "Visits" )

It looks like you will first have to locate the "Visits" element, which is the parent of "Visit", and then iterate through its "Visit" children. Putting those together you'd have something like this:

for patient_element in root:
    print patient_element.tag
    visits_element = patient_element.find( "Visits" )
    for visit_element in visits_element.iter("Visit"):
        print visit_element.tag, visit_element.text
        # ... further processing of each visit element here

In general look at the section "Finding interesting elements" in the documentation for xml.etree.ElementTree: http://docs.python.org/2/library/xml.etree.elementtree.html#finding-interesting-elements
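
To make the difference between these calls concrete, here is a small sketch on a cut-down stand-in for one Patient element (the inline XML is an assumption that mirrors the nesting in the question). It shows that tag matching is case-sensitive, that find() and findall() with a bare tag only look at direct children, and that iter() searches the whole subtree.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in document (assumption: mirrors the nesting in the question).
patient = ET.fromstring(
    "<Patient>"
    "<Visits>"
    "<Visit><VisitDate>2010-02-17</VisitDate></Visit>"
    "<Visit><VisitDate>2010-02-10</VisitDate></Visit>"
    "</Visits>"
    "</Patient>"
)

# Tag names are case-sensitive: the lowercase names match nothing.
lowercase_found = patient.find("visits")        # None
lowercase_iter = list(patient.iter("visit"))    # empty list

# find() returns the first matching direct child, or None.
visits = patient.find("Visits")

# findall() with a bare tag only looks at direct children ...
direct_from_patient = patient.findall("Visit")  # empty: Visit is a grandchild
direct_from_visits = visits.findall("Visit")    # both Visit elements

# ... while iter() walks the element itself and all of its descendants.
nested = list(patient.iter("Visit"))            # both Visit elements

for visit in nested:
    print(visit.findtext("VisitDate"))
```

Because find() can return None, it is worth checking its result before calling further methods on it when the input file may be missing elements.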

Tabshey answered 26/3, 2013 at 17:4 Comment(3)
Many thanks, I tried your answer but to no avail, see edit 1. - Highchair
Sorry, the answer was a bit messy. I believe the thing you missed was that child.find( "blah" ) will not do anything by itself; it returns the node it found. You need to use the return value to continue your search. - Tabshey
Does elem.iter('visit') iterate through all the elements within 'elem' (first-level or inner levels) that have the 'visit' tag? Or is it only for first-level children? - Felicitous
C
0

You could use a CSSSelector (from lxml) to get the nodes you want from the Patient element:

from lxml.cssselect import CSSSelector
visitSelector = CSSSelector('Visit')
visits =  visitSelector(child)

You can do the same to get the patientCode tag and the SWOL28 tag; then you can access and modify the text of the elements using element.text.

Cyrenaica answered 26/3, 2013 at 17:9 Comment(1)
My version of Python doesn't have lxml, and I looked into installing it and it was a little out of my depth! Thanks though! - Highchair
O
0

If you use lxml.etree, you can use xpath to find the elements you need to update.

E.g.

doc.xpath('Patient[PatientCharacteristics/patientCode=$patient]/Visits/Visit[VisitDate=$visit]',patient="3",visit="2009-07-10")

So

from lxml import etree

doc = etree.parse("DB3.xml")

changes = [
  dict(patient='3',visit='2010-08-17',swol28="99"),
]

def update_doc(x,d):
  for row in d:
    for visit in x.xpath('Patient[PatientCharacteristics/patientCode=$patient]/Visits/Visit[VisitDate=$visit]',**row):
      for swol28 in visit.xpath('DAS/Joints/SWOL28'):
        swol28.text = row['swol28']

update_doc(doc,changes)

print etree.tostring(doc)

Should yield you something that contains:

<Patient>
  <PatientCharacteristics>
    <patientCode>3</patientCode>
  </PatientCharacteristics>
  <Visits>
    <Visit>
      <DAS>
        <CRP>14</CRP>
        <ESR/>
        <Joints>
          <DAS_PROFILE>28/28</DAS_PROFILE>
          <SWOL28>99</SWOL28>
          <TEN28>0</TEN28>
        </Joints>
      </DAS>
      <VisitDate>2010-08-17</VisitDate>
    </Visit>
  </Visits>
</Patient>
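
If lxml is not available, the standard-library ElementTree supports a limited XPath subset that covers part of this. The following is a rough sketch rather than an equivalent of the lxml query: as far as I know, ElementTree's [tag='text'] predicate only accepts a direct child tag name, so the patient code is checked in Python instead. The helper name find_visits and the inline sample document are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for DB3.xml (assumption: same structure as the question's file).
SAMPLE = """<Patients>
  <Patient>
    <PatientCharacteristics><patientCode>3</patientCode></PatientCharacteristics>
    <Visits>
      <Visit>
        <DAS><Joints><SWOL28>34</SWOL28></Joints></DAS>
        <VisitDate>2010-08-17</VisitDate>
      </Visit>
      <Visit>
        <DAS><Joints><SWOL28>56</SWOL28></Joints></DAS>
        <VisitDate>2009-07-10</VisitDate>
      </Visit>
    </Visits>
  </Patient>
</Patients>"""


def find_visits(root, patient_code, visit_date):
    """Return Visit elements for the given patient code and visit date."""
    # The date is interpolated directly into the path, so this assumes it
    # contains no quote characters (true for YYYY-MM-DD strings).
    path = "Visits/Visit[VisitDate='%s']" % visit_date
    matches = []
    for patient in root.findall("Patient"):
        if patient.findtext("PatientCharacteristics/patientCode") == patient_code:
            matches.extend(patient.findall(path))
    return matches


root = ET.fromstring(SAMPLE)
for visit in find_visits(root, "3", "2010-08-17"):
    for swol28 in visit.findall("DAS/Joints/SWOL28"):
        swol28.text = "99"

print(ET.tostring(root).decode("utf-8"))
```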
Outbid answered 26/3, 2013 at 17:39 Comment(0)
