How do I search for a tag in an XML file using ElementTree when it has a certain "parent" tag with a specific value? (Python)
I just started learning Python and have to write a program that parses XML files. I have to find a certain tag called OrganisationReference in two different files and return it. There are in fact multiple tags with this name, but only one of them, the one I am trying to return, has the tag OrganisationType with the value DEALER as a parent tag (I'm not sure whether that is the right term). I tried to use ElementTree for this. Here is the code:

    import xml.etree.ElementTree as ET

    tree1 = ET.parse('Master1.xml')
    root1 = tree1.getroot()

    tree2 = ET.parse('Master2.xml')
    root2 = tree2.getroot()

    for OrganisationReference in root1.findall("./Organisation/OrganisationId/[@OrganisationType='DEALER']/OrganisationReference"):
        print(OrganisationReference.attrib)

    for OrganisationReference in root2.findall("./Organisation/OrganisationId/[@OrganisationType='DEALER']/OrganisationReference"):
        print(OrganisationReference.attrib)

But this returns nothing (also no error). Can somebody help me?

My file looks like this:

  <MessageOrganisationCount>a</MessageOrganisationCount>
  <MessageVehicleCount>x</MessageVehicleCount>
  <MessageCreditLineCount>y</MessageCreditLineCount>
  <MessagePlanCount>z</MessagePlanCount>
  <OrganisationData>
      <Organisation>
          <OrganisationId>
              <OrganisationType>DEALER</OrganisationType>
              <OrganisationReference>WHATINEED</OrganisationReference>
          </OrganisationId>
          <OrganisationName>XYZ.</OrganisationName>
 ....

Because OrganisationReference appears a few more times in this file with different text between the start and end tags, I want to get exactly the one that you see in line 9: it has OrganisationId as a parent tag, and OrganisationType with the text DEALER is also a child tag of OrganisationId.

Faro answered 25/1, 2019 at 8:14 Comment(4)
How can this be reproduced? Please provide a minimal reproducible example. - Turner
Can you provide an example of your XML file and what your output should look like? - Sybil
I edited my post, now you can see what my XML file looks like. - Faro
Welcome to StackOverflow Jani! Great first question! - James

You were super close with your original attempt. You just need to make a couple of changes to your XPath and a tiny change to your Python.

The first part of your XPath starts with ./Organisation. Since you're evaluating the XPath from the root, it expects Organisation to be a child of the root. It's not; it's a descendant.

Try changing ./Organisation to .//Organisation. (// is short for /descendant-or-self::node()/.)
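To make the child versus descendant difference concrete, here is a minimal sketch. It assumes a trimmed-down inline copy of the sample file wrapped in a doc element, rather than reading Master1.xml from disk.

```python
import xml.etree.ElementTree as ET

# Trimmed inline copy of the sample data; the <doc> wrapper is an
# assumption added only to make the snippet well-formed.
xml_text = """
<doc>
    <OrganisationData>
        <Organisation>
            <OrganisationId>
                <OrganisationType>DEALER</OrganisationType>
                <OrganisationReference>WHATINEED</OrganisationReference>
            </OrganisationId>
        </Organisation>
    </OrganisationData>
</doc>
"""
root = ET.fromstring(xml_text)

# "./Organisation" only looks at direct children of the root element.
direct_children = root.findall("./Organisation")

# ".//Organisation" looks at every level below the root element.
descendants = root.findall(".//Organisation")

print(len(direct_children))  # 0
print(len(descendants))      # 1
```

Because Organisation sits inside OrganisationData, only the second path finds it.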

The second issue is with OrganisationId/[@OrganisationType='DEALER']. That's invalid xpath. The / should be removed from between OrganisationId and the predicate.

Also, @ is abbreviated syntax for the attribute:: axis and OrganisationType is an element, not an attribute.

Try changing OrganisationId/[@OrganisationType='DEALER'] to OrganisationId[OrganisationType='DEALER'].
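The difference between the two predicate forms can be sketched as follows. The first OrganisationId element, which carries OrganisationType as an attribute, and the FROM_ATTRIBUTE and FROM_CHILD values are made up for illustration and are not part of the original file.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: one OrganisationId stores the type as an attribute,
# the other (as in the question) stores it as a child element.
xml_text = """
<doc>
    <OrganisationId OrganisationType="DEALER">
        <OrganisationReference>FROM_ATTRIBUTE</OrganisationReference>
    </OrganisationId>
    <OrganisationId>
        <OrganisationType>DEALER</OrganisationType>
        <OrganisationReference>FROM_CHILD</OrganisationReference>
    </OrganisationId>
</doc>
"""
root = ET.fromstring(xml_text)

# [@name='value'] tests an attribute of OrganisationId.
by_attribute = [
    e.text
    for e in root.findall("./OrganisationId[@OrganisationType='DEALER']/OrganisationReference")
]

# [name='value'] tests the text of a child element of OrganisationId.
by_child = [
    e.text
    for e in root.findall("./OrganisationId[OrganisationType='DEALER']/OrganisationReference")
]

print(by_attribute)  # ['FROM_ATTRIBUTE']
print(by_child)      # ['FROM_CHILD']
```

The file in the question has the second shape, which is why the predicate without @ is the one that matches.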

The python issue is with print(OrganisationReference.attrib). The OrganisationReference doesn't have any attributes; just text.

Try changing print(OrganisationReference.attrib) to print(OrganisationReference.text).

Here's an example using just one XML file for demo purposes...

XML Input (Master1.xml; with doc element added to make it well-formed)

<doc>
    <MessageOrganisationCount>a</MessageOrganisationCount>
    <MessageVehicleCount>x</MessageVehicleCount>
    <MessageCreditLineCount>y</MessageCreditLineCount>
    <MessagePlanCount>z</MessagePlanCount>
    <OrganisationData>
        <Organisation>
            <OrganisationId>
                <OrganisationType>DEALER</OrganisationType>
                <OrganisationReference>WHATINEED</OrganisationReference>
            </OrganisationId>
            <OrganisationName>XYZ.</OrganisationName>
        </Organisation>
    </OrganisationData>
</doc>

Python

import xml.etree.ElementTree as ET

tree1 = ET.parse('Master1.xml')
root1 = tree1.getroot()

for OrganisationReference in root1.findall(".//Organisation/OrganisationId[OrganisationType='DEALER']/OrganisationReference"):
    print(OrganisationReference.text)

Printed Output

WHATINEED

Also note that it doesn't appear that you need to use getroot() at all. You can use findall() directly on the tree...

import xml.etree.ElementTree as ET

tree1 = ET.parse('Master1.xml')

for OrganisationReference in tree1.findall(".//Organisation/OrganisationId[OrganisationType='DEALER']/OrganisationReference"):
    print(OrganisationReference.text)
James answered 25/1, 2019 at 15:41 Comment(1)
I forgot to mention that there is a namespace in the XML file. After taking it into account in the code, it works perfectly fine. - Faro
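If the comment above refers to an XML namespace declared on the file, the same query needs namespace-qualified names. The following is a rough sketch only: the namespace URI http://example.com/ns and the prefix m are placeholders, since the real declaration was not shown in the thread.

```python
import xml.etree.ElementTree as ET

# "http://example.com/ns" is a made-up namespace URI for illustration;
# use whatever URI the real file declares in its xmlns attribute.
xml_text = """
<doc xmlns="http://example.com/ns">
    <OrganisationData>
        <Organisation>
            <OrganisationId>
                <OrganisationType>DEALER</OrganisationType>
                <OrganisationReference>WHATINEED</OrganisationReference>
            </OrganisationId>
        </Organisation>
    </OrganisationData>
</doc>
"""
root = ET.fromstring(xml_text)

# Map an arbitrary prefix ("m" is our choice) to the namespace URI.
ns = {"m": "http://example.com/ns"}

# Every element name in the path, including the one in the predicate,
# is qualified with the prefix.
path = ".//m:Organisation/m:OrganisationId[m:OrganisationType='DEALER']/m:OrganisationReference"
references = [e.text for e in root.findall(path, ns)]

print(references)  # ['WHATINEED']
```

Without the prefix mapping, the unqualified path matches nothing in a namespaced document, which would produce the same silent empty result as in the question.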

You can use a nested for-loop to do it. First you check whether the text of OrganisationType is DEALER and then get the text of the OrganisationReference that you need.

If you want to learn more about parsing XML with Python, I strongly recommend the documentation of the xml.etree.ElementTree module.

import xml.etree.ElementTree as ET

tree1 = ET.parse('Master1.xml')
root1 = tree1.getroot()

tree2 = ET.parse('Master2.xml')
root2 = tree2.getroot()

# Find the parent Dealer
for element in root1.findall('./Organisation/OrganisationId'):
    if element[0].text == "DEALER":
        print(element[1].text)

This works if the first tag in your OrganisationId is OrganisationType :)
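A variation on the loop that does not depend on the order of the child tags could look like the sketch below. It looks the children up by name with findtext() and walks all descendants with iter(). The inline data, including the second OrganisationId with the values OTHER and NOTNEEDED, is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample: the DEALER entry lists its children in reverse order,
# and a second, non-dealer entry is added to show the filtering.
xml_text = """
<doc>
    <OrganisationData>
        <Organisation>
            <OrganisationId>
                <OrganisationReference>WHATINEED</OrganisationReference>
                <OrganisationType>DEALER</OrganisationType>
            </OrganisationId>
            <OrganisationId>
                <OrganisationType>OTHER</OrganisationType>
                <OrganisationReference>NOTNEEDED</OrganisationReference>
            </OrganisationId>
        </Organisation>
    </OrganisationData>
</doc>
"""
root = ET.fromstring(xml_text)

dealer_references = []
# iter() visits matching elements at any depth below the root.
for org_id in root.iter("OrganisationId"):
    # findtext() returns the text of the first child with that tag,
    # or None if there is no such child.
    if org_id.findtext("OrganisationType") == "DEALER":
        dealer_references.append(org_id.findtext("OrganisationReference"))

print(dealer_references)  # ['WHATINEED']
```

Looking children up by tag name costs a little more typing than indexing with element[0] and element[1], but it keeps working if the file puts the tags in a different order.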

Sybil answered 25/1, 2019 at 8:23 Comment(2)
Thanks for your answers. I'm new to Python, that's why I probably mixed the terms up: Dealer is the text of an element. I'm very sorry for the confusion. @Sybil - Faro
No, it still returns nothing. - Faro
