Getting a list of XML tags in file, using xml.etree.ElementTree

As the title says, I need to get a list of the XML tags in a file, using the xml.etree.ElementTree library.

I am aware that there are properties and methods like ETVar.child, ETVar.getroot(), ETVar.tag, ETVar.attrib.

But to use them and get even just the tag names on level 2, I had to use nested for loops.

At the moment I have something like this:

for xmlChild in xmlRootTag:
    if xmlChild.tag:
        print(xmlChild.tag)

The goal is to get a list of ALL XML tags in the file, even deeply nested ones, with duplicates removed.

To give a better idea, here is an example of the XML:

<root>
 <firstLevel>
  <secondlevel level="2">
    <thirdlevel>
      <fourth>text</fourth>
      <fourth2>text</fourth2>
    </thirdlevel>
  </secondlevel>
 </firstLevel>
</root>
Milker answered 13/4, 2015 at 1:21 Comment(2)
Fantastic D's solution works fine, but a closing ) was missing: elemList = list(set(elemList)). Note that the order of the resulting elements is neither the order of appearance, nor by level, nor alphabetical. - Souffle
@Souffle Thank you for pointing that out. I updated the answer with your suggestion. Have a nice day! - Milker

I did some more research on the subject and found a suitable solution. Since this is probably a common task, I'm answering my own question in the hope that it helps others.

What I was looking for was the iter() method of ElementTree.

import xml.etree.ElementTree as ET

# Load and parse the file
xmlTree = ET.parse('myXMLFile.xml')

elemList = []

# iter() walks the whole tree, including deeply nested elements
for elem in xmlTree.iter():
    elemList.append(elem.tag)

# Remove duplicates by converting to a set and back to a list
elemList = list(set(elemList))

# Print the result
print(elemList)

Important notes

  • xml.etree.ElementTree is part of the Python standard library
  • the sample was written for Python 3.2.3
  • duplicates are removed by converting the list to a set, which holds only unique values, and then converting it back to a list; a set is unordered, so the original order of the tags is not kept
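
If the order of the tags matters, the following is a minimal sketch of an order-preserving variant, not part of the original answer. It assumes Python 3.7 or later, where dict is guaranteed to keep insertion order. The function name unique_tags_in_order and the inline sample (a well-formed version of the XML from the question) are illustrative only.

```python
import xml.etree.ElementTree as ET

# Illustrative sample: a well-formed version of the XML from the question.
SAMPLE_XML = """<root>
 <firstLevel>
  <secondlevel level="2">
    <thirdlevel>
      <fourth>text</fourth>
      <fourth2>text</fourth2>
    </thirdlevel>
  </secondlevel>
 </firstLevel>
</root>"""


def unique_tags_in_order(root):
    """Return each tag once, in order of first appearance in the document."""
    # dict.fromkeys drops duplicates while keeping first-seen order
    # (assumes Python 3.7+, where dict preserves insertion order).
    return list(dict.fromkeys(elem.tag for elem in root.iter()))


root = ET.fromstring(SAMPLE_XML)
print(unique_tags_in_order(root))
# ['root', 'firstLevel', 'secondlevel', 'thirdlevel', 'fourth', 'fourth2']
```

For a file on disk, ET.parse('myXMLFile.xml').getroot() could be passed in place of the parsed string.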
Milker answered 13/4, 2015 at 1:47 Comment(1)
xmlTree.iter() does not seem to work for Python 2.6.9, I had to switch it to xmlTree.getiterator() - Coper

You could use the built-in Python set comprehension:

import xml.etree.ElementTree as ET

xmlTree = ET.parse('myXMLFile.xml')
tags = {elem.tag for elem in xmlTree.iter()}

If you specifically need a list, you can convert the set to one:

import xml.etree.ElementTree as ET

xmlTree = ET.parse('myXMLFile.xml')
tags = list({elem.tag for elem in xmlTree.iter()})
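
Because a set has no defined order, the list above may come out in a different order from run to run. As a small sketch (the inline XML string is a hypothetical stand-in for the file), wrapping the set comprehension in sorted() gives a deterministic, alphabetical list:

```python
import xml.etree.ElementTree as ET

# Hypothetical inline sample; for a real file, ET.parse('myXMLFile.xml') would be used instead.
root = ET.fromstring("<root><a><b/><b/></a><c><b/></c></root>")

# sorted() accepts any iterable and returns a new list,
# so no separate list() call is needed.
sorted_tags = sorted({elem.tag for elem in root.iter()})
print(sorted_tags)  # ['a', 'b', 'c', 'root']
```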
Ginzburg answered 31/5, 2019 at 12:15 Comment(0)
