xml.etree.ElementTree.ParseError -- exception handling not catching errors
S

2

5

I'm trying to parse an xml document that has a number of undefined entities that cause a ParseError when I try to run my code, which is as follows:

import xml.etree.ElementTree as ET

tree = ET.parse('cic.fam_lat.xml')
root = tree.getroot()

while True:
    try:
        for name in root.iter('name'):
            print(root.tag, name.text)
    except xml.etree.ElementTree.ParseError:
        pass

for name in root.iter('name'):
    print(name.text)

An example of the error is as follows; a number of other undefined entities all throw the same error: [image: error description]

I just want to ignore them rather than go in and edit out each one. How should I edit my exception handling to catch these error instances? (i.e., what am I doing wrong?)

Sandi answered 21/12, 2017 at 4:5 Comment(0)
R
5

There are some workarounds, like defining custom entities, suggested at:

But, if you are able to switch to lxml, its XMLParser() can work in the "recover" mode that would "ignore" the undefined entities:

import lxml.etree as ET

# recover=True lets the parser continue past the undefined entities
parser = ET.XMLParser(recover=True)
tree = ET.parse('cic.fam_lat.xml', parser=parser)
root = tree.getroot()

for name in root.iter('name'):
    print(root.tag, name.text)

(worked for me - got the tag names and texts printed)

Romano answered 21/12, 2017 at 4:13 Comment(3)
Excellent, thank you! Yeah, lxml seems to be the way to go -- recover mode worked perfectly. Now just to figure out how to get to a certain parent tag from each instance of <name>... - Sandi
This doesn't really answer the question. Why is the ParseError not caught? - Mcburney
@LondonRob: The exception is thrown already at tree = ET.parse('cic.fam_lat.xml'). The document is ill-formed because of the undefined entity and ElementTree refuses to parse it. - Triceratops
S
2

You can catch the exception simply by referencing the ParseError like this:

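import xml.etree.ElementTree as ET

try:
    # The parse call itself is what raises on an undefined entity
    tree = ET.parse('cic.fam_lat.xml')
except ET.ParseError as err:
    # Reference the exception through the same alias used in the import
    print('Could not parse file:', err)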

This is on Python 3.7.10, Windows 10.
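For what it's worth, here is a minimal self-contained sketch of where the exception actually comes from. It assumes the failure is an undefined entity such as &nbsp; in a document without a DTD. The sample string and the helper name parse_or_none are made up for illustration and are not from the question. The point is that the try block has to wrap the parse call, because that is where ParseError is raised, not the later loop over root.iter().

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input: &nbsp; is not defined anywhere in this document.
BROKEN_XML = "<root><name>Marcus&nbsp;Tullius</name></root>"


def parse_or_none(text):
    """Return the root element, or None if the text is not well-formed."""
    try:
        # The exception is raised here, during parsing, not during iteration.
        return ET.fromstring(text)
    except ET.ParseError as err:
        # ParseError carries the (line, column) of the failure in .position
        line, column = err.position
        print(f"parse failed at line {line}, column {column}: {err}")
        return None


root = parse_or_none(BROKEN_XML)
if root is not None:
    for name in root.iter("name"):
        print(root.tag, name.text)
```

Note that catching the error only lets the script continue; the document is still not parsed, so nothing gets printed for it. To actually read such a file, the undefined entities have to be dealt with some other way (for example the lxml recover mode from the other answer).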

Schnauzer answered 27/10, 2021 at 4:47 Comment(0)
