Python ElementTree parsing unbound prefix error

Asked 14/11, 2012 at 3:30 Answered 8/10, 2022 at 16:25

I am learning ElementTree in Python. Everything works fine except when I try to parse an XML file whose tags use a prefix:

test.xml:

<?xml version="1.0"?>
<abc:data>
   <abc:country name="Liechtenstein" rank="1" year="2008">
   </abc:country>
   <abc:country name="Singapore" rank="4" year="2011">
   </abc:country>
   <abc:country name="Panama" rank="5" year="2011">
   </abc:country>
</abc:data>

When I try to parse the XML:

import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')

I got the following error:

xml.etree.ElementTree.ParseError: unbound prefix: line 2, column 0

Do I need to specify something in order to parse an XML file with a prefix?
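A minimal reproduction of the error, using an in-memory string shaped like test.xml instead of the file itself (an assumption made only so the snippet is self-contained). `ET.ParseError` also exposes the line and column from the message through its `position` attribute:

```python
import xml.etree.ElementTree as ET

# Same shape as test.xml: the "abc" prefix is used but never declared.
BROKEN = '<?xml version="1.0"?>\n<abc:data>\n   <abc:country name="Panama"/>\n</abc:data>'

error = None
try:
    ET.fromstring(BROKEN)
except ET.ParseError as err:
    error = err  # keep a reference; "err" is unbound once the except block ends

if error is not None:
    print(error)           # expat's message, including line and column
    print(error.position)  # the same location as a (line, column) tuple
```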

Outside asked 14/11, 2012 at 3:30

In short, the namespace for the abc prefix is never declared. Take a look at: https://mcmap.net/q/25554/-emitting-namespace-specifications-with-elementtree-in-python – Bloodstain 14/11, 2012 at 4:26

Add a declaration for the abc namespace to the root element of your XML file:

<?xml version="1.0"?>
<abc:data xmlns:abc="your namespace">
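A sketch of the fixed document parsed with the built-in ElementTree. The URI `urn:example:abc` is a hypothetical stand-in for "your namespace"; any string works as long as the same one is used when querying. ElementTree does not keep the prefix: it expands each tag to `{uri}localname`, so queries need a prefix-to-URI mapping:

```python
import xml.etree.ElementTree as ET

# "urn:example:abc" is a placeholder for the real namespace URI.
XML = """<?xml version="1.0"?>
<abc:data xmlns:abc="urn:example:abc">
   <abc:country name="Liechtenstein" rank="1" year="2008">
   </abc:country>
   <abc:country name="Singapore" rank="4" year="2011">
   </abc:country>
   <abc:country name="Panama" rank="5" year="2011">
   </abc:country>
</abc:data>"""

root = ET.fromstring(XML)
print(root.tag)  # {urn:example:abc}data

# Map the prefix back to the URI so "abc:country" can be used in findall().
ns = {"abc": "urn:example:abc"}
names = [c.get("name") for c in root.findall("abc:country", ns)]
print(names)  # ['Liechtenstein', 'Singapore', 'Panama']
```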

Purveyor answered 13/2, 2013 at 9:7

But what about when it's not my XML to change and I just need to parse it? – Mellisamellisent 6/3, 2015 at 1:28

I second the question/comment from @Mark Allen! I am having the same problem. Certainly one could edit the files on a case-by-case basis, but I have many large (and nonuniform) XML files. Surely there is a way to get around this. – Toitoiboid 4/8, 2016 at 6:46
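For files that cannot be edited, one stdlib-only workaround (a suggestion, not taken from the answers in this thread) is to nest the document inside a synthetic wrapper element that declares the missing prefixes, then parse that. The helper name and the `urn:placeholder:...` URIs are hypothetical; this sketch assumes you know which prefixes are used and that the file has no DOCTYPE (which cannot appear inside an element):

```python
import re
import xml.etree.ElementTree as ET

def parse_with_undeclared_prefixes(xml_text, prefixes):
    """Hypothetical helper: bind each undeclared prefix to a made-up URI.

    The URIs ("urn:placeholder:<prefix>") are placeholders, since the
    real namespaces are unknown.
    """
    # An XML declaration is only legal at the very start of a document,
    # so drop it before nesting the content inside the wrapper.
    body = re.sub(r"^\s*<\?xml[^>]*\?>", "", xml_text)
    decls = " ".join(f'xmlns:{p}="urn:placeholder:{p}"' for p in prefixes)
    wrapper = ET.fromstring(f"<wrapper {decls}>{body}</wrapper>")
    # The original root element is the wrapper's only child element.
    return wrapper[0]

doc = """<?xml version="1.0"?>
<abc:data>
   <abc:country name="Liechtenstein" rank="1" year="2008">
   </abc:country>
   <abc:country name="Singapore" rank="4" year="2011">
   </abc:country>
   <abc:country name="Panama" rank="5" year="2011">
   </abc:country>
</abc:data>"""

root = parse_with_undeclared_prefixes(doc, ["abc"])
# "{*}" matches any namespace (Python 3.8+), so queries never need
# to spell out the placeholder URI.
names = [c.get("name") for c in root.findall("{*}country")]
print(names)  # ['Liechtenstein', 'Singapore', 'Panama']
```

For a file on disk, pass its text instead, e.g. `open("test.xml", encoding="utf-8").read()`.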

I encountered the same issue while processing an XML file. Parsing it with lxml's XMLParser in recover mode gets past the error instead of aborting:

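from lxml import etree  # third-party: pip install lxml

# recover=True makes libxml2 keep going past errors such as an unbound prefix
parser1 = etree.XMLParser(encoding="utf-8", recover=True)
tree1 = etree.parse('filename.xml', parser1)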

Teshatesla answered 8/10, 2022 at 16:25

The recover option works with lxml, but not with the built-in ElementTree library. – Ratter 8/10, 2022 at 16:55


See if this works. BeautifulSoup's "xml" parser (which needs lxml installed) does not choke on the undeclared prefix:

from bs4 import BeautifulSoup

xml_file = "test.xml"

with open(xml_file, "r", encoding="utf8") as f:
    soup = BeautifulSoup(f.read(), "xml")

# Search by the local name, without the "abc:" prefix.
items = soup.find_all("country")
print(items)

The above prints a list of the matching elements, which you can then process further to achieve your aim (e.g. read their attributes or strip the markup):

[<country name="Liechtenstein" rank="1" year="2008">
</country>, <country name="Singapore" rank="4" year="2011">
</country>, <country name="Panama" rank="5" year="2011">
</country>]

Percolator answered 16/8, 2019 at 7:17
