SyntaxError: prefix 'a' not found in prefix map

Asked 23/11, 2016 at 19:0 Answered 24/1, 2020 at 7:54

I'm trying to write a function that counts the words in a pptx document. The problem is that I can't figure out how to select only tags of this kind:

<a:t>Some Text</a:t>

When I call print xmlTree.findall('.//a:t'), it raises:

SyntaxError: prefix 'a' not found in prefix map

What do I need to change to make this work?

This is the function:

import xml.etree.ElementTree as ET
import zipfile

def get_pptx_word_count(filename):
    z = zipfile.ZipFile(filename)
    i = 0
    wordcount = 0
    while True:
        i += 1
        slidename = 'slide{}.xml'.format(i)
        try:
            slide = z.read("ppt/slides/{}".format(slidename))
        except KeyError:
            break  # no slide with this number, so stop
        xmlTree = ET.fromstring(slide)
        for elem in xmlTree.iter():
            # Never true as written: ElementTree reports this tag as
            # '{namespace-uri}t', not 'a:t'.
            if elem.tag == 'a:t':
                text = elem.text or ''
                wordcount += len(text.split())
    return wordcount

Aquileia answered 23/11, 2016 at 19:0 Comment(1)

Possible duplicate of Parsing XML with namespace in Python via 'ElementTree' – Elonore 23/11, 2016 at 19:23

-3

You need to tell ElementTree about your XML namespaces.

References:

Official Documentation (Python 2.7): 19.7.1.6. Parsing XML with Namespaces
Existing answer on StackOverflow: Parsing XML with namespace in Python via 'ElementTree'
Article by the author of ElementTree: ElementTree: Working with Namespaces and Qualified Names
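
To make that concrete, here is a minimal sketch in Python 3 syntax. The XML fragment and the names NS and SAMPLE are made up for illustration; the URI is the DrawingML namespace that slide XML binds to the a prefix. It shows that the parser stores expanded names, that the bare prefix cannot be resolved, and that passing a prefix-to-URI mapping makes the same path work.

```python
import xml.etree.ElementTree as ET

# DrawingML main namespace: the URI that slide XML binds to the "a" prefix.
NS = {"a": "http://schemas.openxmlformats.org/drawingml/2006/main"}

# Hypothetical fragment standing in for part of a slide.
SAMPLE = (
    '<root xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">'
    "<a:p><a:r><a:t>Some Text</a:t></a:r></a:p>"
    "</root>"
)

tree = ET.fromstring(SAMPLE)

# The parser keeps the expanded name "{uri}t"; the "a:" prefix is gone.
tags = [elem.tag for elem in tree.iter()]

# Without a mapping, ElementTree cannot resolve the "a" prefix in the path.
try:
    tree.findall(".//a:t")
    unresolved = False
except SyntaxError:
    unresolved = True

# With a mapping, the same path matches the text runs.
texts = [t.text for t in tree.findall(".//a:t", NS)]
```

The key in the mapping does not have to match the prefix used in the file; only the URI has to match.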

Elonore answered 23/11, 2016 at 19:24 Comment(1)

Link-only answers aren't particularly helpful. Any information relevant to solving this problem should be included in the answer itself. – Oatmeal 26/10, 2018 at 14:44

ElementTree identifies a namespaced element by its full namespace URI in braces, followed by the local name:

{namespace}element

So, if a were itself the namespace URI, the query would become:

print xmlTree.findall('.//{a}t')

Edit:

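As @mxjn pointed out, if a is a prefix rather than the namespace URI, you need to use the URI in its place (the URI below is a placeholder for the one your document declares for a):

print xmlTree.findall('.//{http://tempuri.org/name_space_of_a}t')

Alternatively, you can keep the prefix in the path and supply a prefix map as the second argument:

prefix_map = {"a": "http://tempuri.org/name_space_of_a"}
print xmlTree.findall('.//a:t', prefix_map)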
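
Applied to the word-count function from the question, the prefix-map approach might look like the sketch below (Python 3 syntax). It assumes the a prefix in the slides is bound to the DrawingML namespace URI shown, which is what PowerPoint files use. The in-memory archive at the end is made-up test data, not a valid .pptx, and DRAWINGML_NS is just an illustrative name.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# DrawingML main namespace, bound to the "a" prefix in PowerPoint slides.
DRAWINGML_NS = {"a": "http://schemas.openxmlformats.org/drawingml/2006/main"}


def get_pptx_word_count(source):
    """Count whitespace-separated words in the <a:t> runs of every slide.

    `source` is a path or file-like object accepted by zipfile.ZipFile.
    """
    wordcount = 0
    with zipfile.ZipFile(source) as z:
        i = 0
        while True:
            i += 1
            try:
                slide = z.read("ppt/slides/slide{}.xml".format(i))
            except KeyError:
                break  # no slide with this number, so stop
            root = ET.fromstring(slide)
            # The mapping lets the path use the familiar "a:" prefix.
            for elem in root.findall(".//a:t", DRAWINGML_NS):
                wordcount += len((elem.text or "").split())
    return wordcount


# Hypothetical one-slide archive, built in memory purely for demonstration.
slide_xml = (
    '<sld xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">'
    "<a:p><a:r><a:t>Some Text</a:t></a:r>"
    "<a:r><a:t>three more words</a:t></a:r></a:p>"
    "</sld>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("ppt/slides/slide1.xml", slide_xml)
buf.seek(0)

demo_count = get_pptx_word_count(buf)
```

Like the original, this stops at the first missing slide number, so a deck whose slide files are not numbered consecutively would be undercounted; iterating over z.namelist() would avoid that.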

Hellkite answered 24/1, 2020 at 7:54 Comment(1)

This won't work. a is a prefix, not the actual namespace URI. – Plugboard 24/1, 2020 at 8:7

