How to remove the ns0 prefix while dumping

I am parsing the file with ElementTree's iterparse, since the actual file will be huge. I have the following code:

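import xml.etree.ElementTree as etree  # cElementTree was removed in Python 3.9

filename = r'D:\test\Books.xml'
context = iter(etree.iterparse(filename, events=('start', 'end')))
_, root = next(context)  # the first event is the start of the root element
for event, elem in context:
    # Wait for the 'end' event so the element is guaranteed to be fully parsed
    if event == 'end' and elem.tag == '{http://www.book.org/Book-19200/biblography}Book':
        etree.dump(elem)  # dump() writes to stdout itself and returns None
        root.clear()  # drop processed elements to keep memory use bounded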

And my XML looks like this:

<Books>
    <Book xmlns="http://www.book.org/Book-19200/biblography"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    ISBN="519292296"
    xsi:schemaLocation="http://www.book.org/Book-19200/biblography ../../book.xsd 
    http://www.w3.org/2000/12/xmldsig# ../../xmldsig-core-schema.xsd">
        <Detail ID="67">
            <BookName>Code Complete 2</BookName>
            <Author>Steve McConnell</Author>
            <Pages>960</Pages>
            <ISBN>0735619670</ISBN>        
            <BookName>Application Architecture Guide 2</BookName>
            <Author>Microsoft Team</Author>
            <Pages>496</Pages>
            <ISBN>073562710X</ISBN>
        </Detail>
    </Book>
    <Book xmlns="http://www.book.org/Book-19200/biblography"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    ISBN="519292296"
    xsi:schemaLocation="http://www.book.org/Book-19200/biblography ../../book.xsd 
    http://www.w3.org/2000/12/xmldsig# ../../xmldsig-core-schema.xsd">
        <Detail ID="87">
            <BookName>Rocking Python</BookName>
            <Author>Guido Rossum</Author>
            <Pages>960</Pages>
            <ISBN>0735619690</ISBN>
            <BookName>Python Rocks</BookName>
            <Author>Microsoft Team</Author>
            <Pages>496</Pages>
            <ISBN>073562710X</ISBN>
        </Detail>
    </Book>
</Books>

Running the above generates something like this:

<ns0:Book xmlns:ns0="http://www.book.org/Book-19200/biblography" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ISBN="519292296" xsi:schemaLocation="http://www.book.org/Book-19200/biblography ../../book.xsd      http://www.w3.org/2000/12/xmldsig# ../../xmldsig-core-schema.xsd">
        <ns0:Detail ID="67">
            <ns0:BookName>Code Complete 2</ns0:BookName>
            <ns0:Author>Steve McConnell</ns0:Author>
            <ns0:Pages>960</ns0:Pages>
            <ns0:ISBN>0735619670</ns0:ISBN>
            <ns0:BookName>Application Architecture Guide 2</ns0:BookName>
            <ns0:Author>Microsoft Team</ns0:Author>
            <ns0:Pages>496</ns0:Pages>
            <ns0:ISBN>073562710X</ns0:ISBN>
        </ns0:Detail>
    </ns0:Book>

How do I ensure I print the xml fragment without the ns0 prefix? I am using Python 3.

Coh answered 14/3, 2014 at 16:50 Comment(0)

Add

etree.register_namespace("", "http://www.book.org/Book-19200/biblography")

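to your program, before any element is dumped. This function registers the prefix that is used when a namespace is serialized. The registry is global, and an empty prefix means that elements in that namespace are written without a prefix, with the namespace declared as the default xmlns.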

Reference: http://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.register_namespace
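
For illustration, here is a minimal self-contained sketch of the suggestion above using the standard-library ElementTree. The in-memory SAMPLE document and the helper name iter_book_fragments are assumptions made for this example, not part of the original program; the sample is a cut-down version of the XML from the question.

```python
import io
import xml.etree.ElementTree as ET

NS = "http://www.book.org/Book-19200/biblography"

# Register the namespace with an empty prefix before serializing anything.
ET.register_namespace("", NS)

# Assumption: a small in-memory stand-in for the real Books.xml file.
SAMPLE = b"""<Books>
  <Book xmlns="http://www.book.org/Book-19200/biblography" ISBN="519292296">
    <Detail ID="67"><BookName>Code Complete 2</BookName></Detail>
  </Book>
</Books>"""


def iter_book_fragments(source):
    """Yield each Book element as an XML string while streaming the input."""
    context = iter(ET.iterparse(source, events=("start", "end")))
    _, root = next(context)  # first event: start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == f"{{{NS}}}Book":
            yield ET.tostring(elem, encoding="unicode")
            root.clear()  # free the elements that were already handled


fragments = list(iter_book_fragments(io.BytesIO(SAMPLE)))
for fragment in fragments:
    print(fragment)
```

With the namespace registered this way, the printed fragment should start with a plain Book tag carrying a default xmlns declaration instead of ns0-prefixed tags.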

Miff answered 14/3, 2014 at 17:9 Comment(4)
It is quite annoying that this is implemented as a dictionary. I happen to have multiple namespaces for different sections of my XML (the XSDs import each other), and while I could register ns0 to be "", I can't register the rest that way, only replace them in the output string. - Conciseness
This raises ValueError: Invalid tag name '' - Batch
@Sumit: if you need help, please ask a new question. - Miff
@Miff I will. I guess this was specific to the built-in etree library, while I'm using lxml, so it might be that. - Batch
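
Regarding the multiple-namespace comment: in the standard-library ElementTree only one URI can be mapped to the empty prefix at a time, so one possible approach is to give the remaining namespaces explicit prefixes of your choosing. A rough sketch, assuming a second namespace taken from the question's schemaLocation; the "sig" prefix and the Signature element are made up for the example. It covers only the built-in ElementTree, not lxml.

```python
import xml.etree.ElementTree as ET

BOOK_NS = "http://www.book.org/Book-19200/biblography"
SIG_NS = "http://www.w3.org/2000/12/xmldsig#"

# Only one namespace can use the empty (default) prefix.
ET.register_namespace("", BOOK_NS)
# Assumption: "sig" is an arbitrary prefix chosen for this example.
ET.register_namespace("sig", SIG_NS)

# Build a tiny tree that mixes both namespaces (hypothetical content).
book = ET.Element(f"{{{BOOK_NS}}}Book")
ET.SubElement(book, f"{{{SIG_NS}}}Signature")

xml_text = ET.tostring(book, encoding="unicode")
print(xml_text)
```

Here Book is written without a prefix, and Signature gets the sig prefix instead of an auto-generated ns1.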
