ElementTree in Python 2.6.2 Processing Instructions support?

Asked 29/9, 2009 at 0:9 Answered 10/12, 2015 at 5:8

I'm trying to create XML using the ElementTree object structure in Python. It all works very well except when it comes to processing instructions. I can create a PI easily using the factory function ProcessingInstruction(), but it doesn't get added to the ElementTree. I can add it manually, but I can't figure out how to add it above the root element, where PIs are normally placed. Does anyone know how to do this? I know of plenty of alternative methods, but it seems that this must be built in somewhere that I just can't find.

Ansel answered 29/9, 2009 at 0:9 Comment(0)

Try the lxml library: it follows the ElementTree API and adds a lot of extras. From the compatibility overview:

ElementTree ignores comments and processing instructions when parsing XML, while etree will read them in and treat them as Comment or ProcessingInstruction elements respectively. This is especially visible where comments are found inside text content, which is then split by the Comment element.

You can disable this behaviour by passing the boolean remove_comments and/or remove_pis keyword arguments to the parser you use. For convenience and to support portable code, you can also use the etree.ETCompatXMLParser instead of the default etree.XMLParser. It tries to provide a default setup that is as close to the ElementTree parser as possible.

Not in the stdlib, I know, but in my experience it is the best bet when you need things that the standard ElementTree doesn't provide.

Sharma answered 29/9, 2009 at 21:15 Comment(0)

With the lxml API it couldn't be easier, though it is a bit "underdocumented":

If you need a top-level processing instruction, create it like this:

from lxml import etree

root = etree.Element("anytagname")
root.addprevious(etree.ProcessingInstruction("anypi", "anypicontent"))

The resulting document will look like this:

<?anypi anypicontent?>
<anytagname />

They certainly should add this to their FAQ because IMO it is another feature that sets this fine API apart.

Boogiewoogie answered 20/11, 2011 at 7:38 Comment(1)

This doesn't work on the root element. – Medication 19/5, 2014 at 12:47

Yeah, I don't believe it's possible, sorry. ElementTree provides a simpler interface to (non-namespaced) element-centric XML processing than DOM, but the price for that is that it doesn't support the whole XML infoset.

There is no apparent way to represent the content that lives outside the root element (comments, PIs, the doctype and the XML declaration), and these are also discarded at parse time. (Aside: this appears to include any default attributes specified in the DTD internal subset, which makes ElementTree strictly-speaking a non-compliant XML processor.)

You can probably work around it by subclassing or monkey-patching the write() method of Python's native ElementTree implementation so that it calls _write on your extra PIs before calling _write on the _root, but it could be a bit fragile.
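A less invasive variant of the same idea, as a rough sketch rather than a tested recipe: instead of patching write(), serialize the PIs and the root separately and join the pieces. This assumes a Python 3 interpreter (encoding="unicode" is not available in 2.6), and the helper name serialize_with_prolog and the stylesheet name expand.xsl are made up for illustration.

```python
import xml.etree.ElementTree as ET


def serialize_with_prolog(root, pis):
    """Return the document as text, with each PI placed before the root element."""
    # The declaration is written by hand because tostring() is called per piece.
    parts = ['<?xml version="1.0" encoding="utf-8"?>']
    # A ProcessingInstruction is a special Element, so tostring() can serialize it.
    parts.extend(ET.tostring(pi, encoding="unicode") for pi in pis)
    parts.append(ET.tostring(root, encoding="unicode"))
    return "\n".join(parts)


root = ET.Element("prefs")
ET.SubElement(root, "user_name").text = "John Doe"

# Hypothetical stylesheet PI, used only to show the placement.
stylesheet = ET.ProcessingInstruction(
    "xml-stylesheet", 'type="text/xsl" href="expand.xsl"'
)

xml_text = serialize_with_prolog(root, [stylesheet])
print(xml_text)
```

When saving the result, write it with the same encoding that the declaration names (UTF-8 here).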

If you need support for the full XML infoset, probably best stick with DOM.
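For what the DOM route might look like, here is a minimal sketch using xml.dom.minidom from the standard library. The element name prefs and the stylesheet name expand.xsl are placeholders, not anything the question specifies.

```python
from xml.dom import minidom

doc = minidom.Document()

# Build the root element first so there is a node to insert the PI before.
root = doc.createElement("prefs")
doc.appendChild(root)

# The document node accepts PIs as children, so one can sit above the root.
pi = doc.createProcessingInstruction(
    "xml-stylesheet", 'type="text/xsl" href="expand.xsl"'
)
doc.insertBefore(pi, root)

xml_text = doc.toxml()
print(xml_text)
```

The trade-off is the one described above: DOM is more verbose than ElementTree, but the document node can hold nodes that live outside the root element.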

Pryce answered 29/9, 2009 at 0:52 Comment(0)

I don't know much about ElementTree. But it is possible that you might be able to solve your problem using a library I wrote called "xe".

xe is a set of Python classes designed to make it easy to create structured XML. I haven't worked on it in a long time, for various reasons, but I'd be willing to help you if you have questions about it, or need bugs fixed.

It has the bare bones of support for things like processing instructions, and with a little bit of work I think it could do what you need. (When I started adding processing instructions, I didn't really understand them, and I didn't have any need for them, so the code is sort of half-baked.)

Take a look and see if it seems useful.

http://home.avvanta.com/~steveha/xe.html

Here's an example of using it:

import xe
doc = xe.XMLDoc()

prefs = xe.NestElement("prefs")
prefs.user_name = xe.TextElement("user_name")
prefs.paper = xe.NestElement("paper")
prefs.paper.width = xe.IntElement("width")
prefs.paper.height = xe.IntElement("height")

doc.root_element = prefs


prefs.user_name = "John Doe"
prefs.paper.width = 8
prefs.paper.height = 10

c = xe.Comment("this is a comment")
doc.top.append(c)

If you ran the above code and then ran print doc here is what you would get:

<?xml version="1.0" encoding="utf-8"?>
<!-- this is a comment -->
<prefs>
    <user_name>John Doe</user_name>
    <paper>
        <width>8</width>
        <height>10</height>
    </paper>
</prefs>

If you are interested in this but need some help, just let me know.

Good luck with your project.

Jea answered 29/9, 2009 at 4:35 Comment(0)

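# Raw strings keep the backslashes literal ('\t' in a normal string is a tab).
with open(r'D:\Python\XML\test.xml', 'r+') as f:
    old = f.read()
    f.seek(44, 0)  # place cursor after the XML declaration (44 characters in this file)
    f.write(r'<?xml-stylesheet type="text/xsl" href="C:\Stylesheets\expand.xsl"?>' + old[44:])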

I was facing the same problem and came up with this crude solution. I had failed to insert the PI into the .xml file correctly using one of the Element methods (in my case root.insert(0, PI)), and my attempts to cut and paste the inserted PI to the correct location only led to data being deleted from unexpected locations.
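The fixed offset of 44 only fits a file whose declaration happens to be that long. A slightly more general sketch of the same string-splicing idea is shown below. It assumes the document is already in memory as text and that any XML declaration sits at the very start. The function name insert_pi_after_declaration is made up for illustration.

```python
def insert_pi_after_declaration(xml_text, pi):
    """Return xml_text with the PI string placed right after the XML declaration.

    If there is no declaration, the PI goes at the very start instead.
    """
    # "<?xml " with the trailing space does not match "<?xml-stylesheet".
    if xml_text.startswith("<?xml "):
        end = xml_text.index("?>") + 2  # position just past the declaration
        return xml_text[:end] + "\n" + pi + xml_text[end:]
    return pi + "\n" + xml_text


original = '<?xml version="1.0" encoding="utf-8"?>\n<root><item/></root>'
pi = '<?xml-stylesheet type="text/xsl" href="expand.xsl"?>'  # placeholder href

patched = insert_pi_after_declaration(original, pi)
print(patched)
```

This is still text surgery rather than real XML handling, so it inherits the fragility of the original approach; it only removes the hard-coded offset.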

Aalst answered 10/12, 2015 at 5:8 Comment(0)
