Creating a simple XML file using python

Asked 31/8, 2010 at 2:38 Answered 18/10, 2018 at 9:26

223

What are my options if I want to create a simple XML file in python? (library wise)

The xml I want looks like:

<root>
 <doc>
     <field1 name="blah">some value1</field1>
     <field2 name="asdfasd">some vlaue2</field2>
 </doc>

</root>

Carpophore answered 31/8, 2010 at 2:38 Comment(0)

409

These days, the most popular (and very simple) option is the ElementTree API, which has been included in the standard library since Python 2.5.

The available options for that are:

ElementTree (Basic, pure-Python implementation of ElementTree. Part of the standard library since 2.5)
cElementTree (Optimized C implementation of ElementTree. Also offered in the standard library since 2.5. Deprecated and folded into the regular ElementTree as an automatic thing as of 3.3.)
LXML (Based on libxml2. Offers a rich superset of the ElementTree API as well XPath, CSS Selectors, and more)

Here's an example of how to generate your example document using the in-stdlib cElementTree:

import xml.etree.cElementTree as ET

root = ET.Element("root")
doc = ET.SubElement(root, "doc")

ET.SubElement(doc, "field1", name="blah").text = "some value1"
ET.SubElement(doc, "field2", name="asdfasd").text = "some vlaue2"

tree = ET.ElementTree(root)
tree.write("filename.xml")

I've tested it and it works, but I'm assuming whitespace isn't significant. If you need "prettyprint" indentation, let me know and I'll look up how to do that. (It may be an LXML-specific option. I don't use the stdlib implementation much)

For further reading, here are some useful links:

API docs for the implementation in the Python standard library
Introductory Tutorial (From the original author's site)
LXML etree tutorial. (With example code for loading the best available option from all major ElementTree implementations)

As a final note, either cElementTree or LXML should be fast enough for all your needs (both are optimized C code), but in the event you're in a situation where you need to squeeze out every last bit of performance, the benchmarks on the LXML site indicate that:

LXML clearly wins for serializing (generating) XML
As a side-effect of implementing proper parent traversal, LXML is a bit slower than cElementTree for parsing.

Blinnie answered 31/8, 2010 at 3:31 Comment(11)

I'm getting import xml.etree.cElementTree as ET, ImportError: No module named etree.cElementTree - standard OSX 10.8 python, but somehow it works when I run it from inside ipython. – Doud 26/12, 2013 at 19:10

@Kasper: I don't have a Mac so I can't try to duplicate the problem. Tell me the Python version and I'll see if I can replicate it on Linux. – Blinnie 3/1, 2014 at 20:31

@ssokolow, I'm on OSX 10.9 now and this has somehow been resolved, I don't remember if it was my own action or if I did something to resolve it. – Doud 4/1, 2014 at 18:34

@Blinnie I want to avoid loading my XML (which prints to a 3MB file) into memory for writing if possible. Is there an XML writer that can write the XML out as you add to it? – Specialty 28/5, 2014 at 2:7

@Specialty You really should have asked a new question and then sent me a link to it so everyone can benefit from it. However, I will point you in the right direction. DOM (Document Object Model) libraries always build an in-memory model so you want a SAX (Simple API for XML) implementation instead. I've never looked into SAX implementations but here's a tutorial for using the in-stdlib one for output rather than input. – Blinnie 29/5, 2014 at 18:40

Hi, I would like to add to header <?xml version=\"1.0\"?>. How do I do this using xml.etree.cElementTree? – Plowshare 20/1, 2016 at 16:22

@YonatanSimson I don't know how to add that exact string, since ElementTree seems to only obey xml_declaration=True if you specify an encoding... but, to get equivalent behaviour, call tree.write() like this: tree.write("filename.xml", xml_declaration=True, encoding='utf-8') You can use any encoding as long as you explicitly specify one. (ascii will force all Unicode characters outside the 7-bit ASCII set to be entity-encoded if you don't trust a web server to be configured properly.) – Blinnie 22/1, 2016 at 3:43

Just a reminder to anyone else who tries to correct vlaue2 to value2: The typo is in the requested XML output in the original question. Until that changes, the typo here actually is correct. – Blinnie 18/10, 2017 at 3:33

According to the documentation, cElementTree was depreciated in Python 3.3 – Rhiana 26/10, 2017 at 20:43

In case you didn't understand the previous comment, just replace cElementTree with ElementTree and things will work fine. – Stamp 4/7, 2019 at 8:58

I think it would be useful if you included the actual output of your code (filename.xml). One issue with this solution is that it doesn't generate the XML headers, and the XML that is generated doesn't have the same formatting that OP requested. – Thulium 7/5, 2023 at 4:23

The lxml library includes a very convenient syntax for XML generation, called the E-factory. Here's how I'd make the example you give:

#!/usr/bin/python
import lxml.etree
import lxml.builder    

E = lxml.builder.ElementMaker()
ROOT = E.root
DOC = E.doc
FIELD1 = E.field1
FIELD2 = E.field2

the_doc = ROOT(
        DOC(
            FIELD1('some value1', name='blah'),
            FIELD2('some value2', name='asdfasd'),
            )   
        )   

print lxml.etree.tostring(the_doc, pretty_print=True)

Output:

<root>
  <doc>
    <field1 name="blah">some value1</field1>
    <field2 name="asdfasd">some value2</field2>
  </doc>
</root>

It also supports adding to an already-made node, e.g. after the above you could say

the_doc.append(FIELD2('another value again', name='hithere'))

Notch answered 4/4, 2011 at 15:36 Comment(2)

If the name of the tag doesn't conform to the Python identifier rules, then you could use getattr, e.g., getattr(E, "some-tag"). – Access 30/3, 2016 at 11:4

for me print lxml.etree.tostring was causing AttributeError: 'lxml.etree._Element' object has no attribute 'etree'. It worked without starting "lxml." like: etree.tostring(the_doc, pretty_print=True) – Orestes 27/6, 2020 at 18:55

Yattag http://www.yattag.org/ or https://github.com/leforestier/yattag provides an interesting API to create such XML document (and also HTML documents).

It's using context manager and with keyword.

from yattag import Doc, indent

doc, tag, text = Doc().tagtext()

with tag('root'):
    with tag('doc'):
        with tag('field1', name='blah'):
            text('some value1')
        with tag('field2', name='asdfasd'):
            text('some value2')

result = indent(
    doc.getvalue(),
    indentation = ' '*4,
    newline = '\r\n'
)

print(result)

so you will get:

<root>
    <doc>
        <field1 name="blah">some value1</field1>
        <field2 name="asdfasd">some value2</field2>
    </doc>
</root>

Romulus answered 2/5, 2015 at 13:27 Comment(1)

This is a very clean way to produce markup. I dare say it's more Pythonic than any built-in way. – Redhot 8/12, 2020 at 21:41

For the simplest choice, I'd go with minidom: http://docs.python.org/library/xml.dom.minidom.html . It is built in to the python standard library and is straightforward to use in simple cases.

Here's a pretty easy to follow tutorial: http://www.boddie.org.uk/python/XML_intro.html

Frink answered 31/8, 2010 at 3:0 Comment(1)

This answer should include an example of minidom in use. – Rhiana 12/2, 2018 at 21:7

For such a simple XML structure, you may not want to involve a full blown XML module. Consider a string template for the simplest structures, or Jinja for something a little more complex. Jinja can handle looping over a list of data to produce the inner xml of your document list. That is a bit trickier with raw python string templates

For a Jinja example, see my answer to a similar question.

Here is an example of generating your xml with string templates.

import string
from xml.sax.saxutils import escape

inner_template = string.Template('    <field${id} name="${name}">${value}</field${id}>')

outer_template = string.Template("""<root>
 <doc>
${document_list}
 </doc>
</root>
 """)

data = [
    (1, 'foo', 'The value for the foo document'),
    (2, 'bar', 'The <value> for the <bar> document'),
]

inner_contents = [inner_template.substitute(id=id, name=name, value=escape(value)) for (id, name, value) in data]
result = outer_template.substitute(document_list='\n'.join(inner_contents))
print result

Output:

<root>
 <doc>
    <field1 name="foo">The value for the foo document</field1>
    <field2 name="bar">The &lt;value&gt; for the &lt;bar&gt; document</field2>
 </doc>
</root>

The downer of the template approach is that you won't get escaping of < and > for free. I danced around that problem by pulling in a util from xml.sax

Ressler answered 2/11, 2017 at 0:12 Comment(0)

I just finished writing an xml generator, using bigh_29's method of Templates ... it's a nice way of controlling what you output without too many Objects getting 'in the way'.

As for the tag and value, I used two arrays, one which gave the tag name and position in the output xml and another which referenced a parameter file having the same list of tags. The parameter file, however, also has the position number in the corresponding input (csv) file where the data will be taken from. This way, if there's any changes to the position of the data coming in from the input file, the program doesn't change; it dynamically works out the data field position from the appropriate tag in the parameter file.

Cranial answered 18/10, 2018 at 9:26 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags