How to convert an XML string to a dictionary?

Asked 27/1, 2010 at 15:28 Answered 29/3, 2023 at 13:39

Solved python xml json dictionary xml-deserialization

I have a program that reads an XML document from a socket. I have the XML document stored in a string which I would like to convert directly to a Python dictionary, the same way it is done in Django's simplejson library.

Take as an example:

str = '<?xml version="1.0" ?><person><name>john</name><age>20</age></person>'
dic_xml = convert_to_dic(str)

Then dic_xml would look like {'person' : { 'name' : 'john', 'age' : 20 } }
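Here is a minimal standard-library sketch of the requested conversion. It assumes a small document with no attributes and no repeated sibling tags (the answers below handle those cases). XML carries no types, so `age` comes back as the string `'20'`. The `convert` parameter and the `to_int_if_age` helper are hypothetical additions that show one way to get the integer `20` from the desired output.

```python
import xml.etree.ElementTree as ET


def convert_to_dic(xml_string, convert=None):
    """Convert a simple, attribute-free XML string to nested dicts.

    Assumes no attributes and no repeated sibling tags; a repeated tag
    would silently overwrite the earlier value.
    """
    def walk(element):
        if len(element) == 0:
            # Leaf: use its text, optionally passed through the converter.
            text = element.text
            return convert(element.tag, text) if convert else text
        return {child.tag: walk(child) for child in element}

    root = ET.fromstring(xml_string)
    return {root.tag: walk(root)}


xml_string = '<?xml version="1.0" ?><person><name>john</name><age>20</age></person>'


def to_int_if_age(tag, text):
    # Hypothetical converter: only the <age> element is turned into an int.
    return int(text) if tag == "age" else text


print(convert_to_dic(xml_string))
# {'person': {'name': 'john', 'age': '20'}}
print(convert_to_dic(xml_string, convert=to_int_if_age))
# {'person': {'name': 'john', 'age': 20}}
```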

Mezzosoprano asked 27/1, 2010 at 15:28

str has a few syntax errors. try:str ='<?xml version="1.0" ?><person><name>john</name><age>20</age></person>' – Gapeworm 1/9, 2017 at 2:34

This is a great module that someone created. I've used it several times. http://code.activestate.com/recipes/410469-xml-as-dictionary/

Here is the code from the website just in case the link goes bad.

import xml.etree.ElementTree as ElementTree  # cElementTree was removed in Python 3.9

class XmlListConfig(list):
    def __init__(self, aList):
        for element in aList:
            if len(element):  # element has child elements
                # treat like dict
                if len(element) == 1 or element[0].tag != element[1].tag:
                    self.append(XmlDictConfig(element))
                # treat like list
                elif element[0].tag == element[1].tag:
                    self.append(XmlListConfig(element))
            elif element.text:
                text = element.text.strip()
                if text:
                    self.append(text)


class XmlDictConfig(dict):
    '''
    Example usage:

    >>> tree = ElementTree.parse('your_file.xml')
    >>> root = tree.getroot()
    >>> xmldict = XmlDictConfig(root)

    Or, if you want to use an XML string:

    >>> root = ElementTree.XML(xml_string)
    >>> xmldict = XmlDictConfig(root)

    And then use xmldict for what it is... a dict.
    '''
    def __init__(self, parent_element):
        if parent_element.items():
            self.update(dict(parent_element.items()))
        for element in parent_element:
            if len(element):  # element has child elements
                # treat like dict - we assume that if the first two tags
                # in a series are different, then they are all different.
                if len(element) == 1 or element[0].tag != element[1].tag:
                    aDict = XmlDictConfig(element)
                # treat like list - we assume that if the first two tags
                # in a series are the same, then the rest are the same.
                else:
                    # here, we put the list in dictionary; the key is the
                    # tag name the list elements all share in common, and
                    # the value is the list itself 
                    aDict = {element[0].tag: XmlListConfig(element)}
                # if the tag has attributes, add those to the dict
                if element.items():
                    aDict.update(dict(element.items()))
                self.update({element.tag: aDict})
            # this assumes that if you've got an attribute in a tag,
            # you won't be having any text. This may or may not be a 
            # good idea -- time will tell. It works for the way we are
            # currently doing XML configuration files...
            elif element.items():
                self.update({element.tag: dict(element.items())})
            # finally, if there are no child tags and no attributes, extract
            # the text
            else:
                self.update({element.tag: element.text})

Example usage:

tree = ElementTree.parse('your_file.xml')
root = tree.getroot()
xmldict = XmlDictConfig(root)

Or, if you want to use an XML string:

root = ElementTree.XML(xml_string)
xmldict = XmlDictConfig(root)

Limiter answered 27/4, 2011 at 15:58

You can use 'xmltodict' alternatively – Selfinsurance 11/5, 2015 at 15:1

I tried this and it's much faster than xmltodict. For parsing an 80MB xml file it took 7s, with xmltodict it took 90s – Desrochers 16/10, 2015 at 21:8

Confirmed... I have not tested this against every edge case but for my rather uncomplicated XML strings, this is pretty fast (about 8 times faster than the xmltodict library). Disadvantage is that you have to host it yourself within your project. – Eightieth 18/4, 2016 at 9:59

it seems that this code can't deal with array as following:<root> <e /> <e>text</e> <e name="value" /> <e name="value">text</e> <e> <a>text</a> <b>text</b> </e> <e> <a>text</a> <a>text</a> </e> <e> text <a>text</a> </e> </root> – Lipstick 25/7, 2016 at 9:56

Hi there, this works perfectly. Just a snippet for those who can't find cElementTree: change the first line to: from xml.etree import ElementTree – Oestriol 13/9, 2016 at 17:14

If you have duplicate sub-tags with different attributes, you lose the attributes. For example, I have multiple <Project name="xyz"> tags, where every name is different; this method drops the name attribute, making it impossible to distinguish the Projects from each other. – Bannasch 8/6, 2017 at 0:10

Down-voting since there are better answers posted below, particularly in handling multiple tags with the same name. – Serbocroatian 8/6, 2017 at 13:22

on a sidenote, if you don't need to use Python and are just trying to import the XML as a structured object for manipulation, I found that it was much easier to just use R for this as per this and this. If you just run library("XML"); result <- xmlParse(file = "file.xml"); xml_data <- xmlToList(result) you will import your XML as a nested list. Multiple tags with the same name are fine & tag attributes become an extra list item. – Bannasch 8/6, 2017 at 15:52

I used xmltodict but gives the error " parser.Parse(xml_input, True) ExpatError: syntax error: line 1, column 0", I have: import xmltodict def handle_artist(_, artist): print(artist['person']) return True xmltodict.parse('activity.xml',item_depth=2, item_callback=handle_artist) . do you know how to fix this error? – Library 13/12, 2021 at 21:33

I tried it using python 3. The result was wrong for my XML : empty list. I successfully used dictify (Erik Aronesty's solution below). – Popsicle 12/4, 2022 at 9:34

xmltodict (full disclosure: I wrote it) does exactly that:

import xmltodict

xmltodict.parse("""<?xml version="1.0" ?>
<person>
  <name>john</name>
  <age>20</age>
</person>""")
# {u'person': {u'age': u'20', u'name': u'john'}}

Meantime answered 17/4, 2012 at 21:51

also, for future googlenauts - I was able to use this in App Engine, which I had been led to believe didn't play nicely with most xml libraries in Python. – Muckworm 7/3, 2013 at 17:14

Thanks, it works well. But why is there always a "u" before the strings? How do I get rid of it? – Graciagracie 28/3, 2013 at 9:5

The u just indicates that it's stored as a unicode string. It doesn't affect the value of the string in any way. – Irairacund 11/9, 2013 at 22:49

Nice. Is there a reverse (dict to xml) function or module? – Teratogenic 6/3, 2014 at 15:13

Can you please tell me how to check for a certain key without an exception? If a key doesn't exist, this OrderedDict raises an error. – Chatterer 7/3, 2014 at 8:0

Nice. And yes, @ypercube, there is an xmltodict.unparse() function for the reverse. – Scheffler 25/9, 2014 at 12:7

This might be obvious to some but not to others: for anybody writing SOAP definitions, it's helpful to use this xmltodict module with pprint (pretty print). Just run from pprint import pprint, then pprint(xmltodict.parse('''your XML''')) – Pacifism 18/9, 2015 at 5:1

You might want to add import xmltodict so that one can copy-paste it. – Sonar 14/3, 2016 at 11:9

when I try to run xmltodict.parse("file.xml") I get xml.parsers.expat.ExpatError: syntax error: line 1, column 0, any ideas what is going on? – Bannasch 8/6, 2017 at 0:26

@Bannasch I think the parse expects a string (or stream), not a filename – Horus 16/7, 2017 at 11:43

This is excellent. I'm at a loss to express how little I like XML. I wish that I had found this years ago (instead of ETree, XPATH, and that awful mess). As an aside which may help others, I didn't realize that one cannot pprint.pprint() an OrderedDict (this is the result of xmltodict.parse()). I used json.loads(json.dumps("my-XML-string-object")) to get pprint.pprint() to work. Again, THANK YOU! – Wafture 28/2, 2018 at 23:6

In some complex cases, it does not traverse the whole XML. My case was XML reply from Uniprot --but xmlschema worked though. So caveat emptor. – Flapper 14/9, 2018 at 16:0

Just a heads up, xmltodict and I think most solutions in this page do not parse external entities. – Impoverished 24/10, 2019 at 8:4

Great. This also works on QFile objects using PyQt5 – Ripp 16/4, 2020 at 13:19

Thank you very much, your module changes my life. Finally I will not have to deal anymore with XML and ElementTree – Maniac 30/4, 2020 at 12:40

Error: xml.parsers.expat.ExpatError: XML or text declaration not at start of entity It works when I remove the line return before the xml tag <?xml – Protoxylem 3/6, 2020 at 9:54

@btt, same error here: Copying your code I get XML or text declaration not at start of entity: line 2, column 0. Your Solution is here - Lesson learned: do not copy that much – Joyejoyful 24/6, 2021 at 19:43
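The same error comes from any expat-based parser, including the standard library's ElementTree, because an XML declaration is only allowed at the very start of the document. Here is a minimal sketch that reproduces the failure and applies the usual fix: strip leading whitespace before parsing.

```python
import xml.etree.ElementTree as ET

# Note the newline before the XML declaration.
doc = """
<?xml version="1.0" ?>
<person><name>john</name></person>"""

try:
    ET.fromstring(doc)
except ET.ParseError as err:
    # expat rejects a declaration that is not at the start of the input
    print("ParseError:", err)

root = ET.fromstring(doc.lstrip())  # drop the leading newline first
print(root.find("name").text)  # john
```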

Is this module still active in 2022? 90 issues including security ones on GitHub, no updates in about 2 years or so... – Dive 25/4, 2022 at 8:38

If you want it as a plain dictionary instead of an OrderedDict, just type cast it: dict(xmltodict.parse(...)) – Josefajosefina 14/4, 2023 at 10:24
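Note that `dict(...)` only converts the outermost level. Nested values stay `OrderedDict` in the xmltodict versions that return them. The small recursive sketch below uses only the standard library and converts every nested mapping, including those inside lists. The `to_plain` helper is hypothetical.

```python
from collections import OrderedDict
from collections.abc import Mapping


def to_plain(obj):
    """Recursively turn nested mappings into dicts, leaving other values alone."""
    if isinstance(obj, Mapping):
        return {key: to_plain(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [to_plain(item) for item in obj]
    return obj


nested = OrderedDict([("person", OrderedDict([("name", "john"), ("age", "20")]))])
plain = to_plain(nested)
print(type(plain["person"]))  # <class 'dict'>
```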

The following XML-to-Python-dict snippet parses entities as well as attributes following this XML-to-JSON "specification". It is the most general solution handling all cases of XML.

from collections import defaultdict

def etree_to_dict(t):
    d = {t.tag: {} if t.attrib else None}
    children = list(t)
    if children:
        dd = defaultdict(list)
        for dc in map(etree_to_dict, children):
            for k, v in dc.items():
                dd[k].append(v)
        d = {t.tag: {k:v[0] if len(v) == 1 else v for k, v in dd.items()}}
    if t.attrib:
        d[t.tag].update(('@' + k, v) for k, v in t.attrib.items())
    if t.text:
        text = t.text.strip()
        if children or t.attrib:
            if text:
                d[t.tag]['#text'] = text
        else:
            d[t.tag] = text
    return d

It is used like this:

import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
e = ET.XML('''
<root>
  <e />
  <e>text</e>
  <e name="value" />
  <e name="value">text</e>
  <e> <a>text</a> <b>text</b> </e>
  <e> <a>text</a> <a>text</a> </e>
  <e> text <a>text</a> </e>
</root>
''')

from pprint import pprint
pprint(etree_to_dict(e))

The output of this example (as per above-linked "specification") should be:

{'root': {'e': [None,
                'text',
                {'@name': 'value'},
                {'#text': 'text', '@name': 'value'},
                {'a': 'text', 'b': 'text'},
                {'a': ['text', 'text']},
                {'#text': 'text', 'a': 'text'}]}}

Not necessarily pretty, but it is unambiguous, and simpler XML inputs result in simpler JSON. :)

Update

If you want to do the reverse, emit an XML string from a JSON/dict, you can use:

try:
  basestring
except NameError:  # python3
  basestring = str

def dict_to_etree(d):
    def _to_etree(d, root):
        if not d:
            pass
        elif isinstance(d, basestring):
            root.text = d
        elif isinstance(d, dict):
            for k,v in d.items():
                assert isinstance(k, basestring)
                if k.startswith('#'):
                    assert k == '#text' and isinstance(v, basestring)
                    root.text = v
                elif k.startswith('@'):
                    assert isinstance(v, basestring)
                    root.set(k[1:], v)
                elif isinstance(v, list):
                    for e in v:
                        _to_etree(e, ET.SubElement(root, k))
                else:
                    _to_etree(v, ET.SubElement(root, k))
        else:
            raise TypeError('invalid type: ' + str(type(d)))
    assert isinstance(d, dict) and len(d) == 1
    tag, body = next(iter(d.items()))
    node = ET.Element(tag)
    _to_etree(body, node)
    return ET.tostring(node)

pprint(dict_to_etree(etree_to_dict(e)))  # round-trips the example above
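On Python 3, `ET.tostring()` returns `bytes` by default, so `dict_to_etree` above produces something like `b'<root>...</root>'`. Passing `encoding="unicode"` returns a `str` instead. A minimal standalone illustration:

```python
import xml.etree.ElementTree as ET

node = ET.Element("person")
ET.SubElement(node, "name").text = "john"

as_bytes = ET.tostring(node)                     # b'<person><name>john</name></person>'
as_text = ET.tostring(node, encoding="unicode")  # '<person><name>john</name></person>'
```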

Boote answered 9/4, 2012 at 17:23

Thx for this code! Additional info: if you use python 2.5 you can't use dictionary comprehension, so you have to change the line d = {t.tag: {k:v[0] if len(v) == 1 else v for k, v in dd.iteritems()}} to d = { t.tag: dict( (k, v[0] if len(v) == 1 else v) for k, v in dd.iteritems() ) } – Leaguer 22/7, 2013 at 9:14

I have tested nearly 10 snippets / python modules / etc. for that. This one is the best I have found. According to my tests, it is : 1) much faster than github.com/martinblech/xmltodict (based on XML SAX api) 2) better than github.com/mcspring/XML2Dict which has some little issues when several children have same names 3) better than code.activestate.com/recipes/410469-xml-as-dictionary which had small issues as well and more important : 4) much shorter code than all the previous ones! Thanks @Boote – Aggregate 19/2, 2014 at 13:2

This is, by far, the most comprehensive answer; it works on > 2.6, and it's fairly flexible. My only issue is that text can change where it resides depending on whether there's an attribute or not. I posted an even smaller and more rigid solution as well. – Preadamite 18/6, 2015 at 19:25

If you need to get an ordered dict from an XML file, you can use this same example with a few modifications (see my response below): #2148619 – Paring 29/9, 2015 at 11:13

This is also pretty nifty and fast when used with cElementTree or lxml.etree. Note that when using Python 3, all .iteritems() have to be changed to .items() (same behaviour but the keyword changed from Python 2 to 3). – Eightieth 18/4, 2016 at 12:15

Beware: high memory usage – Araxes 30/7, 2019 at 20:54

The answer that actually works perfectly! – Antipode 23/3, 2021 at 14:28

This lightweight version, while not configurable, is pretty easy to tailor as needed, and works in old pythons. Also it is rigid - meaning the results are the same regardless of the existence of attributes.

import xml.etree.ElementTree as ET

from copy import copy

def dictify(r,root=True):
    if root:
        return {r.tag : dictify(r, False)}
    d=copy(r.attrib)
    if r.text:
        d["_text"]=r.text
    for x in r.findall("./*"):
        if x.tag not in d:
            d[x.tag]=[]
        d[x.tag].append(dictify(x,False))
    return d

So:

root = ET.fromstring("<erik><a x='1'>v</a><a y='2'>w</a></erik>")

dictify(root)

Results in:

{'erik': {'a': [{'x': '1', '_text': 'v'}, {'y': '2', '_text': 'w'}]}}

Preadamite answered 18/6, 2015 at 19:19

I like this solution. Simple and does not require external libs. – Arabian 12/1, 2016 at 19:33

I also like this answer since it's all in front of me (no external links). Cheers! – Riojas 4/2, 2021 at 0:29

I also like it. It gives good results for complex XML, which is not the case for class XmlListConfig above. – Popsicle 12/4, 2022 at 9:29

Disclaimer: This modified XML parser was inspired by Adam Clark. The original XML parser works for most simple cases; however, it didn't work for some complicated XML files. I debugged the code line by line and finally fixed some issues. If you find any bugs, please let me know. I am glad to fix them.

import xml.etree.ElementTree as ElementTree


class XmlDictConfig(dict):
    '''
    Note: the XML needs a root element; add one if none exists.
    Example usage:
    >>> tree = ElementTree.parse('your_file.xml')
    >>> root = tree.getroot()
    >>> xmldict = XmlDictConfig(root)
    Or, if you want to use an XML string:
    >>> root = ElementTree.XML(xml_string)
    >>> xmldict = XmlDictConfig(root)
    And then use xmldict for what it is... a dict.
    '''
    def __init__(self, parent_element):
        if parent_element.items():
            self.updateShim( dict(parent_element.items()) )
        for element in parent_element:
            if len(element):
                aDict = XmlDictConfig(element)
            #   if element.items():
            #   aDict.updateShim(dict(element.items()))
                self.updateShim({element.tag: aDict})
            elif element.items():    # items() is specifically for attributes
                elementattrib = list(element.items())  # list() so .append() works with either ElementTree implementation
                if element.text:
                    elementattrib.append((element.tag, element.text))     # add tag:text if it exists
                self.updateShim({element.tag: dict(elementattrib)})
            else:
                self.updateShim({element.tag: element.text})

    def updateShim(self, aDict):
        for key in aDict.keys():   # keys() includes tag and attributes
            if key in self:
                value = self.pop(key)
                if type(value) is not list:
                    listOfDicts = []
                    listOfDicts.append(value)
                    listOfDicts.append(aDict[key])
                    self.update({key: listOfDicts})
                else:
                    value.append(aDict[key])
                    self.update({key: value})
            else:
                self.update({key:aDict[key]})  # it was self.update(aDict)

Affirmation answered 16/9, 2016 at 20:38

The most recent versions of the PicklingTools libraries (1.3.0 and 1.3.1) support tools for converting from XML to a Python dict.

The download is available here: PicklingTools 1.3.1

There is quite a bit of documentation for the converters here: the documentation describes in detail all of the decisions and issues that will arise when converting between XML and Python dictionaries (there are a number of edge cases: attributes, lists, anonymous lists, anonymous dicts, eval, etc. that most converters don't handle). In general, though, the converters are easy to use. If an 'example.xml' contains:

<top>
  <a>1</a>
  <b>2.2</b>
  <c>three</c>
</top>

Then to convert it to a dictionary:

>>> from xmlloader import *
>>> example = open('example.xml', 'r')   # A document containing XML
>>> xl = StreamXMLLoader(example, 0)     # 0 = all defaults on operation
>>> result = xl.expectXML()
>>> print(result)
{'top': {'a': '1', 'c': 'three', 'b': '2.2'}}

There are tools for converting in both C++ and Python: the C++ and Python versions do identical conversions, but the C++ is about 60x faster.

Fritter answered 23/9, 2011 at 15:52

of course, then if there are 2 a's, this is not a good format. – Preadamite 18/6, 2015 at 19:21

Looks interesting, but I have not yet figured out how the PicklingTools are meant to be used - is this just a tarball of source code files from which I have to find the right ones for my job and then copy them into my project? No modules to load or anything simpler? – Eightieth 18/4, 2016 at 8:22

I get: in peekIntoNextNWSChar c = self.is.read(1) AttributeError: 'str' object has no attribute 'read' – Outsmart 27/11, 2019 at 12:39

You can do this quite easily with lxml. First install it:

[sudo] pip install lxml

Here is a recursive function I wrote that does the heavy lifting for you:

from lxml import objectify as xml_objectify


def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)
        return dict_object
    return xml_to_dict_recursion(xml_objectify.fromstring(xml_str))

# lxml refuses str input that carries an encoding declaration, so pass bytes
xml_string = b"""<?xml version="1.0" encoding="UTF-8"?><Response><NewOrderResp>
<IndustryType>Test</IndustryType><SomeData><SomeNestedData1>1234</SomeNestedData1>
<SomeNestedData2>3455</SomeNestedData2></SomeData></NewOrderResp></Response>"""

print(xml_to_dict(xml_string))

The below variant preserves the parent key / element:

def xml_to_dict(xml_str):
    """ Convert xml to dict, using lxml v3.4.2 xml processing library, see http://lxml.de/ """
    def xml_to_dict_recursion(xml_object):
        dict_object = xml_object.__dict__
        if not dict_object:  # if empty dict returned
            return xml_object
        for key, value in dict_object.items():
            dict_object[key] = xml_to_dict_recursion(value)
        return dict_object
    xml_obj = xml_objectify.fromstring(xml_str)
    return {xml_obj.tag: xml_to_dict_recursion(xml_obj)}

If you want to only return a subtree and convert it to dict, you can use Element.find() to get the subtree and then convert it:

xml_obj.find('.//')  # lxml.objectify.ObjectifiedElement instance

See the lxml docs here. I hope this helps!
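For comparison, here is a rough sketch of the same subtree idea using the standard library's ElementTree, which has no objectify layer. It locates the element with an ElementPath expression and converts only that part. It assumes the subtree's children are leaves with unique tags. `leaf_dict` is a hypothetical helper, and the XML declaration is omitted only to keep the input minimal.

```python
import xml.etree.ElementTree as ET

xml_string = """<Response><NewOrderResp>
<IndustryType>Test</IndustryType><SomeData><SomeNestedData1>1234</SomeNestedData1>
<SomeNestedData2>3455</SomeNestedData2></SomeData></NewOrderResp></Response>"""


def leaf_dict(element):
    # Map each direct child's tag to its text; assumes no repeated tags.
    return {child.tag: child.text for child in element}


root = ET.fromstring(xml_string)
subtree = root.find(".//SomeData")  # first <SomeData> anywhere below the root
print({subtree.tag: leaf_dict(subtree)})
# {'SomeData': {'SomeNestedData1': '1234', 'SomeNestedData2': '3455'}}
```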

Trothplight answered 15/7, 2015 at 19:5

I wrote a simple recursive function to do the job:

from xml.etree import ElementTree
root = ElementTree.XML(xml_to_convert)

def xml_to_dict_recursive(root):

    if len(root.getchildren()) == 0:
        return {root.tag:root.text}
    else:
        return {root.tag:list(map(xml_to_dict_recursive, root.getchildren()))}

Togliatti answered 9/1, 2021 at 4:45

By far the simplest solution! – Epiphyte 14/11, 2021 at 21:5

An alternative (builds lists for repeated tags in the hierarchy; note that repeated leaf elements, i.e. ones without children, overwrite each other):

import xml.etree.ElementTree as ElementTree  # cElementTree was removed in Python 3.9

def xml_to_dict(xml, result):
    for child in xml:
        if len(child) == 0:
            result[child.tag] = child.text
        else:
            if child.tag in result:
                if not isinstance(result[child.tag], list):
                    result[child.tag] = [result[child.tag]]
                result[child.tag].append(xml_to_dict(child, {}))
            else:
                result[child.tag] = xml_to_dict(child, {})
    return result

xmlTree = ElementTree.parse('my_file.xml')
xmlRoot = xmlTree.getroot()
dictRoot = xml_to_dict(xmlRoot, {})
result = {xmlRoot.tag: dictRoot}

Fugazy answered 29/6, 2021 at 19:38

def xml_to_dict(node):
    u''' 
    @param node:lxml_node
    @return: dict 
    '''

    return {'tag': node.tag, 'text': node.text, 'attrib': node.attrib, 'children': {child.tag: xml_to_dict(child) for child in node}}
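Keying `children` by tag collapses siblings that share a tag, as the answer further down points out. Keeping the children as a list avoids that, at the cost of finding children by position or by scanning. Here is a sketch of that variation, shown with the standard library's ElementTree:

```python
import xml.etree.ElementTree as ET


def xml_to_dict(node):
    """Return tag, text, attributes and an ordered list of child dicts."""
    return {
        'tag': node.tag,
        'text': node.text,
        'attrib': dict(node.attrib),
        # A list keeps every child, even when several share the same tag.
        'children': [xml_to_dict(child) for child in node],
    }


root = ET.fromstring('<r><a x="1">one</a><a x="2">two</a></r>')
result = xml_to_dict(root)
print([c['attrib'] for c in result['children']])  # [{'x': '1'}, {'x': '2'}]
```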

Jerrodjerrol answered 4/4, 2013 at 13:9

The code from http://code.activestate.com/recipes/410469-xml-as-dictionary/ works well, but if there are multiple elements that are the same at a given place in the hierarchy it just overrides them.

I added a shim that checks whether the element already exists before calling self.update(). If it does, the shim pops the existing entry and creates a list out of the existing and the new entry. Any subsequent duplicates are appended to the list.

Not sure if this can be handled more gracefully, but it works:

import xml.etree.ElementTree as ElementTree

class XmlDictConfig(dict):
    def __init__(self, parent_element):
        if parent_element.items():
            self.updateShim(dict(parent_element.items()))
        for element in parent_element:
            if len(element):
                aDict = XmlDictConfig(element)
                if element.items():
                    aDict.updateShim(dict(element.items()))
                self.updateShim({element.tag: aDict})
            elif element.items():
                self.updateShim({element.tag: dict(element.items())})
            else:
                # empty elements have text None; only strip actual strings
                self.updateShim({element.tag: element.text and element.text.strip()})

    def updateShim(self, aDict):
        for key in aDict.keys():
            if key in self:
                value = self.pop(key)
                if type(value) is not list:
                    listOfDicts = []
                    listOfDicts.append(value)
                    listOfDicts.append(aDict[key])
                    self.update({key: listOfDicts})
                else:
                    value.append(aDict[key])
                    self.update({key: value})
            else:
                # update only this key; self.update(aDict) would overwrite
                # lists already built for other keys earlier in this loop
                self.update({key: aDict[key]})
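The shim idea above turns a value into a list the second time its key appears. It can also be written as a small standalone helper. The sketch below applies it to leaf elements as well as nested ones and ignores attributes. The names `add_value` and `children_to_dict` are hypothetical.

```python
import xml.etree.ElementTree as ET


def add_value(target, key, value):
    """Store value under key; on a repeated key, collect the values in a list."""
    if key not in target:
        target[key] = value
    elif isinstance(target[key], list):
        target[key].append(value)
    else:
        target[key] = [target[key], value]


def children_to_dict(element):
    result = {}
    for child in element:
        # Recurse into elements with children; otherwise use the text.
        value = children_to_dict(child) if len(child) else child.text
        add_value(result, child.tag, value)
    return result


root = ET.fromstring('<p><n>a</n><n>b</n><n>c</n><m>x</m></p>')
print(children_to_dict(root))  # {'n': ['a', 'b', 'c'], 'm': 'x'}
```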

Sair answered 24/7, 2014 at 6:45

@dibrovsd: Your solution will not work if the XML has more than one tag with the same name.

Following your line of thought, I have modified the code a bit and written it for a general node instead of the root:

from collections import defaultdict


def xml2dict(node):
    d, count = defaultdict(dict), 1
    for i in node:
        key = i.tag + "_" + str(count)  # numbered key keeps same-named tags apart
        d[key]['text'] = i.findtext('.')  # the element's own text ('' if empty)
        d[key]['attrib'] = i.attrib  # attrib is a dict of attributes
        d[key]['children'] = xml2dict(i)  # recurse; gives a dict
        count += 1
    return d

Midkiff answered 19/6, 2014 at 23:49

Based on @K3---rnc's response (the best one for me), I've made a few small modifications to get an OrderedDict from an XML text (sometimes order matters):

from collections import OrderedDict


def etree_to_ordereddict(t):
    d = OrderedDict()
    d[t.tag] = OrderedDict() if t.attrib else None
    children = list(t)
    if children:
        dd = OrderedDict()
        for dc in map(etree_to_ordereddict, children):
            for k, v in dc.items():  # .items() works on both Python 2 and 3
                if k not in dd:
                    dd[k] = list()
                dd[k].append(v)
        d = OrderedDict()
        d[t.tag] = OrderedDict()
        for k, v in dd.items():
            if len(v) == 1:
                d[t.tag][k] = v[0]
            else:
                d[t.tag][k] = v
    if t.attrib:
        d[t.tag].update(('@' + k, v) for k, v in t.attrib.items())
    if t.text:
        text = t.text.strip()
        if children or t.attrib:
            if text:
                d[t.tag]['#text'] = text
        else:
            d[t.tag] = text
    return d

Following @K3---rnc example, you can use it:

import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
e = ET.XML('''
<root>
  <e />
  <e>text</e>
  <e name="value" />
  <e name="value">text</e>
  <e> <a>text</a> <b>text</b> </e>
  <e> <a>text</a> <a>text</a> </e>
  <e> text <a>text</a> </e>
</root>
''')

from pprint import pprint
pprint(etree_to_ordereddict(e))

Hope it helps ;)
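On Python 3.7 and later, plain `dict` objects preserve insertion order as a language guarantee. A converter built on plain dicts therefore already keeps child tags in the order they first appear in the document. `OrderedDict` mainly matters on older versions, or when equality should take order into account. A quick check:

```python
import xml.etree.ElementTree as ET
from collections import OrderedDict

root = ET.fromstring('<r><zeta>1</zeta><alpha>2</alpha><mid>3</mid></r>')
plain = {child.tag: child.text for child in root}
print(list(plain))  # ['zeta', 'alpha', 'mid']: document order, not sorted

# OrderedDict equality also compares order; plain dict equality does not.
a = OrderedDict([('x', 1), ('y', 2)])
b = OrderedDict([('y', 2), ('x', 1)])
print(a == b, dict(a) == dict(b))  # False True
```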

Paring answered 29/9, 2015 at 11:11

Here's a link to an ActiveState solution - and the code in case it disappears again.

==================================================
xmlreader.py:
==================================================
from xml.dom.minidom import parse


class NotTextNodeError(Exception):
    pass


def getTextFromNode(node):
    """
    scans through all children of node and gathers the
    text. if node has non-text child-nodes, then
    NotTextNodeError is raised.
    """
    t = ""
    for n in node.childNodes:
        if n.nodeType == n.TEXT_NODE:
            t += n.nodeValue
        else:
            raise NotTextNodeError
    return t


def nodeToDic(node):
    """
    nodeToDic() scans through the children of node and makes a
    dictionary from the content.
    three cases are differentiated:
    - if the node contains no other nodes, it is a text-node
    and {nodeName:text} is merged into the dictionary.
    - if the node has the attribute "multiple" set to "true",
    then its children will be appended to a list and this
    list is merged to the dictionary in the form: {nodeName:list}.
    - else, nodeToDic() will call itself recursively on
    the node's children (merging {nodeName:nodeToDic()} to
    the dictionary).
    """
    dic = {}
    for n in node.childNodes:
        if n.nodeType != n.ELEMENT_NODE:
            continue
        if n.getAttribute("multiple") == "true":
            # node with multiple children:
            # put them in a list
            l = []
            for c in n.childNodes:
                if c.nodeType != n.ELEMENT_NODE:
                    continue
                l.append(nodeToDic(c))
            dic.update({n.nodeName: l})
            continue

        try:
            text = getTextFromNode(n)
        except NotTextNodeError:
            # 'normal' node
            dic.update({n.nodeName: nodeToDic(n)})
            continue

        # text node
        dic.update({n.nodeName: text})
    return dic


def readConfig(filename):
    dom = parse(filename)
    return nodeToDic(dom)


def test():
    dic = readConfig("sample.xml")

    print(dic["Config"]["Name"])
    print()
    for item in dic["Config"]["Items"]:
        print("Item's Name:", item["Name"])
        print("Item's Value:", item["Value"])

test()



==================================================
sample.xml:
==================================================
<?xml version="1.0" encoding="UTF-8"?>

<Config>
    <Name>My Config File</Name>

    <Items multiple="true">
        <Item>
            <Name>First Item</Name>
            <Value>Value 1</Value>
        </Item>
        <Item>
            <Name>Second Item</Name>
            <Value>Value 2</Value>
        </Item>
    </Items>

</Config>



==================================================
output:
==================================================
My Config File

Item's Name: First Item
Item's Value: Value 1
Item's Name: Second Item
Item's Value: Value 2
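The question starts from a string rather than a file, so here is a sketch of the same multiple="true" convention using xml.etree.ElementTree instead of minidom. This is a Python 3 restatement, not the recipe's own code; node_to_dict and read_config_string are hypothetical names, and the XML declaration is left out of the inlined string.

```python
import xml.etree.ElementTree as ET

# sample.xml from above, inlined as a string (XML declaration omitted)
SAMPLE = """<Config>
    <Name>My Config File</Name>
    <Items multiple="true">
        <Item>
            <Name>First Item</Name>
            <Value>Value 1</Value>
        </Item>
        <Item>
            <Name>Second Item</Name>
            <Value>Value 2</Value>
        </Item>
    </Items>
</Config>"""


def node_to_dict(elem):
    """Hypothetical ElementTree counterpart of nodeToDic()."""
    result = {}
    for child in elem:
        if child.get("multiple") == "true":
            # multiple="true": one dict per child element, kept in a list
            result[child.tag] = [node_to_dict(c) for c in child]
        elif len(child):
            # element with element children: recurse
            result[child.tag] = node_to_dict(child)
        else:
            # text-only element; a missing text becomes ""
            result[child.tag] = child.text or ""
    return result


def read_config_string(xml_string):
    root = ET.fromstring(xml_string)
    # readConfig() starts from the document node, so the root tag is the outer key
    return {root.tag: node_to_dict(root)}


config = read_config_string(SAMPLE)
print(config["Config"]["Name"])
for item in config["Config"]["Items"]:
    print("Item's Name:", item["Name"])
    print("Item's Value:", item["Value"])
```

The printed lines match the output shown above for the minidom version.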

Apostate answered 27/1, 2010 at 15:35 Comment(1)

Yes, it is. I have reproduced the code here in case the link goes down again. – Bette 22/7, 2013 at 9:5

Updated version of the method posted by firelion.cis (since getchildren() is deprecated):

from xml.etree import ElementTree
root = ElementTree.XML(xml_to_convert)

def xml_to_dict_recursive(root):

    if len(list(root)) == 0:
        return {root.tag:root.text}
    else:
        return {root.tag:list(map(xml_to_dict_recursive, list(root)))}
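Applied to the (corrected) XML string from the question, this function puts each element's children in a list of single-key dictionaries rather than merging them into one dictionary. A quick check, with the question's string standing in for xml_to_convert:

```python
from xml.etree import ElementTree


def xml_to_dict_recursive(root):
    # Leaf: {tag: text}; otherwise {tag: [one dict per child]}
    if len(list(root)) == 0:
        return {root.tag: root.text}
    else:
        return {root.tag: list(map(xml_to_dict_recursive, list(root)))}


xml_to_convert = '<?xml version="1.0" ?><person><name>john</name><age>20</age></person>'
root = ElementTree.XML(xml_to_convert)
print(xml_to_dict_recursive(root))
# {'person': [{'name': 'john'}, {'age': '20'}]}
```

Note that all values stay strings ('20', not 20); ElementTree does not convert types.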

Mulvey answered 11/6, 2022 at 8:48 Comment(0)

At one point I had to parse and write XML that consisted only of elements without attributes, so a 1:1 mapping from XML to dict was easily possible. This is what I came up with, in case someone else also doesn't need attributes:

from xml.etree import ElementTree


def xmltodict(element):
    if not isinstance(element, ElementTree.Element):
        raise ValueError("must pass xml.etree.ElementTree.Element object")

    def xmltodict_handler(parent_element):
        result = dict()
        for element in parent_element:
            # Elements with children become dicts, leaves become their text
            if len(element):
                obj = xmltodict_handler(element)
            else:
                obj = element.text

            if element.tag in result:
                # Repeated tag: collect all values in a list
                if isinstance(result[element.tag], list):
                    result[element.tag].append(obj)
                else:
                    result[element.tag] = [result[element.tag], obj]
            else:
                result[element.tag] = obj
        return result

    return {element.tag: xmltodict_handler(element)}


def dicttoxml(element):
    if not isinstance(element, dict):
        raise ValueError("must pass dict type")
    if len(element) != 1:
        raise ValueError("dict must have exactly one root key")

    def dicttoxml_handler(result, key, value):
        if isinstance(value, list):
            # A list becomes repeated sibling elements with the same tag
            for e in value:
                dicttoxml_handler(result, key, e)
        elif isinstance(value, str):
            elem = ElementTree.Element(key)
            elem.text = value
            result.append(elem)
        elif isinstance(value, (int, float)):
            elem = ElementTree.Element(key)
            elem.text = str(value)
            result.append(elem)
        elif value is None:
            result.append(ElementTree.Element(key))
        else:
            # Nested dict: build a sub-element and recurse
            res = ElementTree.Element(key)
            for k, v in value.items():
                dicttoxml_handler(res, k, v)
            result.append(res)

    root_tag = next(iter(element))
    result = ElementTree.Element(root_tag)
    for key, value in element[root_tag].items():
        dicttoxml_handler(result, key, value)
    return result


def xmlfiletodict(filename):
    return xmltodict(ElementTree.parse(filename).getroot())


def dicttoxmlfile(element, filename):
    ElementTree.ElementTree(dicttoxml(element)).write(filename)


def xmlstringtodict(xmlstring):
    # fromstring() already returns the root Element (it has no getroot())
    return xmltodict(ElementTree.fromstring(xmlstring))


def dicttoxmlstring(element):
    # encoding="unicode" returns str instead of bytes
    return ElementTree.tostring(dicttoxml(element), encoding="unicode")
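A condensed, self-contained Python 3 sketch of the same idea can be used to check the 1:1 claim on the question's XML: attribute-free XML goes to a dict and back to an identical string. The helper names to_dict and to_element are hypothetical and are not part of the answer above. As in the answer, attributes are ignored, and sibling order only survives the round trip when same-named siblings are adjacent.

```python
import xml.etree.ElementTree as ET


def to_dict(elem):
    # Hypothetical helper: a leaf maps to its text, any other element to a
    # dict of its children; repeated sibling tags are collected in a list.
    # Attributes (elem.attrib) are ignored on purpose.
    if len(elem) == 0:
        return elem.text
    out = {}
    for child in elem:
        value = to_dict(child)
        if child.tag not in out:
            out[child.tag] = value
        elif isinstance(out[child.tag], list):
            out[child.tag].append(value)
        else:
            out[child.tag] = [out[child.tag], value]
    return out


def to_element(tag, value):
    # Hypothetical inverse of to_dict(): dicts become child elements and
    # lists become repeated siblings with the same tag.
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, item in value.items():
            for each in (item if isinstance(item, list) else [item]):
                elem.append(to_element(key, each))
    elif value is not None:
        elem.text = str(value)
    return elem


xml_string = "<person><name>john</name><age>20</age><tag>a</tag><tag>b</tag></person>"
root = ET.fromstring(xml_string)
data = {root.tag: to_dict(root)}
print(data)
# {'person': {'name': 'john', 'age': '20', 'tag': ['a', 'b']}}

back = ET.tostring(to_element("person", data["person"]), encoding="unicode")
print(back == xml_string)  # True

# The id attribute is dropped, which is why the mapping needs attribute-free XML
print(to_dict(ET.fromstring('<person id="7"><name>john</name></person>')))
# {'name': 'john'}
```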

Roosevelt answered 4/3, 2012 at 19:53 Comment(0)

I have modified one of the answers to my taste so that it works with multiple values under the same tag. For example, consider the following XML code saved in an XML.xml file:

     <A>
        <B>
            <BB>inAB</BB>
            <C>
                <D>
                    <E>
                        inABCDE
                    </E>
                    <E>value2</E>
                    <E>value3</E>
                </D>
                <inCout-ofD>123</inCout-ofD>
            </C>
        </B>
        <B>abc</B>
        <F>F</F>
    </A>

And in Python:

import xml.etree.ElementTree as ET


class XMLToDictionary(dict):
    def __init__(self, parentElement):
        self.parentElement = parentElement
        for child in list(parentElement):
            # Elements without text get an empty string instead of None
            text = child.text if child.text is not None else ''
            if len(child) == 0:
                # Leaf element: store its stripped text
                self._addToDict(key=child.tag, value=text.strip())
            else:
                # Element with children: recurse
                innerChild = XMLToDictionary(parentElement=child)
                self._addToDict(key=innerChild.parentElement.tag, value=innerChild)

    def getDict(self):
        return {self.parentElement.tag: self}

    def _addToDict(self, key, value):
        # The first occurrence of a tag stores the value directly; a repeated
        # tag turns the entry into a list of all its values
        if key not in self:
            self[key] = value
        else:
            identical = self[key] if isinstance(self[key], list) else [self[key]]
            self[key] = identical + [value]


tree = ET.parse('./XML.xml')
root = tree.getroot()
parseredDict = XMLToDictionary(root).getDict()
print(parseredDict)

The output is:

{'A': {'B': [{'BB': 'inAB', 'C': {'D': {'E': ['inABCDE', 'value2', 'value3']}, 'inCout-ofD': '123'}}, 'abc'], 'F': 'F'}}

Albaalbacete answered 19/5, 2019 at 17:2 Comment(0)

import xml.etree.ElementTree as ET
root = ET.parse(xml_filepath).getroot()

def parse_xml(node):
    ans = {}
    for child in node:
        # Leaf elements map to their text, others are parsed recursively
        value = child.text if len(child) == 0 else parse_xml(child)
        if child.tag not in ans:
            ans[child.tag] = value
        elif not isinstance(ans[child.tag], list):
            # Second occurrence of a tag: turn the entry into a list
            ans[child.tag] = [ans[child.tag], value]
        else:
            ans[child.tag].append(value)
    return ans

It merges fields with the same tag into a list, and keeps a field that occurs only once as a plain value instead of a one-element list.
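A usage sketch on an in-memory string (ET.fromstring in place of ET.parse, since the question starts from a string). The function is restated so the block runs on its own; in this version repeated leaf elements are merged into a list just like repeated container elements. The sample XML is made up for illustration.

```python
import xml.etree.ElementTree as ET


def parse_xml(node):
    ans = {}
    for child in node:
        # leaf -> its text, otherwise recurse
        value = child.text if len(child) == 0 else parse_xml(child)
        if child.tag not in ans:
            ans[child.tag] = value                   # first occurrence: plain value
        elif isinstance(ans[child.tag], list):
            ans[child.tag].append(value)             # third and later occurrences
        else:
            ans[child.tag] = [ans[child.tag], value]  # second occurrence
    return ans


xml_string = ('<library><name>City</name>'
              '<book><title>A</title></book><book><title>B</title></book>'
              '<tag>x</tag><tag>y</tag></library>')
root = ET.fromstring(xml_string)
result = {root.tag: parse_xml(root)}
print(result)
# {'library': {'name': 'City', 'book': [{'title': 'A'}, {'title': 'B'}], 'tag': ['x', 'y']}}
```

parse_xml() returns only the children of the node it is given, so the root tag is added by hand here.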

Likeness answered 9/8, 2022 at 8:33 Comment(0)

-1

Slightly improved version of fvg's fix of firelion.cis's answer. The function is simple, works for simple XML, and avoids the innermost singleton dictionaries. It is NOT suitable for complex XML (attributes are ignored), or if the XML has more than one tag with the same name under the same parent: later tags overwrite earlier ones.

from xml.etree import ElementTree

# Replace xml_to_convert below
root = ElementTree.XML(xml_to_convert)

def xml_to_dict(root):
    if len(root):
        return {root.tag:{k:v for d in map(xml_to_dict, root)
                              for k,v in d.items() }}
    else:
        return {root.tag:root.text}

Sample XML:

<student>
    <FirstName>SMITH</FirstName>
    <LastName>JAMES</LastName>
    <fees>
        <Amount>2400</Amount>
        <Currency>USD</Currency>
    </fees>
</student>

Output (formatted):

{'student': {'FirstName': 'SMITH',
             'LastName': 'JAMES',
             'fees': {'Amount': '2400', 
                      'Currency': 'USD'}
            }
}
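The same-name limitation is easy to see: the dict comprehension merges the children's single-key dictionaries, so a later sibling with the same tag silently replaces an earlier one. A minimal check with made-up XML:

```python
from xml.etree import ElementTree


def xml_to_dict(root):
    if len(root):
        # merge the {tag: value} dicts of all children into one dict
        return {root.tag: {k: v for d in map(xml_to_dict, root)
                              for k, v in d.items()}}
    else:
        return {root.tag: root.text}


root = ElementTree.XML('<order><item>apple</item><item>pear</item><total>2</total></order>')
print(xml_to_dict(root))
# {'order': {'item': 'pear', 'total': '2'}}
```

If repeated tags matter, collect values into a list instead of letting the comprehension overwrite them.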

Daile answered 29/3, 2023 at 13:39 Comment(0)

-3

I have a recursive method to get a dictionary from an lxml element:

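    def recursive_dict(element):
        # [-1] strips a "{namespace}" prefix and also works for tags without one;
        # iterating the element replaces the deprecated getchildren()
        return (element.tag.split('}')[-1],
                dict(map(recursive_dict, element),
                     **element.attrib))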
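The function expects an already parsed element, not the raw XML string. The sketch below uses the standard library's xml.etree.ElementTree in place of lxml (both expose .tag, .attrib and iteration over children) and uses the [-1] index so that tags without a namespace also work. It shows that attributes are merged into the dictionary next to the children, while the text of leaf elements is not kept.

```python
import xml.etree.ElementTree as ET


def recursive_dict(element):
    # split('}')[-1] drops a "{namespace}" prefix and also works for plain tags
    return (element.tag.split('}')[-1],
            dict(map(recursive_dict, element), **element.attrib))


# Parse the string first; the function works on Element objects
root = ET.fromstring('<person id="7"><name>john</name><age>20</age></person>')
print(recursive_dict(root))
# ('person', {'name': {}, 'age': {}, 'id': '7'})
```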

Curio answered 23/11, 2016 at 2:1 Comment(1)

This solution is missing some code, such as import and set up. I got the message 'str' object has no attribute 'tag' – Salyer 24/1, 2017 at 16:36
