Preserve order of attributes when modifying with minidom

Is there a way I can preserve the original order of attributes when processing XML with minidom?

Say I have <color red="255" green="255" blue="233" />. When I modify this with minidom, the attributes are rearranged alphabetically: blue, green, red. I'd like to preserve the original order.

I am processing the file by looping through the elements returned by elements = doc.getElementsByTagName('color'), and then I make assignments like e.attributes["red"].value = "233".
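
A minimal, runnable version of that loop, assuming the document is parsed from a string with minidom.parseString (the question does not show how doc is created, and the <palette> wrapper is made up for illustration):

```python
from xml.dom import minidom

# Hypothetical input; the question only shows the <color> element itself.
doc = minidom.parseString('<palette><color red="255" green="255" blue="233" /></palette>')

elements = doc.getElementsByTagName('color')
for e in elements:
    # Modify an existing attribute in place, as in the question.
    e.attributes["red"].value = "233"

print(doc.documentElement.toxml())
```

On Python versions before 3.8, the printed tag lists the attributes as blue, green, red, which is the reordering described above; from 3.8 on, minidom writes them in document order.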

Tryparsamide asked 19/3, 2009 at 15:23

Is there a way I can preserve the original order of attributes when processing XML with minidom?

Not with minidom: the data type it uses to store attributes is an unordered dictionary. pxdom can do it, though it is considerably slower.

Indifferent answered 19/3, 2009 at 15:42

To keep the attribute order, I made this slight modification to minidom:

from collections import OrderedDict

In the Element class:

def __init__(...):
    self._attrs = OrderedDict()
    #self._attrs = {}

def writexml(...):
    #a_names.sort()

This only works with Python 2.7+ (OrderedDict was added in 2.7), and I'm not sure it works in every case, so use it at your own risk.
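
The property the patch relies on can be checked on its own: an OrderedDict keeps keys in insertion order, and assigning a new value to an existing key does not move it. In this sketch plain strings stand in for minidom's Attr objects:

```python
from collections import OrderedDict

# Stand-in for Element._attrs after parsing <color red="255" green="255" blue="233" />
attrs = OrderedDict()
attrs["red"] = "255"
attrs["green"] = "255"
attrs["blue"] = "233"

# Reassigning an existing key keeps its original position...
attrs["red"] = "233"
# ...while a brand-new key is appended at the end.
attrs["alpha"] = "1"

print(list(attrs.items()))
# [('red', '233'), ('green', '255'), ('blue', '233'), ('alpha', '1')]
```

Editing an existing attribute's value through e.attributes["red"].value does not touch the mapping at all, so once the sort in writexml is removed, the attributes come out in the order the parser inserted them.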

And please note that you should not rely on attribute order:

Note that the order of attribute specifications in a start-tag or empty-element tag is not significant.

Sacchariferous answered 1/12, 2011 at 17:00
How did you modify the Element class? - Gosselin
Still works on Python 3.2: replace a_names = sorted(attrs.keys()) with a_names = attrs.keys(). - Literalism

It is clear that XML attributes are not ordered, but I have just found this strange behavior!

It seems to be related to a sort added in the xml.dom.minidom.Element.writexml function (Python 2 source shown):

class Element(Node):
    # ... snip ...

    def writexml(self, writer, indent="", addindent="", newl=""):
        # indent = current indentation
        # addindent = indentation to add to higher levels
        # newl = newline string
        writer.write(indent+"<" + self.tagName)

        attrs = self._get_attributes()
        a_names = attrs.keys()
        a_names.sort()  # <-- this sort reorders the attributes alphabetically
        for a_name in a_names:
            writer.write(" %s=\"" % a_name)
            _write_data(writer, attrs[a_name].value)
            writer.write("\"")

Removing that line restores the behavior of keeping the order of the original document. This is useful when you have to check with diff tools that your code has not introduced a mistake.
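
If you would rather not edit the standard library, a small helper can render a start tag in the order minidom stores the attributes. start_tag is a hypothetical function, not part of minidom, and it only reflects document order when minidom's internal mapping is ordered, which holds on Python 3.7+ where dicts keep insertion order:

```python
from xml.dom import minidom
from xml.sax.saxutils import quoteattr


def start_tag(elem):
    """Hypothetical helper: render elem's start tag without sorting attributes."""
    parts = [elem.tagName]
    # NamedNodeMap.items() yields (name, value) pairs in the mapping's own order.
    for name, value in elem.attributes.items():
        # quoteattr adds the surrounding quotes and escapes &, <, > and quotes.
        parts.append("%s=%s" % (name, quoteattr(value)))
    return "<%s>" % " ".join(parts)


doc = minidom.parseString('<color red="255" green="255" blue="233" />')
print(start_tag(doc.documentElement))
# <color red="255" green="255" blue="233">
```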

Hurtless answered 6/5, 2011 at 8:36

Before Python 2.7, I used the following hot patch:

from xml.dom import minidom

class _MinidomHooker(object):
    def __enter__(self):
        minidom.NamedNodeMap.keys_orig = minidom.NamedNodeMap.keys
        minidom.NamedNodeMap.keys = self._NamedNodeMap_keys_hook
        return self

    def __exit__(self, *args):
        minidom.NamedNodeMap.keys = minidom.NamedNodeMap.keys_orig
        del minidom.NamedNodeMap.keys_orig

    @staticmethod
    def _NamedNodeMap_keys_hook(node_map):
        class OrderPreservingList(list):
            def sort(self):
                pass
        return OrderPreservingList(node_map.keys_orig())

Used this way:

with _MinidomHooker():
    document.writexml(...)

Disclaimer:

  1. Thou shalt not rely on the order of attributes.
  2. Mutating the NamedNodeMap class is not thread-safe.
  3. Hotpatching is evil.

Milliner answered 8/12, 2011 at 9:47

You can put up as many disclaimers as you want. While reordering the attributes has no meaning for the program, it does have meaning for the programmer/user.

For Fredrick it was important to keep the RGB order, since that is the natural order of the colors. For me it is the name attribute in particular.

Compare

<field name="url" type="string" indexed="true" stored="true" required="true" multiValued="false"/> <!-- ID -->
<field name="forkortelse" type="string" indexed="true" stored="true" required="false" multiValued="false" />
<field name="kortform" type="text_general" indexed="true" stored="true" required="false" multiValued="false" />
<field name="dato" type="date" indexed="true" stored="true" required="false" multiValued="false" />
<field name="nummer" type="int" indexed="true" stored="true" required="false" multiValued="false" />
<field name="kilde" type="string" indexed="true" stored="true" required="false" multiValued="false" />
<field name="tittel" type="text_general" indexed="true" stored="true" multiValued="true"/>

Against

<field indexed="true" multiValued="false" name="forkortelse" required="false" stored="true" type="string"/>
<field indexed="true" multiValued="false" name="kortform" required="false" stored="true" type="text_general"/>
<field indexed="true" multiValued="false" name="dato" required="false" stored="true" type="date"/>
<field indexed="true" multiValued="false" name="nummer" required="false" stored="true" type="int"/>
<field indexed="true" multiValued="false" name="kilde" required="false" stored="true" type="string"/>
<field an_optional_attr="OMG!" an_optional_attr2="OMG!!" indexed="true" name="tittel" stored="true" type="text_general"/>

It is not impossible to read, but it is not as easy. The name is the important attribute, and hiding it far back in the tag is no good. What if the name ended up 15 attributes in, with 7 of the attributes in front of it optional?

The point is that the reordering causes a bigger problem than whatever the ascending order gives in return. It interferes with how the programmer thinks about the data and how the functionality is supposed to work. At the very least, the ordering should be configurable/optional.

Excuse my poor English; it is not my main language.

Alexalexa answered 10/3, 2014 at 15:02
What you are saying here is not unreasonable, but it is not an answer to the question. - Bocanegra
I don't understand. - Macfadyn
I totally agree with what you are saying, but this should really be a comment, even though it is too large for one. - Procuration

1. Customize your own Element.writexml method.

Copy Element's writexml code from minidom.py into your own file and rename it writexml_nosort (the copied code calls minidom's private _write_data helper, so import what it needs as well).

Delete a_names.sort() (Python 2.7), or change a_names = sorted(attrs.keys()) to a_names = attrs.keys() (Python 3.4).

Then replace Element's method with your own:

minidom.Element.writexml = writexml_nosort

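2. Define your preferred order:

right_order = ['a', 'b', 'c', 'a1', 'b1']

3. Adjust your element's _attrs:

from collections import OrderedDict
# names from right_order first (skipping any the element lacks), then any other attributes in their current order
names = [k for k in right_order if k in node._attrs] + [k for k in node._attrs if k not in right_order]
node._attrs = OrderedDict((k, node._attrs[k]) for k in names)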
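
The same reordering can be sketched with public DOM methods instead of the private _attrs mapping, assuming Python 3.8+, where writexml writes attributes in insertion order. reorder_attributes is a hypothetical helper: it removes the element's attributes and re-adds them, names from the preferred list first and any others after them in their current order. It only handles plain, non-namespaced attributes.

```python
from xml.dom import minidom


def reorder_attributes(elem, preferred):
    """Hypothetical helper: re-insert elem's attributes in the preferred order."""
    names = [n for n in preferred if elem.hasAttribute(n)]
    names += [n for n in elem.attributes.keys() if n not in names]
    pairs = [(n, elem.getAttribute(n)) for n in names]
    # Remove everything, then add back in the desired order;
    # setAttribute on a missing name appends it to the end of the mapping.
    for n, _ in pairs:
        elem.removeAttribute(n)
    for n, v in pairs:
        elem.setAttribute(n, v)


doc = minidom.parseString('<field type="string" indexed="true" name="url" stored="true"/>')
reorder_attributes(doc.documentElement, ["name", "type", "missing"])
print(doc.documentElement.toxml())
# <field name="url" type="string" indexed="true" stored="true"/>
```

Unlike assigning to _attrs, this keeps minidom's internal bookkeeping (_attrsNS, id caches) consistent, at the cost of recreating the Attr objects.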

Outlive answered 17/4, 2015 at 10:37

minidom sorts the attributes when it writes them out in the writexml method of class Element. It is done like this:

a_names = sorted(attrs.keys())

You can change this to

a_names = list(attrs.keys())

For IDLE, I had to change the file in /usr/lib/python3.6/xml/dom itself; it seems that IDLE does not follow the sys.path order. Don't forget to make a backup first.

Hbeam answered 16/4, 2020 at 13:02

Is there a way I can preserve the original order of attributes when processing XML with minidom?

Yes. From Python 3.8, the original attribute order is preserved when serializing the XML document.

See https://docs.python.org/3/library/xml.dom.minidom.html#xml.dom.minidom.Node.writexml.
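
A quick way to check this on your own interpreter. This is a sketch; the exact string compared assumes minidom's usual compact output with no added whitespace:

```python
import sys
from xml.dom import minidom

doc = minidom.parseString('<color red="255" green="255" blue="233" />')
color = doc.documentElement
# Change an existing attribute; its position should not move.
color.setAttribute("red", "233")

out = color.toxml()
print(out)
if sys.version_info >= (3, 8):
    # writexml() no longer sorts, so document order survives the round trip.
    assert out == '<color red="233" green="255" blue="233"/>'
```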

Bocanegra answered 17/4, 2020 at 8:51

I've ended up using the lxml library instead of minidom.
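
For comparison, a minimal sketch of the same edit with lxml. lxml is a third-party package; if it is not installed, the sketch falls back to the standard library's xml.etree.ElementTree, which offers the same fromstring/set/tostring calls used here and, from Python 3.8, also keeps attribute order when serializing:

```python
try:
    from lxml import etree  # third-party; the library this answer switched to
except ImportError:
    # Fallback so the sketch runs without lxml (preserves order on Python 3.8+).
    import xml.etree.ElementTree as etree

root = etree.fromstring('<color red="255" green="255" blue="233" />')
root.set("red", "233")  # modify an existing attribute in place

out = etree.tostring(root).decode()
print(out)
```

The two libraries differ slightly in whitespace (ElementTree writes " />" for empty elements, lxml writes "/>"), but both keep red, green, blue in document order.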

Tryparsamide answered 13/6, 2009 at 0:11