In Python, how can you load YAML mappings as OrderedDicts?
R

8

151

I'd like to get PyYAML's loader to load mappings (and ordered mappings) into the Python 2.7+ OrderedDict type, instead of the vanilla dict and the list of pairs it currently uses.

What's the best way to do that?

Roselynroseman answered 25/2, 2011 at 19:52 Comment(0)
R
15

Note: there is a library, based on the following answer, which also implements the CLoader and CDumper: Phynix/yamlloader

I doubt very much that this is the best way to do it, but this is the way I came up with, and it does work. Also available as a gist.

import yaml
import yaml.constructor

try:
    # included in standard lib from Python 2.7
    from collections import OrderedDict
except ImportError:
    # try importing the backported drop-in replacement
    # it's available on PyPI
    from ordereddict import OrderedDict

class OrderedDictYAMLLoader(yaml.Loader):
    """
    A YAML loader that loads mappings into ordered dictionaries.
    """

    def __init__(self, *args, **kwargs):
        yaml.Loader.__init__(self, *args, **kwargs)

        self.add_constructor(u'tag:yaml.org,2002:map', type(self).construct_yaml_map)
        self.add_constructor(u'tag:yaml.org,2002:omap', type(self).construct_yaml_map)

    def construct_yaml_map(self, node):
        data = OrderedDict()
        yield data
        value = self.construct_mapping(node)
        data.update(value)

    def construct_mapping(self, node, deep=False):
        if isinstance(node, yaml.MappingNode):
            self.flatten_mapping(node)
        else:
            raise yaml.constructor.ConstructorError(None, None,
                'expected a mapping node, but found %s' % node.id, node.start_mark)

        mapping = OrderedDict()
        for key_node, value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            try:
                hash(key)
            except TypeError as exc:
                raise yaml.constructor.ConstructorError('while constructing a mapping',
                    node.start_mark, 'found unacceptable key (%s)' % exc, key_node.start_mark)
            value = self.construct_object(value_node, deep=deep)
            mapping[key] = value
        return mapping
Roselynroseman answered 25/2, 2011 at 19:55 Comment(4)
If you want to include the key_node.start_mark attribute in your error message, I don't see any obvious way to simplify your central construction loop. If you try to make use of the fact that the OrderedDict constructor will accept an iterable of key, value pairs, you lose access to that detail when generating the error message.Transverse
has anyone tested this code properly? I can not get it to work in my application!Fact
Example Usage: ordered_dict = yaml.load( ''' b: 1 a: 2 ''', Loader=OrderedDictYAMLLoader) # ordered_dict = OrderedDict([('b', 1), ('a', 2)]) Unfortunately my edit to the post was rejected, so please excuse lack of formatting.Anabas
This implementation breaks loading of ordered mapping types. To fix this, you can just remove the second call to add_constructor in your __init__ method.Looper
D
189

Python >= 3.6

In Python 3.6+, dict loading order appears to be preserved by default, without special dictionary types. The default Dumper, on the other hand, sorts dictionaries by key. Starting with PyYAML 5.1, you can turn this off by passing sort_keys=False:

import yaml

a = dict(zip("unsorted", "unsorted"))
s = yaml.safe_dump(a, sort_keys=False)
b = yaml.safe_load(s)

assert list(a.keys()) == list(b.keys())  # True

This works because of the new dict implementation that has been in use in PyPy for some time. While still considered an implementation detail in CPython 3.6, "the insertion-order preserving nature of dicts has been declared an official part of the Python language spec" as of 3.7+; see What's New In Python 3.7.

Note that this is still undocumented on the PyYAML side, so you shouldn't rely on it for safety-critical applications.
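As a further illustration, here is a minimal sketch contrasting the default key sorting with sort_keys=False. It assumes PyYAML 5.1 or newer on Python 3.7+; the sample document and variable names are made up for the example.

```python
import yaml

doc = "zebra: 1\napple: 2\nmango: 3\n"

# safe_load returns a plain dict; on Python 3.7+ it keeps document order.
loaded = yaml.safe_load(doc)

# The default dumper sorts keys alphabetically.
sorted_dump = yaml.safe_dump(loaded, default_flow_style=False)

# sort_keys=False (PyYAML >= 5.1) keeps the dict's insertion order.
ordered_dump = yaml.safe_dump(loaded, default_flow_style=False, sort_keys=False)

print(sorted_dump)
print(ordered_dump)
```

With these assumptions, the second dump should reproduce the input document, while the first lists the keys alphabetically.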

Original answer (compatible with all known versions)

I like @James' solution for its simplicity. However, it changes the default global yaml.Loader class, which can lead to troublesome side effects. This is a bad idea especially when writing library code. Also, it doesn't work directly with yaml.safe_load().

Fortunately, the solution can be improved without much effort:

import yaml
from collections import OrderedDict

def ordered_load(stream, Loader=yaml.SafeLoader, object_pairs_hook=OrderedDict):
    class OrderedLoader(Loader):
        pass
    def construct_mapping(loader, node):
        loader.flatten_mapping(node)
        return object_pairs_hook(loader.construct_pairs(node))
    OrderedLoader.add_constructor(
        yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
        construct_mapping)
    return yaml.load(stream, OrderedLoader)

# usage example:
ordered_load(stream, yaml.SafeLoader)

For serialization, you could use the following function:

def ordered_dump(data, stream=None, Dumper=yaml.SafeDumper, **kwds):
    class OrderedDumper(Dumper):
        pass
    def _dict_representer(dumper, data):
        return dumper.represent_mapping(
            yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
            data.items())
    OrderedDumper.add_representer(OrderedDict, _dict_representer)
    return yaml.dump(data, stream, OrderedDumper, **kwds)

# usage:
ordered_dump(data, Dumper=yaml.SafeDumper)

In each case, you could also make the custom subclasses global, so that they don't have to be recreated on each call.
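A minimal sketch of that module-level variant might look like the following. The names OrderedLoader, OrderedDumper, and the two helper functions are chosen for this example only; the registration calls act on the subclasses, so yaml.SafeLoader and yaml.SafeDumper themselves stay unchanged.

```python
import yaml
from collections import OrderedDict

_MAPPING_TAG = yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG


class OrderedLoader(yaml.SafeLoader):
    """SafeLoader subclass that builds OrderedDict for every mapping."""


class OrderedDumper(yaml.SafeDumper):
    """SafeDumper subclass that writes OrderedDict as a plain mapping."""


def _construct_ordered_mapping(loader, node):
    # Resolve "<<" merge keys first, then keep pairs in document order.
    loader.flatten_mapping(node)
    return OrderedDict(loader.construct_pairs(node))


def _represent_ordered_mapping(dumper, data):
    # Passing items() rather than the mapping itself avoids key sorting.
    return dumper.represent_mapping(_MAPPING_TAG, data.items())


# Created once at import time instead of on every call.
OrderedLoader.add_constructor(_MAPPING_TAG, _construct_ordered_mapping)
OrderedDumper.add_representer(OrderedDict, _represent_ordered_mapping)

text = "b: 1\na:\n  d: 2\n  c: 3\n"
data = yaml.load(text, Loader=OrderedLoader)
round_tripped = yaml.dump(data, Dumper=OrderedDumper, default_flow_style=False)
print(round_tripped)
```

The trade-off is that the object_pairs_hook parameter is no longer configurable per call; if that flexibility matters, the function-based version is the better fit.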

Detachment answered 20/2, 2014 at 15:47 Comment(11)
This implementation breaks YAML merge tags, BTWReveille
@Reveille Thanks. I didn't run in that scenario before, but now I added a fix to handle this as well (I hope).Detachment
This would have saved me horrible hacks in a yaml based charactersheet I wrote a handful of years ago. Maybe it’s time to revisit that. I hope something like this goes upstream eventually!Inspired
@ArneBabenhauserheide I am not sure if PyPI is upstream enough, but take a look at ruamel.yaml (I am the author of that) if you think it does.Basilisk
@coldfix, the ordered_dump() isn't working for me. The simple items are coming out properly, but the nested dictionaries are not. For example: swagger: '2.0' info: description: My API version: v1 title: My API contact: {name: Company, url: 'api.company.com', email: [email protected]} Any ideas why this might be? Thanks.Therapist
@MartinDelVecchio what doesn't work exactly? If you don't like the formatting, try passing default_flow_style=False as keyword argument.Detachment
I figured that out, but StackOverflow wouldn't let me edit my comment too many times. Without default_flow_style=False, the YAML syntax was incorrect. With it, it is correct. Thanks!Therapist
@MartinDelVecchio It's still correct YAML syntax without, just less pretty.Detachment
To achieve yaml.safe_load, just make loader inherit from SafeLoader def ordered_load(stream, Loader=yaml.SafeLoader, object_pairs_hook=OrderedDict): In PyYAML 4.1 and newer, the yaml.load() API will act like yaml.safe_load()Leonhard
When used with a file that contains jinja templates, this results in unhashable type: collections.OrderedDict. I presume that the custom loader generates an OrderedDict, which it then attempts to process again, but can't, because it's not hashable.Bretbretagne
Perhaps something has changed in the yaml upstream code, but the ordered loader no longer works. The loaded data is definitely being sorted.Bretbretagne
I
61

oyaml is a drop-in replacement for PyYAML which preserves dict ordering. Both Python 2 and Python 3 are supported. Just pip install oyaml, and import as shown below:

import oyaml as yaml

You'll no longer be annoyed by screwed-up mappings when dumping/loading.

Note: I'm the author of oyaml.

Ioved answered 21/2, 2018 at 8:6 Comment(1)
Thank you for this! For some reason, even with Python 3.8 the order was not respected with PyYaml. oyaml solved this for me immediately.Sorel
B
57

The yaml module allows you to specify custom 'representers' to convert Python objects to text and 'constructors' to reverse the process.

import collections
import yaml

_mapping_tag = yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG

def dict_representer(dumper, data):
    # items() works on Python 2 and 3; iteritems() exists only on Python 2
    return dumper.represent_dict(data.items())

def dict_constructor(loader, node):
    return collections.OrderedDict(loader.construct_pairs(node))

# These register on PyYAML's default (non-safe) dumper and loader classes
yaml.add_representer(collections.OrderedDict, dict_representer)
yaml.add_constructor(_mapping_tag, dict_constructor)
Backbencher answered 10/1, 2014 at 15:26 Comment(6)
any explanations for this answer?Propertius
Or even better from six import iteritems and then change it to iteritems(data) so that it works equally well in Python 2 & 3.Directional
This seems to be using undocumented features of PyYAML (represent_dict and DEFAULT_MAPPING_TAG). Is this because the documentation is incomplete, or are these features unsupported and subject to change without notice?Vespid
Note that for dict_constructor you'll need to call loader.flatten_mapping(node) or you won't be able to load <<: *... (merge syntax)Abhor
@brice-m-dempsey can you add any example how to use your code? It does not seem to work in my case (Python 3.7)Crossman
–1 This breaks yaml module. See yaml.org/type/merge.html for an example of valid markup which subsequently fails to load.Ioved
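The merge-key problem raised in these comments can be reproduced with a small sketch. Everything here is illustrative: the two loader subclasses and the sample document are hypothetical names, and the constructors are registered on SafeLoader subclasses rather than globally so the comparison stays isolated.

```python
import collections
import yaml

_mapping_tag = yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG


class NaiveOrderedLoader(yaml.SafeLoader):
    """Hypothetical subclass: ordered mappings, no merge-key handling."""


class MergeAwareOrderedLoader(yaml.SafeLoader):
    """Hypothetical subclass: ordered mappings with merge keys resolved."""


def naive_constructor(loader, node):
    return collections.OrderedDict(loader.construct_pairs(node))


def merge_aware_constructor(loader, node):
    # Expand "<<" entries into ordinary key/value pairs first.
    loader.flatten_mapping(node)
    return collections.OrderedDict(loader.construct_pairs(node))


NaiveOrderedLoader.add_constructor(_mapping_tag, naive_constructor)
MergeAwareOrderedLoader.add_constructor(_mapping_tag, merge_aware_constructor)

doc = """\
base: &base
  a: 1
  b: 2
derived:
  <<: *base
  c: 3
"""

try:
    yaml.load(doc, Loader=NaiveOrderedLoader)
    naive_failed = False
except yaml.YAMLError:
    # Without flatten_mapping, the "<<" key has no constructor.
    naive_failed = True

data = yaml.load(doc, Loader=MergeAwareOrderedLoader)
print(naive_failed, list(data["derived"].items()))
```

Note that flatten_mapping places the merged keys ahead of the mapping's own keys, so the resulting order follows that rule rather than the literal position of the `<<` line.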
B
29

2015 (and later) option:

ruamel.yaml is a drop-in replacement for PyYAML (disclaimer: I am the author of that package). Preserving the order of the mappings was one of the things added in the first version (0.1) back in 2015. Not only does it preserve the order of your dictionaries, it also preserves comments, anchor names, and tags, and it supports the YAML 1.2 specification (released 2009).

The specification says that the ordering is not guaranteed, but of course there is ordering in the YAML file and the appropriate parser can just hold on to that and transparently generate an object that keeps the ordering. You just need to choose the right parser, loader and dumper:

import sys
from ruamel.yaml import YAML

yaml_str = """\
3: abc
conf:
    10: def
    3: gij     # h is missing
more:
- what
- else
"""

yaml = YAML()
data = yaml.load(yaml_str)
data['conf'][10] = 'klm'
data['conf'][3] = 'jig'
yaml.dump(data, sys.stdout)

will give you:

3: abc
conf:
  10: klm
  3: jig       # h is missing
more:
- what
- else

data is of type CommentedMap which functions like a dict, but has extra information that is kept around until being dumped (including the preserved comment!)

Basilisk answered 10/6, 2015 at 18:2 Comment(6)
That's pretty nice if you already have a YAML file, but how do you do that using a Python structure? I tried using CommentedMap directly but it does not work, and OrderedDict puts !!omap everywhere which is not very user-friendly.Shakta
I am not sure why CommentedMap did not work for you. Can you post a question with your (minimalized) code and tag it ruamel.yaml? That way I will be notified and answer.Basilisk
Sorry, I think it's because I tried to save the CommentedMap with safe=True in YAML, which did not work (using safe=False works). I also had issue with CommentedMap not being modifiable, but I cannot reproduce it now... I'll open a new question if I encounter this issue again.Shakta
You should be using yaml = YAML(), you get the round-trip parser/dumper and that is derivative of the safe parser/dumper that knows about CommentedMap/Seq etc.Basilisk
In fact it is possible to preserve key order (but obviously not comments) in safe mode too! Say if I need to dump a plain dict to .yaml and keep the key order then yaml = YAML(typ='safe', pure=True); yaml.sort_base_mapping_type_on_output = False; will do the trick. However, setting of sort_base_mapping_type_on_output should be done immediately after yaml creation or at least before any dumping, otherwise it is not propagated to the representer. Still you can always do yaml.representer.sort_base_mapping_type_on_output = False.Gospodin
@Gospodin That is a side effect of you using a more modern version of Python than was current when this answer was given. The underlying dict() in Python preserves order nowadays, but it didn't use to.Basilisk
N
11

Update: the library was deprecated in favor of the yamlloader (which is based on the yamlordereddictloader)

I've just found a Python library (https://pypi.python.org/pypi/yamlordereddictloader/0.1.1) which was created based on answers to this question and is quite simple to use:

import yaml
import yamlordereddictloader

datas = yaml.load(open('myfile.yml'), Loader=yamlordereddictloader.Loader)
Nard answered 20/2, 2016 at 13:26 Comment(1)
I don't know if it's the same author or not, but check out yodl on GitHub.Compaction
D
3

In my PyYAML installation for Python 2.7, I updated __init__.py, constructor.py, and loader.py. It now supports an object_pairs_hook option for the load commands. A diff of the changes I made is below.

__init__.py

$ diff __init__.py Original
64c64
< def load(stream, Loader=Loader, **kwds):
---
> def load(stream, Loader=Loader):
69c69
<     loader = Loader(stream, **kwds)
---
>     loader = Loader(stream)
75c75
< def load_all(stream, Loader=Loader, **kwds):
---
> def load_all(stream, Loader=Loader):
80c80
<     loader = Loader(stream, **kwds)
---
>     loader = Loader(stream)

constructor.py

$ diff constructor.py Original
20,21c20
<     def __init__(self, object_pairs_hook=dict):
<         self.object_pairs_hook = object_pairs_hook
---
>     def __init__(self):
27,29d25
<     def create_object_hook(self):
<         return self.object_pairs_hook()
<
54,55c50,51
<         self.constructed_objects = self.create_object_hook()
<         self.recursive_objects = self.create_object_hook()
---
>         self.constructed_objects = {}
>         self.recursive_objects = {}
129c125
<         mapping = self.create_object_hook()
---
>         mapping = {}
400c396
<         data = self.create_object_hook()
---
>         data = {}
595c591
<             dictitems = self.create_object_hook()
---
>             dictitems = {}
602c598
<             dictitems = value.get('dictitems', self.create_object_hook())
---
>             dictitems = value.get('dictitems', {})

loader.py

$ diff loader.py Original
13c13
<     def __init__(self, stream, **constructKwds):
---
>     def __init__(self, stream):
18c18
<         BaseConstructor.__init__(self, **constructKwds)
---
>         BaseConstructor.__init__(self)
23c23
<     def __init__(self, stream, **constructKwds):
---
>     def __init__(self, stream):
28c28
<         SafeConstructor.__init__(self, **constructKwds)
---
>         SafeConstructor.__init__(self)
33c33
<     def __init__(self, stream, **constructKwds):
---
>     def __init__(self, stream):
38c38
<         Constructor.__init__(self, **constructKwds)
---
>         Constructor.__init__(self)
Dissolvent answered 25/8, 2013 at 21:48 Comment(3)
This should be added upstream actually.Landreth
Just filed a pull request with your changes. github.com/yaml/pyyaml/pull/12 Let's hope for a merge.Landreth
Really wish the author was more active, the last commit was 4 years ago. This change would be a godsend to me.Smattering
H
-1

Here's a simple solution that also checks for duplicated top-level keys in your map.

import yaml
import re
from collections import OrderedDict

def yaml_load_od(fname):
    "load a yaml file as an OrderedDict"
    # detects any duped keys (fail on this) and preserves order of top level keys
    with open(fname, 'r') as f:
        lines = f.read().splitlines()
    top_keys = []
    duped_keys = []
    for line in lines:
        # a top-level key starts at column 0 and is followed by a colon
        m = re.search(r'^([A-Za-z0-9_]+) *:', line)
        if m:
            if m.group(1) in top_keys:
                duped_keys.append(m.group(1))
            else:
                top_keys.append(m.group(1))
    if duped_keys:
        raise Exception('ERROR: duplicate keys: {}'.format(duped_keys))
    # 2nd pass to set up the OrderedDict
    with open(fname, 'r') as f:
        # safe_load needs no Loader argument, which newer PyYAML requires for yaml.load
        d_tmp = yaml.safe_load(f)
    return OrderedDict([(key, d_tmp[key]) for key in top_keys])
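As an alternative to scanning the text with a regular expression, the duplicate check could be done inside a loader, which would also cover nested mappings. The following is only a sketch under that assumption: UniqueKeyOrderedLoader and its constructor function are hypothetical names, not part of PyYAML.

```python
import yaml
from collections import OrderedDict
from yaml.constructor import ConstructorError


class UniqueKeyOrderedLoader(yaml.SafeLoader):
    """Hypothetical loader: ordered mappings, error on duplicate keys."""


def _construct_unique_ordered_mapping(loader, node):
    # Merge keys are expanded first; a key that overrides a merged key
    # will therefore also be reported as a duplicate by this sketch.
    loader.flatten_mapping(node)
    mapping = OrderedDict()
    for key_node, value_node in node.value:
        key = loader.construct_object(key_node, deep=True)
        if key in mapping:
            raise ConstructorError(
                "while constructing a mapping", node.start_mark,
                "found duplicate key (%r)" % (key,), key_node.start_mark)
        mapping[key] = loader.construct_object(value_node, deep=True)
    return mapping


UniqueKeyOrderedLoader.add_constructor(
    yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
    _construct_unique_ordered_mapping)

good = yaml.load("b: 1\na: 2\n", Loader=UniqueKeyOrderedLoader)

try:
    yaml.load("outer:\n  x: 1\n  x: 2\n", Loader=UniqueKeyOrderedLoader)
    duplicate_detected = False
except ConstructorError:
    duplicate_detected = True

print(list(good.items()), duplicate_detected)
```

Because the check runs on parsed nodes rather than raw lines, quoted keys and keys inside block scalars are handled by the YAML parser instead of by a pattern match.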
Huan answered 6/7, 2015 at 16:47 Comment(0)
