I'd like to get PyYAML's loader to load mappings (and ordered mappings) into the Python 2.7+ OrderedDict type, instead of the vanilla dict
and the list of pairs it currently uses.
What's the best way to do that?
I'd like to get PyYAML's loader to load mappings (and ordered mappings) into the Python 2.7+ OrderedDict type, instead of the vanilla dict
and the list of pairs it currently uses.
What's the best way to do that?
Note: there is a library, based on the following answer, which implements also the CLoader and CDumpers: Phynix/yamlloader
I doubt very much that this is the best way to do it, but this is the way I came up with, and it does work. Also available as a gist.
import yaml
import yaml.constructor
try:
# included in standard lib from Python 2.7
from collections import OrderedDict
except ImportError:
# try importing the backported drop-in replacement
# it's available on PyPI
from ordereddict import OrderedDict
class OrderedDictYAMLLoader(yaml.Loader):
"""
A YAML loader that loads mappings into ordered dictionaries.
"""
def __init__(self, *args, **kwargs):
yaml.Loader.__init__(self, *args, **kwargs)
self.add_constructor(u'tag:yaml.org,2002:map', type(self).construct_yaml_map)
self.add_constructor(u'tag:yaml.org,2002:omap', type(self).construct_yaml_map)
def construct_yaml_map(self, node):
data = OrderedDict()
yield data
value = self.construct_mapping(node)
data.update(value)
def construct_mapping(self, node, deep=False):
if isinstance(node, yaml.MappingNode):
self.flatten_mapping(node)
else:
raise yaml.constructor.ConstructorError(None, None,
'expected a mapping node, but found %s' % node.id, node.start_mark)
mapping = OrderedDict()
for key_node, value_node in node.value:
key = self.construct_object(key_node, deep=deep)
try:
hash(key)
except TypeError, exc:
raise yaml.constructor.ConstructorError('while constructing a mapping',
node.start_mark, 'found unacceptable key (%s)' % exc, key_node.start_mark)
value = self.construct_object(value_node, deep=deep)
mapping[key] = value
return mapping
add_constructor
in your __init__
method. –
Looper In python 3.6+, it seems that dict loading order is preserved by default without special dictionary types. The default Dumper, on the other hand, sorts dictionaries by key. Starting with pyyaml 5.1
, you can turn this off by passing sort_keys=False
:
a = dict(zip("unsorted", "unsorted"))
s = yaml.safe_dump(a, sort_keys=False)
b = yaml.safe_load(s)
assert list(a.keys()) == list(b.keys()) # True
This can work due to the new dict implementation that has been in use in pypy for some time. While still considered an implementation detail in CPython 3.6, "the insertion-order preserving nature of dicts has been declared an official part of the Python language spec" as of 3.7+, see What's New In Python 3.7.
Note that this is still undocumented from PyYAML side, so you shouldn't rely on this for safety critical applications.
I like @James' solution for its simplicity. However, it changes the default global yaml.Loader
class, which can lead to troublesome side effects. Especially, when writing library code this is a bad idea. Also, it doesn't directly work with yaml.safe_load()
.
Fortunately, the solution can be improved without much effort:
import yaml
from collections import OrderedDict
def ordered_load(stream, Loader=yaml.SafeLoader, object_pairs_hook=OrderedDict):
class OrderedLoader(Loader):
pass
def construct_mapping(loader, node):
loader.flatten_mapping(node)
return object_pairs_hook(loader.construct_pairs(node))
OrderedLoader.add_constructor(
yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
construct_mapping)
return yaml.load(stream, OrderedLoader)
# usage example:
ordered_load(stream, yaml.SafeLoader)
For serialization, you could use the following funcion:
def ordered_dump(data, stream=None, Dumper=yaml.SafeDumper, **kwds):
class OrderedDumper(Dumper):
pass
def _dict_representer(dumper, data):
return dumper.represent_mapping(
yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
data.items())
OrderedDumper.add_representer(OrderedDict, _dict_representer)
return yaml.dump(data, stream, OrderedDumper, **kwds)
# usage:
ordered_dump(data, Dumper=yaml.SafeDumper)
In each case, you could also make the custom subclasses global, so that they don't have to be recreated on each call.
default_flow_style=False
as keyword argument. –
Detachment def ordered_load(stream, Loader=yaml.SafeLoader, object_pairs_hook=OrderedDict):
In PyYAML 4.1 and newer, the yaml.load()
API will act like yaml.safe_load()
–
Leonhard jinja
templates, this results in unhashable type: collections.OrderedDict
. I presume that the custom loader generates and OrderedDict
, which it then attempts to process again, but can't, because it's not hashable. –
Bretbretagne yaml
upstream code, but the ordered loader no longer works. The loaded data is definitely being sorted. –
Bretbretagne oyaml
is a drop-in replacement for PyYAML which preserves dict ordering. Both Python 2 and Python 3 are supported. Just pip install oyaml
, and import as shown below:
import oyaml as yaml
You'll no longer be annoyed by screwed-up mappings when dumping/loading.
Note: I'm the author of oyaml.
The yaml module allow you to specify custom 'representers' to convert Python objects to text and 'constructors' to reverse the process.
_mapping_tag = yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG
def dict_representer(dumper, data):
return dumper.represent_dict(data.iteritems())
def dict_constructor(loader, node):
return collections.OrderedDict(loader.construct_pairs(node))
yaml.add_representer(collections.OrderedDict, dict_representer)
yaml.add_constructor(_mapping_tag, dict_constructor)
from six import iteritems
and then change it to iteritems(data)
so that it works equally well in Python 2 & 3. –
Directional represent_dict
and DEFAULT_MAPPING_TAG
). Is this because the documentation is incomplete, or are these features unsupported and subject to change without notice? –
Vespid dict_constructor
you'll need to call loader.flatten_mapping(node)
or you won't be able to load <<: *...
(merge syntax) –
Abhor yaml
module. See yaml.org/type/merge.html for an example of valid markup which subsequently fails to load. –
Ioved ruamel.yaml is a drop in replacement for PyYAML (disclaimer: I am the author of that package). Preserving the order of the mappings was one of the things added in the first version (0.1) back in 2015. Not only does it preserve the order of your dictionaries, it will also preserve comments, anchor names, tags and does support the YAML 1.2 specification (released 2009)
The specification says that the ordering is not guaranteed, but of course there is ordering in the YAML file and the appropriate parser can just hold on to that and transparently generate an object that keeps the ordering. You just need to choose the right parser, loader and dumper¹:
import sys
from ruamel.yaml import YAML
yaml_str = """\
3: abc
conf:
10: def
3: gij # h is missing
more:
- what
- else
"""
yaml = YAML()
data = yaml.load(yaml_str)
data['conf'][10] = 'klm'
data['conf'][3] = 'jig'
yaml.dump(data, sys.stdout)
will give you:
3: abc
conf:
10: klm
3: jig # h is missing
more:
- what
- else
data
is of type CommentedMap
which functions like a dict, but has extra information that is kept around until being dumped (including the preserved comment!)
CommentedMap
directly but it does not work, and OrderedDict
puts !!omap
everywhere which is not very user-friendly. –
Shakta CommentedMap
with safe=True
in YAML
, which did not work (using safe=False
works). I also had issue with CommentedMap
not being modifiable, but I cannot reproduce it now... I'll open a new question if I encounter this issue again. –
Shakta yaml = YAML()
, you get the round-trip parser/dumper and that is derivative of the safe parser/dumper that knows about CommentedMap/Seq etc. –
Basilisk dict()
in Python preserves order nowadays, but it didn't use to. –
Basilisk Note: there is a library, based on the following answer, which implements also the CLoader and CDumpers: Phynix/yamlloader
I doubt very much that this is the best way to do it, but this is the way I came up with, and it does work. Also available as a gist.
import yaml
import yaml.constructor
try:
# included in standard lib from Python 2.7
from collections import OrderedDict
except ImportError:
# try importing the backported drop-in replacement
# it's available on PyPI
from ordereddict import OrderedDict
class OrderedDictYAMLLoader(yaml.Loader):
"""
A YAML loader that loads mappings into ordered dictionaries.
"""
def __init__(self, *args, **kwargs):
yaml.Loader.__init__(self, *args, **kwargs)
self.add_constructor(u'tag:yaml.org,2002:map', type(self).construct_yaml_map)
self.add_constructor(u'tag:yaml.org,2002:omap', type(self).construct_yaml_map)
def construct_yaml_map(self, node):
data = OrderedDict()
yield data
value = self.construct_mapping(node)
data.update(value)
def construct_mapping(self, node, deep=False):
if isinstance(node, yaml.MappingNode):
self.flatten_mapping(node)
else:
raise yaml.constructor.ConstructorError(None, None,
'expected a mapping node, but found %s' % node.id, node.start_mark)
mapping = OrderedDict()
for key_node, value_node in node.value:
key = self.construct_object(key_node, deep=deep)
try:
hash(key)
except TypeError, exc:
raise yaml.constructor.ConstructorError('while constructing a mapping',
node.start_mark, 'found unacceptable key (%s)' % exc, key_node.start_mark)
value = self.construct_object(value_node, deep=deep)
mapping[key] = value
return mapping
key_node.start_mark
attribute in your error message, I don't see any obvious way to simplify your central construction loop. If you try to make use of the fact that the OrderedDict
constructor will accept an iterable of key, value pairs, you lose access to that detail when generating the error message. –
Transverse add_constructor
in your __init__
method. –
Looper Update: the library was deprecated in favor of the yamlloader (which is based on the yamlordereddictloader)
I've just found a Python library (https://pypi.python.org/pypi/yamlordereddictloader/0.1.1) which was created based on answers to this question and is quite simple to use:
import yaml
import yamlordereddictloader
datas = yaml.load(open('myfile.yml'), Loader=yamlordereddictloader.Loader)
yodl
on github. –
Compaction On my For PyYaml installation for Python 2.7 I updated __init__.py, constructor.py, and loader.py. Now supports object_pairs_hook option for load commands. Diff of changes I made is below.
__init__.py
$ diff __init__.py Original
64c64
< def load(stream, Loader=Loader, **kwds):
---
> def load(stream, Loader=Loader):
69c69
< loader = Loader(stream, **kwds)
---
> loader = Loader(stream)
75c75
< def load_all(stream, Loader=Loader, **kwds):
---
> def load_all(stream, Loader=Loader):
80c80
< loader = Loader(stream, **kwds)
---
> loader = Loader(stream)
constructor.py
$ diff constructor.py Original
20,21c20
< def __init__(self, object_pairs_hook=dict):
< self.object_pairs_hook = object_pairs_hook
---
> def __init__(self):
27,29d25
< def create_object_hook(self):
< return self.object_pairs_hook()
<
54,55c50,51
< self.constructed_objects = self.create_object_hook()
< self.recursive_objects = self.create_object_hook()
---
> self.constructed_objects = {}
> self.recursive_objects = {}
129c125
< mapping = self.create_object_hook()
---
> mapping = {}
400c396
< data = self.create_object_hook()
---
> data = {}
595c591
< dictitems = self.create_object_hook()
---
> dictitems = {}
602c598
< dictitems = value.get('dictitems', self.create_object_hook())
---
> dictitems = value.get('dictitems', {})
loader.py
$ diff loader.py Original
13c13
< def __init__(self, stream, **constructKwds):
---
> def __init__(self, stream):
18c18
< BaseConstructor.__init__(self, **constructKwds)
---
> BaseConstructor.__init__(self)
23c23
< def __init__(self, stream, **constructKwds):
---
> def __init__(self, stream):
28c28
< SafeConstructor.__init__(self, **constructKwds)
---
> SafeConstructor.__init__(self)
33c33
< def __init__(self, stream, **constructKwds):
---
> def __init__(self, stream):
38c38
< Constructor.__init__(self, **constructKwds)
---
> Constructor.__init__(self)
here's a simple solution that also checks for duplicated top level keys in your map.
import yaml
import re
from collections import OrderedDict
def yaml_load_od(fname):
"load a yaml file as an OrderedDict"
# detects any duped keys (fail on this) and preserves order of top level keys
with open(fname, 'r') as f:
lines = open(fname, "r").read().splitlines()
top_keys = []
duped_keys = []
for line in lines:
m = re.search(r'^([A-Za-z0-9_]+) *:', line)
if m:
if m.group(1) in top_keys:
duped_keys.append(m.group(1))
else:
top_keys.append(m.group(1))
if duped_keys:
raise Exception('ERROR: duplicate keys: {}'.format(duped_keys))
# 2nd pass to set up the OrderedDict
with open(fname, 'r') as f:
d_tmp = yaml.load(f)
return OrderedDict([(key, d_tmp[key]) for key in top_keys])
© 2022 - 2024 — McMap. All rights reserved.
key_node.start_mark
attribute in your error message, I don't see any obvious way to simplify your central construction loop. If you try to make use of the fact that theOrderedDict
constructor will accept an iterable of key, value pairs, you lose access to that detail when generating the error message. – Transverse