PyYAML will just silently overwrite the first entry, ruamel.yaml¹ will give a DuplicateKeyFutureWarning
if used with the legacy API, and raise a DuplicateKeyError
with the new API.
If you don't want to create a full Constructor
for all types, overwriting the mapping constructor in SafeConstructor
should do the job:
import sys
from ruamel.yaml import YAML
from ruamel.yaml.constructor import SafeConstructor
yaml_str = """\
build:
step: 'step1'
build:
step: 'step2'
"""
def construct_yaml_map(self, node):
# test if there are duplicate node keys
data = []
yield data
for key_node, value_node in node.value:
key = self.construct_object(key_node, deep=True)
val = self.construct_object(value_node, deep=True)
data.append((key, val))
SafeConstructor.add_constructor(u'tag:yaml.org,2002:map', construct_yaml_map)
yaml = YAML(typ='safe')
data = yaml.load(yaml_str)
print(data)
which gives:
[('build', [('step', 'step1')]), ('build', [('step', 'step2')])]
However it doesn't seem necessary to make step: 'step1'
into a list. The following will only create the list if there are duplicate items (could be optimised if necessary, by caching the result of the self.construct_object(key_node, deep=True)
):
def construct_yaml_map(self, node):
# test if there are duplicate node keys
keys = set()
for key_node, value_node in node.value:
key = self.construct_object(key_node, deep=True)
if key in keys:
break
keys.add(key)
else:
data = {} # type: Dict[Any, Any]
yield data
value = self.construct_mapping(node)
data.update(value)
return
data = []
yield data
for key_node, value_node in node.value:
key = self.construct_object(key_node, deep=True)
val = self.construct_object(value_node, deep=True)
data.append((key, val))
which gives:
[('build', {'step': 'step1'}), ('build', {'step': 'step2'})]
Some points:
- Probably needless to say, this will not work with YAML merge keys (
<<: *xyz
)
- If you need ruamel.yaml's round-trip capabilities (
yaml = YAML()
) , that will require a more complex construct_yaml_map
.
If you want to dump the output, you should instantiate a new YAML()
instance for that, instead of re-using the "patched" one used for loading (it might work, this is just to be sure):
yaml_out = YAML(typ='safe')
yaml_out.dump(data, sys.stdout)
which gives (with the first construct_yaml_map
):
- - build
- - [step, step1]
- - build
- - [step, step2]
What doesn't work in PyYAML nor ruamel.yaml is yaml.load('file.yml')
. If you don't want to open()
the file yourself you can do:
from pathlib import Path # or: from ruamel.std.pathlib import Path
yaml = YAML(typ='safe')
yaml.load(Path('file.yml')
¹ Disclaimer: I am the author of that package.