How to dump YAML with explicit references?

Recursive references work great in ruamel.yaml or pyyaml:

$ ruamel.yaml.dump(ruamel.yaml.load('&A [ *A ]'))
'&id001
- *id001'

However, it (obviously) does not work for normal (non-recursive) references:

$ ruamel.yaml.dump(ruamel.yaml.load("foo: &foo { a: 42 }\nbar: { <<: *foo }"))
bar: {a: 42}
foo: {a: 42}

What I would like is to explicitly create a reference:

data = {}
data['foo'] = {'foo': {'a': 42}}
data['bar'] = { '<<': data['foo'], 'b': 43 }

$ ruamel.yaml.dump(data, magic=True)
foo: &foo
    a: 42
bar: 
    <<: *foo
    b: 43

This would be very useful for generating YAML output of large data structures that have lots of common keys.

How can this be done without a questionable regex replacement (re.sub) on the output?

Currently, the result of ruamel.yaml.dump(data) is:

bar:
  '<<': &id001
    foo:
      a: 42
  b: 43
foo: *id001

So I need to replace '<<' with << and maybe replace id001 with foo.

Kiwi answered 16/9, 2016 at 14:34

If you want to create something like that, at least in ruamel.yaml¹, you should use round-trip mode, which also preserves merges. The following runs without raising an AssertionError:

import ruamel.yaml

yaml_str = """\
foo: &xyz
  a: 42
bar:
  <<: *xyz
"""

data = ruamel.yaml.round_trip_load(yaml_str)
assert ruamel.yaml.round_trip_dump(data) == yaml_str

This means that data holds enough information to recreate the merge in the output exactly as it was in the input. In practice, however, in round-trip mode the merge never actually takes place. Instead, when you retrieve data['bar']['a'], there is no real key 'a' in data['bar']; that key is subsequently looked up in the attached "merge mappings".

There is no public interface for this (so things might change), but by analyzing data and looking at ruamel.yaml.comments.CommentedMap() you can find that there is a merge_attrib (currently the string _yaml_merge) and, more usefully, a method add_yaml_merge(). The latter takes a list of (int, CommentedMap()) tuples.

baz = ruamel.yaml.comments.CommentedMap()
baz['b'] = 196
baz.yaml_set_anchor('klm')
data.insert(1, 'baz', baz)

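You need to insert the 'baz' key before the 'bar' key of data; otherwise the roles reverse: the anchored mapping would be dumped in full inside the merge of bar, and baz would only get the alias. After that, add the new structure to the merge for data['bar']: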

import sys

data['bar'].add_yaml_merge([(0, baz)])
ruamel.yaml.round_trip_dump(data, sys.stdout)

which gives:

foo: &xyz
  a: 42
baz: &klm
  b: 196
bar:
  <<: [*xyz, *klm]

(If you'd like to see what add_yaml_merge() inserts, run

print(getattr(data['bar'], ruamel.yaml.comments.merge_attrib))

before and after the call.)

If you want to start completely from scratch, you can do:

data = ruamel.yaml.comments.CommentedMap([
    ('foo', ruamel.yaml.comments.CommentedMap([('a', 42)])),
    ])
data['foo'].yaml_set_anchor('xyz')
data['bar'] = bar = ruamel.yaml.comments.CommentedMap()
bar.add_yaml_merge([(0, data['foo'])])

instead of data = ruamel.yaml.round_trip_load(yaml_str).


¹ Disclaimer: I am the author of that package.

Kuth answered 16/9, 2016 at 16:06
Kiwi: Is there another way than d = ruamel.yaml.round_trip_load(ruamel.yaml.dump(data)) to import a dict into a round-trip object?
Kuth: @Kiwi To start "from scratch" you'll have to start with data = ruamel.yaml.comments.CommentedMap() instead of data = dict() (and do the same for any other dict underneath it that has to keep its keys ordered and allow extra data, such as reference names and comments, to be attached).
Kuth: @Kiwi I updated the answer with the complete code to get rid of the round_trip_load().
Kiwi: Thanks, but this is not very convenient. My goal is to add a merge to an existing dict, which is huge. Doing data = ruamel.yaml.round_trip_load(yaml_str) is slower but much easier. With your example I would need to recursively walk my data structure to create ruamel.yaml.comments.CommentedMap objects.
