Tool to automatically expand YAML merges?
D

5

23

I'm looking for a tool or process that can take a YAML file containing anchors, aliases and merge keys, and expand the aliases and merges out into a flat YAML file. There are still many commonly used YAML parsers which don't fully support merging.

I'd like to be able to take advantage of merging to keep things DRY, but there are instances where this needs to then be built into a more verbose "flat" YAML file so that it can be used by other tooling which relies on incomplete YAML parsers.

Example Source YAML:

default: &DEFAULT
  URL: website.com
  mode: production  
  site_name: Website
  some_setting: h2i8yiuhef
  some_other_setting: 3600

development:
  <<: *DEFAULT
  URL: website.local
  mode: dev

test:
  <<: *DEFAULT
  URL: test.website.qa
  mode: test

Desired output YAML:

default:
  URL: website.com
  mode: production  
  site_name: Website
  some_setting: h2i8yiuhef
  some_other_setting: 3600

development:
  URL: website.local
  mode: dev
  site_name: Website
  some_setting: h2i8yiuhef
  some_other_setting: 3600

test:
  URL: test.website.qa
  mode: test
  site_name: Website
  some_setting: h2i8yiuhef
  some_other_setting: 3600
Descendant answered 7/7, 2017 at 10:25 Comment(0)
O
16

If you have Python installed on your system, you can do pip install ruamel.yaml.cmd¹ and then:

yaml merge-expand input.yaml output.yaml

(replace output.yaml with - to write to stdout). This implements the merge expanding with preservation of key order and comments.

The above is actually only a few lines of code built on ruamel.yaml¹. If you have Python (2.7 or 3.4+), install it using pip install ruamel.yaml, and save the following as expand.py:

import sys
from ruamel.yaml import YAML

yaml = YAML(typ='safe')
yaml.default_flow_style=False
with open(sys.argv[1]) as fp:
    data = yaml.load(fp)
with open(sys.argv[2], 'w') as fp:
    yaml.dump(data, fp)

you can already do:

python expand.py input.yaml output.yaml

That will get you YAML that is semantically equivalent to what you requested (in output.yaml the keys of the mappings are sorted, in this program's output they are not).

The above assumes you don't have any tags in your YAML, nor care about preserving any comments. Most of those, and the key ordering, can be preserved by using a patched version of the standard YAML() instance. Patching is necessary because the standard YAML() instance preserves the merges on round-trip as well, which is exactly what you don't want:

import sys
from ruamel.yaml import YAML, SafeConstructor

yaml = YAML()

yaml.Constructor.flatten_mapping = SafeConstructor.flatten_mapping
yaml.default_flow_style=False
yaml.allow_duplicate_keys = True
# comment out next line if you want "normal" anchors/aliases in your output
yaml.representer.ignore_aliases = lambda x: True  

with open(sys.argv[1]) as fp:
    data = yaml.load(fp)
with open(sys.argv[2], 'w') as fp:
    yaml.dump(data, fp)

with this input:

default: &DEFAULT
  URL: website.com
  mode: production
  site_name: Website
  some_setting: h2i8yiuhef
  some_other_setting: 3600  # an hour?

development:
  <<: *DEFAULT
  URL: website.local     # local web
  mode: dev

test:
  <<: *DEFAULT
  URL: test.website.qa
  mode: test

that will give this output (note that comments on the merged in keys get duplicated):

default:
  URL: website.com
  mode: production
  site_name: Website
  some_setting: h2i8yiuhef
  some_other_setting: 3600  # an hour?

development:
  URL: website.local     # local web
  mode: dev

  site_name: Website
  some_setting: h2i8yiuhef
  some_other_setting: 3600  # an hour?

test:
  URL: test.website.qa
  mode: test
  site_name: Website
  some_setting: h2i8yiuhef
  some_other_setting: 3600  # an hour?

The above is what the yaml merge-expand command, mentioned at the start of this answer, does.
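
For readers whose parser leaves the merge key in place, the merge semantics themselves can be sketched in plain Python. This is only an illustration, not the ruamel.yaml implementation: it assumes the document has already been loaded into dicts and lists in which `<<` survives as an ordinary key, and the name expand_merges is hypothetical.

```python
def expand_merges(node):
    """Recursively resolve '<<' keys in already-loaded dicts and lists.

    Assumption: the loader kept '<<' as a normal key whose value is either
    one mapping or a list of mappings.
    """
    if isinstance(node, list):
        return [expand_merges(item) for item in node]
    if not isinstance(node, dict):
        return node
    # Explicit keys come first and always win over merged-in keys.
    result = {k: expand_merges(v) for k, v in node.items() if k != "<<"}
    sources = node.get("<<", [])
    if isinstance(sources, dict):
        sources = [sources]
    # In a merge list, earlier mappings take precedence over later ones.
    for source in sources:
        for key, value in expand_merges(source).items():
            result.setdefault(key, value)
    return result


# Hypothetical data mirroring the example from the question.
default = {
    "URL": "website.com",
    "mode": "production",
    "site_name": "Website",
    "some_setting": "h2i8yiuhef",
    "some_other_setting": 3600,
}
config = {
    "default": default,
    "development": {"<<": default, "URL": "website.local", "mode": "dev"},
}
flat = expand_merges(config)
print(flat["development"])
```

Keys written explicitly in a mapping override the merged-in ones, which is why development keeps its own URL and mode while inheriting the remaining three settings.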


¹ Disclaimer: I am the author of that package.

Ode answered 7/7, 2017 at 11:24 Comment(13)
Thank you. I will take a look into this. It sounds perfect for my use case in that I can see it slotting nicely into a build process. – Descendant
yaml merge-expand doesn't seem to work for me. All I get is the merge keys renamed to *id001, *id002, *id003, etc. – Descendant
@Descendant Are you running the above programs as presented? I just (re-)tried this in a clean virtualenv and this works for me. – Ode
Sorry, I should clarify. I got a little mixed up. Your examples definitely do work perfectly for the simple example YAML provided. When I try this with a production YAML file (~3000 lines) with many merges, it simply renames all the anchors and aliases numerically as I've described. Is there a known limitation? – Descendant
@Descendant Not that I know of, can you upload the YAML file somewhere so I can have a look at it, or email it to me? – Ode
Yeah, no problem. I've been playing further with subsets of the YAML file I'm trying to work with and I think it has something to do with nesting/recursion. input.yaml (pastebin.com/eTWkznXr) and output.yaml (pastebin.com/hy6CrNXU) – Descendant
Thanks, now that I can reproduce that, I'll try to look into what causes that soon (as in "the coming days", have to get some work done first). – Ode
I had focused too much on "expanding merges" and had left the (re-)creation of "normal" anchors/aliases unsolved. I've fixed the yaml command (pip install -U ruamel.yaml.cmd) and added the one-line fix to the above code as well. – Ode
07/2022: this does not seem to work anymore using Python 3.7 and the latest ruamel.yaml 0.17.21, ruamel.yaml.clib 0.2.6. The aliases are expanded exactly, without the overriding values. – Garate
In fact, ruamel.yaml.cmd doesn't work with Python 3.7 currently (it uses new f-string functionality only available from newer Python versions). Would be an easy fix, or the project could declare that Python 3.7 isn't supported to avoid accidentally installing it into an unsupportable environment. – Munich
@Munich f-strings were introduced in 3.6, or is there a different level of f-string support in post-3.6 versions? I am using f-strings in the ruamel.yaml 0.18 pre-release work, and assumed that I would "only" have to drop support for 3.5- – Ode
Sorry for not being clear. Yes, it is that the f-string usage is too new, not basic f-strings. f'{value=!r}', ~L703 in yaml_cmd. Normally I would be happy to contribute this kind of fix back, but the use of SourceForge makes that more effort than I would be willing to make (and admit). – Munich
@Munich I pushed a new version to PyPI. I had completely forgotten that the f'{somevar=}' syntax was added later on. Don't worry about – Ode
C
11

I did the expansion of anchors in yaml recently using

yq 'explode(.)' input.yaml > output.yaml

This is using the golang yq.
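
If the expansion has to run as part of a build, the command above could be wrapped in a small Python helper. This is a sketch under assumptions: it expects the Go yq to be the `yq` found on PATH (the Python package of the same name takes different arguments), and the function names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def explode_command(source):
    """Return the argument list for the yq invocation shown above."""
    return ["yq", "explode(.)", str(source)]


def explode_file(source, target):
    """Run yq on source and write its standard output to target."""
    if shutil.which("yq") is None:
        # Assumption: a missing executable should stop the build early.
        raise RuntimeError("yq executable not found on PATH")
    result = subprocess.run(
        explode_command(source),
        check=True,            # raise CalledProcessError on a non-zero exit
        capture_output=True,
        text=True,
    )
    Path(target).write_text(result.stdout)
```

Calling explode_file("input.yaml", "output.yaml") would then stand in for the shell redirection.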

Connotative answered 13/12, 2022 at 5:25 Comment(0)
S
2

UPDATE: 2019-03-13 12:41:05

  • This answer was modified pursuant to a comment by Anthon which correctly identified limitations with PyYAML. (See Pitfalls infra).

Context

  • YAML file
  • Python for parsing the YAML

Problem

  • User jtYamlEnthusiast wishes to output a non-DRY version of a YAML file with aliases, anchors, and merge keys.

Solution(s)

  • Alternative 1: use the ruamel library promoted by Anthon infra.
  • Alternative 2: use Python pprint.pformat and simply do a load/dump round-trip transformation.

Rationale

  • the ruamel library is great if you have the discretion to install another python library besides pyyaml, and you want a high degree of control over "round-trip" YAML transformations (such as the preservation of YAML comments, for example).
  • if you do not need rigorous control over round-tripped YAML, or you are limited for some other reason to pyyaml, you can simply load and dump YAML directly, in order to obtain the "non-DRY" output.

Pitfalls

  • as of this writing PyYAML has limitations relative to the ruamel library, regarding the handling of YAML v1.1 and YAML v1.2


Example

    ##
    import pprint
    import yaml
    ##
    myrawyaml = '''
    default: &DEFAULT
      URL: website.com
      mode: production
      site_name: Website
      some_setting: h2i8yiuhef
      some_other_setting: 3600

    development:
      <<: *DEFAULT
      URL: website.local
      mode: dev

    test:
      <<: *DEFAULT
      URL: test.website.qa
      mode: test
    '''
    ##
    pynative  =   yaml.safe_load(myrawyaml)
    vout      =   pprint.pformat(pynative)
    print(vout)                             ##=> this is non-DRY and just happens to be well-formed YAML syntax
    print(yaml.safe_load(vout))             ##=> this proves we have well-formed YAML if it loads without exception
Steadman answered 30/10, 2017 at 21:37 Comment(5)
This will not work: PyYAML fails to load YAML 1.2 files (it either complains that it can't on explicit documents, or fails silently). – Ode
Hmm ... I was not able to reproduce the failure scenario on my machine. There may be a problem with using pprint.pformat because of unicode literals. – Steadman
That is because your input has nothing that marks it as YAML 1.2. Try making it a complete directives document, or insert an octal value like 0o52 (which should load as integer 42). It is your first bullet item that claims that should work, which of course doesn't mean that every YAML document fails. – Ode
Ahh ... I get your point. This is a long-standing issue with PyYAML. I will modify the answer to address the point you are making, which is legitimate. – Steadman
Long-standing it is indeed: the YAML 1.2 spec was issued in 2009, now almost 10 years ago. – Ode
R
1

There's also a Python package called yq, which depends on jq. You have to install them both. After that, you can simply run

cat foo.yml | yq -y

for yaml output or

cat foo.yml | yq 

for JSON output. For example:

$ cat foo.yml | yq -y
default:
  URL: website.com
  mode: production
  site_name: Website
  some_setting: h2i8yiuhef
  some_other_setting: 3600
development:
  URL: website.local
  mode: dev
  site_name: Website
  some_setting: h2i8yiuhef
  some_other_setting: 3600
test:
  URL: test.website.qa
  mode: test
  site_name: Website
  some_setting: h2i8yiuhef
  some_other_setting: 3600

from the example input file (foo.yml):

default: &DEFAULT
  URL: website.com
  mode: production  
  site_name: Website
  some_setting: h2i8yiuhef
  some_other_setting: 3600

development:
  <<: *DEFAULT
  URL: website.local
  mode: dev

test:
  <<: *DEFAULT
  URL: test.website.qa
  mode: test

Installing

yq can be installed with

pip install yq

and the jq install instructions can be found at https://jqlang.github.io/jq/download/

Rhody answered 15/1 at 16:38 Comment(0)
M
0

If you for some reason have a use case where you need to write the expanded YAML back to a file as YAML, you can:

  • Use @Anthon's answer. As noted above, though, this approach might not be feasible if you can't install packages.

  • Use @dreftymac's answer. It appears that this answer has worked for some people, but it didn't work for me; by my understanding, pprint.pformat returns the argument as a string of its Python representation, and yaml.safe_load expects the Python representation itself. Of course, you could eval the string returned by pprint.pformat, but using eval on even trusted input feels icky. (Again, the answer has a couple of upvotes so maybe I'm missing something here.)
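
If eval is the sticking point, the standard library's ast.literal_eval may be a gentler option, because it only accepts Python literals and refuses arbitrary expressions. A minimal sketch with hypothetical sample data, assuming the loaded document contains only plain dicts, lists, strings and numbers:

```python
import ast
import pprint

# Hypothetical stand-in for the result of loading the example YAML.
data = {
    "default": {"URL": "website.com", "mode": "production"},
    "development": {"URL": "website.local", "mode": "dev"},
}

text = pprint.pformat(data)          # Python-literal representation as a string
restored = ast.literal_eval(text)    # parses literals only, never calls code

print(restored == data)
```

Values such as dates, which pprint renders as constructor calls, would make literal_eval raise an exception, so this only covers simple documents.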

Alternatively, you can do what I did:

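import json
import yaml

def expand_yml(yml):
    # yml is the already-loaded data (e.g. from yaml.safe_load), not raw YAML text.
    # The JSON round trip copies every shared object, so yaml.dump emits no anchors or aliases.
    return yaml.dump(json.loads(json.dumps(yml)))

# Assumption: the source document lives in input.yaml.
with open("input.yaml") as fp:
    my_yml_with_aliases = yaml.safe_load(fp)
print(expand_yml(my_yml_with_aliases))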

Since JSON can (with some exceptions, such as aliases) be regarded as a strict subset of YAML, this approach should generally work. However, if performance is a concern, or if you're dealing with hairier YAML, this approach might not work for you.
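
Why the round trip helps can be shown without any YAML library. As far as I know, PyYAML hands back the same Python object for an anchor and each of its plain aliases, and its dumper re-emits anchors when it meets the same container twice. The sketch below uses hypothetical data shaped that way and only demonstrates the copying step.

```python
import json

# Hypothetical loaded data: one dict object referenced from two places,
# the way an anchor and its alias would be after loading.
shared = {"site_name": "Website", "some_other_setting": 3600}
loaded = {"default": shared, "copy": shared}


def break_sharing(data):
    """Copy data through JSON so that no two nodes are the same object."""
    return json.loads(json.dumps(data))


flat = break_sharing(loaded)
print(loaded["default"] is loaded["copy"])  # True: one shared object
print(flat["default"] is flat["copy"])      # False: two independent copies
```

The same round trip also shows where "hairier" YAML runs into trouble: json.dumps turns non-string mapping keys into strings and raises TypeError for values such as dates.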

Messenia answered 24/11, 2020 at 19:29 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.