pretty output with pyyaml

I have a Python project where I'd like to use YAML (PyYAML 3.11), particularly because it is "pretty" and easy for users to edit in a text editor if and when necessary. My problem is that when I load the YAML into a Python application and edit the contents (both of which I will need to do), the document I write back out is typically not as pretty as the one I started with.

The PyYAML documentation is pretty poor; it does not even document the parameters to the dump function. I found http://dpinte.wordpress.com/2008/10/31/pyaml-dump-option/. However, I'm still missing the information I need. (I started to look at the source, but it doesn't seem the most inviting. If I don't get the solution here, that will be my only recourse.)

I start with a document that looks like this:

- color green :
     inputs :
        - port thing :
            widget-hint : filename
            widget-help : Select a filename
        - port target_path : 
            widget-hint : path
            value : 'thing' 
     outputs:
        - port value:
             widget-hint : string
     text : |
            I'm lost and I'm found
            and I'm hungry like the wolf.

After loading it into Python (yaml.safe_load( s )), I try a couple of ways of dumping it out:

>>> print yaml.dump( d3, default_flow_style=False, default_style='' )
- color green:
    inputs:
    - port thing:
        widget-help: Select a filename
        widget-hint: filename
    - port target_path:
        value: thing
        widget-hint: path
    outputs:
    - port value:
        widget-hint: string
    text: 'I''m lost and I''m found

      and I''m hungry like the wolf.

      '
>>> print yaml.dump( d3, default_flow_style=False, default_style='|' )
- "color green":
    "inputs":
    - "port thing":
        "widget-help": |-
          Select a filename
        "widget-hint": |-
          filename
    - "port target_path":
        "value": |-
          thing
        "widget-hint": |-
          path
    "outputs":
    - "port value":
        "widget-hint": |-
          string
    "text": |
      I'm lost and I'm found
      and I'm hungry like the wolf.

Ideally, I would like "short strings" to be written without quotes, as in the first result, and multi-line strings to be written as blocks, as in the second result. Fundamentally, I'm trying to avoid an explosion of unnecessary quotes in the file, which I think would make it much more annoying to edit in a text editor.

Does anyone have any experience with this?

Dahlgren answered 25/6, 2014 at 20:59 Comment(0)

If you can use ruamel.yaml (disclaimer: I am the author of this enhanced version of PyYAML) you can round-trip the original format (YAML document stored in a file org.yaml):

import sys
import ruamel.yaml
from pathlib import Path

file_org = Path('org.yaml')
    
yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
data = yaml.load(file_org)
yaml.dump(data, sys.stdout)

which gives:

- color green:
    inputs:
    - port thing:
        widget-hint: filename
        widget-help: Select a filename
    - port target_path:
        widget-hint: path
        value: 'thing'
    outputs:
    - port value:
        widget-hint: string
    text: |
      I'm lost and I'm found
      and I'm hungry like the wolf.

Your input is inconsistently indented/formatted, and although there is far more control over the output in ruamel.yaml than in PyYAML, you cannot get your exact original back:

  • you sometimes (color green :) have a space before the value indicator (:) and sometimes you don't (outputs:). Apart from special control over root level keys, ruamel.yaml always puts the value indicator directly adjacent to the key.
  • your root level sequence is indented two columns with an offset of zero for the block sequence indicator (-); this is the default ruamel.yaml uses. Others are indented five with an offset of three. ruamel.yaml cannot format sequences individually/inconsistently. I recommend using the default, since your root collection is a sequence.
  • your mappings are sometimes indented three columns (the value for key color green) and sometimes two (e.g. the value for key port target_path). Again, ruamel.yaml cannot format these individually/inconsistently.
  • your block style literal scalar is indented more than the standard two spaces. Unless you append a block indentation indicator to the | indicator (e.g. using |4), this extra indentation will be lost.

As you can see, setting yaml.preserve_quotes keeps the superfluous quotes around 'thing'. Since that is not what you want, it is not set in the rest of the examples.

The following "normalises" all three examples:

import sys
import ruamel.yaml
from pathlib import Path
LT = ruamel.yaml.scalarstring.LiteralScalarString

file_org = Path('org.yaml')
file_plain = Path('plain.yaml')
file_block = Path('block.yaml')

def normalise(d):
    if isinstance(d, dict):
        for k, v in d.items():
            d[k] = normalise(v)
        return d
    if isinstance(d, list):
        for idx, elem in enumerate(d):
            d[idx] = normalise(elem)
        return d
    if not isinstance(d, str):
        return d
    if '\n' in d:
        if isinstance(d, LT):
            return d     # already a block style literal scalar
        return LT(d)
    return str(d)

yaml = ruamel.yaml.YAML()
for fn in [file_org, file_plain, file_block]:
    data = normalise(yaml.load(file_org))
    yaml.dump(data, fn)

assert file_org.read_bytes() == file_plain.read_bytes()
assert file_org.read_bytes() == file_block.read_bytes()
print(file_block.read_text())

which gives:

- color green:
    inputs:
    - port thing:
        widget-hint: filename
        widget-help: Select a filename
    - port target_path:
        widget-hint: path
        value: thing
    outputs:
    - port value:
        widget-hint: string
    text: |
      I'm lost and I'm found
      and I'm hungry like the wolf.

So, as you indicated you wanted, you get a block style literal scalar if a scalar contains a newline, and a plain scalar (no block style, no quotes) if it doesn't.

Orts answered 23/6, 2015 at 14:27 Comment(4)
Is it easy to specify in ruamel.yaml that multi-line strings should be written as blocks (with |) and short strings should not receive quotes, without already having a YAML file to invoke a round-trip on? - Inflated
@Inflated That depends on your definition of easy. You of course have to have some rules (e.g. any string containing a newline should be a literal block scalar; any without spaces should be unquoted) if you don't want to apply those by hand. Why don't you ask a question here on SO about that, if you are interested in getting that done? - Orts
I'm now using stackoverflow.com/a/33300001. I had hoped you might have included an option for this, since it seems to be what many people want. I'm not familiar enough with the various edge cases to appreciate why there is no clear-cut implementation of this. - Inflated
@Inflated My answer there could use some updating. - Orts
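
For readers who want to stay with plain PyYAML, the rule discussed in these comments (a string containing a newline becomes a literal block, anything else is left to the dumper) can be sketched with a custom string representer. This is a minimal sketch for Python 3, not necessarily the approach of the linked answer; BlockDumper and represent_str are illustrative names:

```python
import yaml

class BlockDumper(yaml.SafeDumper):
    """SafeDumper subclass, so the representer below is not registered globally."""

def represent_str(dumper, data):
    # Request literal block style (|) only for strings containing a newline;
    # style=None lets the emitter choose plain or quoted style as usual.
    style = '|' if '\n' in data else None
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style=style)

BlockDumper.add_representer(str, represent_str)

doc = [{'color green': {
    'text': "I'm lost and I'm found\nand I'm hungry like the wolf.\n",
    'value': 'thing',
}}]

out = yaml.dump(doc, Dumper=BlockDumper, default_flow_style=False)
print(out)
```

The style is only a request: if a string cannot be written as a block scalar (for example when a line ends in trailing whitespace), the emitter falls back to a quoted style.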

Try the pyaml pretty printer. It gets closer, though it does put quotes around short strings with spaces in them:

>>> print pyaml.dump(d3)
- 'color green':
    inputs:
      - 'port thing':
          widget-help: 'Select a filename'
          widget-hint: filename
      - 'port target_path':
          value: thing
          widget-hint: path
    outputs:
      - 'port value':
          widget-hint: string
    text: |
      I'm lost and I'm found
      and I'm hungry like the wolf.
Pikestaff answered 20/1, 2015 at 10:6 Comment(1)
In case anyone else reads this comment and futilely tries to find the pretty_print option for yaml.dump()... this comment refers to the less-standard pyaml package (imported with import pyaml), as opposed to the more standard PyYAML (imported with import yaml). - Selfpossession