Modifying YAML using ruamel.yaml adds extra new lines

I need to append an extra value to the list under an existing key in a YAML file. This is the code I'm using:

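import ruamel.yaml
from ruamel.yaml.util import load_yaml_guess_indent

with open(yaml_in_path, 'r') as f:
    # returns the data plus the guessed indent and block_seq_indent
    doc, ind, bsi = load_yaml_guess_indent(f, preserve_quotes=True)
doc['phase1'] += ['c']
with open(yaml_out_path, 'w') as f:
    ruamel.yaml.round_trip_dump(doc, f,
                                indent=2, block_seq_indent=bsi)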

This is the input and output.

Input

phase1:
  - a
  # a comment.
  - b

phase2:
  - d

Output

phase1:
  - a
  # a comment.
  - b

  - c
phase2:
  - d

How can I get rid of the blank line between b and c? (The problem does not occur when phase1 is the only key in the file, or when there is no blank line between phase1 and phase2.)

Mishandle asked 11/2, 2017 at 4:58

The problem here is that the empty line is treated as a kind of comment, and ruamel.yaml preserves comments by associating them with elements of a sequence or with keys of a mapping. That information is stored in a complex attribute named ca on the list-like object doc['phase1'], associated with the second element (b).

You could of course argue that it should instead be associated with the top-level mapping/dict, either with the key phase1 (as a trailing empty-line comment) or with phase2 (as an introductory empty-line comment). Any of these three placements is valid, and there is currently no control in the library over the strategy that decides where an empty line (or a comment) goes.

If you put in a "real" comment (one starting with #), it will be associated with phase1 as an end comment; for those, the strategy is different.

This obviously needs an overhaul, as the original goal of ruamel.yaml was to:

- load some configuration from YAML
- change some value
- save the configuration back to YAML

in which case these kinds of append/insert problems don't appear.

So there is no real solution until the library is extended with some control over where to attach (trailing) comments and/or empty lines.

Until such control is implemented, probably the best thing you can do is move the trailing comment from the old last element to the newly appended one yourself:

import sys
import ruamel.yaml

yaml_str = """\
phase1:
  - a
  # a comment.
  - b

phase2:
  - d
"""

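def append_move_comment(l, e):
    """Append e to l and move the comment/empty line that followed the
    previous last element so that it follows the new last element."""
    i = len(l) - 1
    l.append(e)
    entry = l.ca.items.get(i)  # comment slots of the old last element, if any
    if entry is None or entry[0] is None:
        return  # nothing trailed the old last element
    x = entry[0]  # the end comment (here: the empty line after b)
    entry[0] = None
    l.ca.items[i+1] = [x, None, None, None]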

data = ruamel.yaml.round_trip_load(yaml_str)
append_move_comment(data['phase1'], 'c')
ruamel.yaml.round_trip_dump(data, sys.stdout, indent=4, block_seq_indent=2)

I changed the indent value to 4, which is what your input has (and what you get anyway, because the value you specify is too small for the given block_seq_indent).

Apodal answered 11/2, 2017 at 8:34

Comments:
Thanks for this. Can you explain the reasoning behind assigning [x, None, None, None] instead of [x]? And where in the ruamel.yaml source code can I look to understand this data structure? – Mishandle
I am not sure that I properly check the length of that list in all cases; you might be able to get away with fewer elements, but I would not try it. This is deliberately undocumented, because it can change in future versions. The ca attribute is an instance of ruamel.yaml.comments.Comment and x is a ruamel.yaml.tokens.CommentToken. Hint: use print(type(l.ca), '\n', type(x)) if you have nobody to ask ;-). – Apodal
Hey @Anthon, I've just stumbled upon this very same issue. By any chance, have there been any updates since you posted this answer? Is it now possible to avoid this in a more elegant way? Thank you! – Tremml
@Tremml No, there isn't (sorry, I've been busy moving and renovating, and the end of that is approaching but not near, so don't hold your breath). – Apodal

© 2022 - 2024 — McMap. All rights reserved.