How to prevent yq removing comments and empty lines?

In "Edit yaml objects in array with yq. Speed up Terminalizer's terminal cast (record)" I asked how to edit YAML with yq and received a great answer. However, by default yq removes comments and empty lines. How can I prevent this behavior?

input.yml

# Specify a command to be executed
# like `/bin/bash -l`, `ls`, or any other commands
# the default is bash for Linux
# or powershell.exe for Windows
command: fish -l

# Specify the current working directory path
# the default is the current working directory path
cwd: null

# Export additional ENV variables
env:
  recording: true

# Explicitly set the number of columns
# or use `auto` to take the current
# number of columns of your shell
cols: 110

execute

yq -y . input.yml

result

command: fish -l
cwd: null
env:
  recording: true
cols: 110
Flieger answered 23/8, 2019 at 13:20 Comment(5)
It seems the developers of yq have not added this feature yet - github.com/mikefarah/yq/issues/19 – Waiver
Kyb - Having looked at several yaml2json tools, I've come to the tentative conclusion that not only has the problem not been solved by any such tool, but also that the difficulty of a generic and invertible solution is probably beyond the realm of FOSS. In other words, you'll probably have more luck modifying the YAML directly. – Equivalent
@peak, I found the same. sed is a super universal tool for that. – Flieger
The problem is that most such tools aren't working directly with the text; they work with an abstract syntax tree produced by parsing the text, and comments typically are removed during parsing. In general, where would you "reinsert" any saved comments, if the output doesn't resemble the input? – Dex
Why can't the parser and AST also model the comments? Other languages/parsers are able to do so (and use comments for e.g. run-time annotations). – Toggery
S
11

In some limited cases you could use diff/patch along with yq.
For example if input.yml contains your input text, the commands

$ yq -y . input.yml > input.yml.1
$ yq -y .env.recording=false input.yml > input.yml.2
$ diff input.yml.1 input.yml.2 > input.yml.diff
$ patch -o input.yml.new input.yml < input.yml.diff

create a file input.yml.new with the comments preserved but recording changed to false:

# Specify a command to be executed
# like `/bin/bash -l`, `ls`, or any other commands
# the default is bash for Linux
# or powershell.exe for Windows
command: fish -l

# Specify the current working directory path
# the default is the current working directory path
cwd: null

# Export additional ENV variables
env:
  recording: false

# Explicitly set the number of columns
# or use `auto` to take the current
# number of columns of your shell
cols: 110
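
The four commands above can be wrapped in a small shell function. This is only a sketch: yq_patch_edit and the YQ variable are hypothetical names, and it assumes a yq that accepts `-y FILTER FILE` (as in the commands above), plus diff and a patch that supports `-o -` (GNU patch does).

```shell
# Sketch only: yq_patch_edit and YQ are hypothetical names, not part of yq.
# Prints FILE with FILTER applied, keeping comments and blank lines.
yq_patch_edit() {
  local file="$1" filter="$2" yq_cmd="${YQ:-yq}"
  local tmp status
  tmp=$(mktemp -d) || return 1

  # Normalize the file twice: once untouched, once with the edit applied.
  # Both outputs lose comments and blank lines in the same way.
  if "$yq_cmd" -y . "$file" > "$tmp/before" &&
     "$yq_cmd" -y "$filter" "$file" > "$tmp/after"; then
    # The diff therefore contains only the real edit.
    diff "$tmp/before" "$tmp/after" > "$tmp/delta"
    case $? in
      0) cat "$file" ;;                              # nothing changed
      1) patch -s -o - "$file" < "$tmp/delta" ;;     # apply edit to original
      *) false ;;                                    # diff itself failed
    esac
    status=$?
  else
    status=1
  fi

  rm -rf "$tmp"
  return "$status"
}

# Usage (writes the result to a new file):
#   yq_patch_edit input.yml '.env.recording=false' > input.yml.new
```

As with the manual steps, this relies on patch finding the changed line at an offset, so it stays limited to edits whose old text still appears verbatim in the original file.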
Sheff answered 14/9, 2019 at 18:27 Comment(4)
Thanks for this idea. I used the recent yq version 4.2.0, which is a big step forward (also for those that use jq), and that version has no problems preserving comments. However, I had problems preserving blank lines, which I was able to circumvent with diff's -B option. Using some bashisms, I was able to write that as a one-liner: diff -B "$src_file" <(yq eval "$eval_line" "$src_file") | patch -o - "$src_file" – Dentistry
This is a fantastic hint. The only improvement I would suggest is to use diff -au for more resilience (without these options, the patch got inserted into the wrong place in my YAML files). – Lingwood
Still a problem in yq 4.35.2. I expanded @tlwhitec's one-liner to actually do the same as the answer, which avoids more whitespace changes (and updates the existing file): diff -B <(yq "$src_file") <(yq "$eval_line" "$src_file") | patch "$src_file" – Cartomancy
Brilliant idea! I changed it a bit though, as it seemed more logical: store the "bugs" introduced by yq in a bugs.diff, make the actual changes, and then undo the bugs with patch -R < bugs.diff. I did not use diff -B since yq also tampers with the comments. – Ancheta
C
1

This is an improvement on a comment under "How to prevent yq removing comments and empty lines?".

In my case neither diff -B nor diff -wB was enough: blank lines are still not kept, and the difference can come out as a single chunk covering the entire file instead of many small chunks.

Here is an example of the input (test.yml):

# This file is automatically generated
#

content-index:

  timestamp: 1970-01-01T00:00:00Z

  entries:

    - dirs:

        - dir: dir-1/dir-2

          files:

            - file: file-1.dat
              md5-hash:
              timestamp: 1970-01-01T00:00:00Z

            - file: file-2.dat
              md5-hash:
              timestamp:

            - file: file-3.dat
              md5-hash:
              timestamp:

        - dir: dir-1/dir-2/dir-3

          files:

            - file: file-1.dat
              md5-hash:
              timestamp:

            - file: file-2.dat
              md5-hash:
              timestamp:

If you try to edit a field and generate the difference file:

diff -B test.yml <(yq -y ".\"content-index\".timestamp=\"2022-01-01T00:00:00Z\"" test.yml)

it still removes the blank lines:

5,7c2
<
<   timestamp: 1970-01-01T00:00:00Z
<
---
>   timestamp: '2022-01-01T00:00:00Z'

It also writes null for every empty field and rewrites the other timestamp fields (which means you have to quote them as '...' to retain them as is):

17,19c8,9
<               md5-hash:
<               timestamp: 1970-01-01T00:00:00Z
<
---
>               md5-hash: null
>               timestamp: '1970-01-01T00:00:00+00:00'

The -wB flags change the difference file from a single chunk into multiple chunks, but the blank lines are still removed.

Here is a mention of that diff issue: https://unix.stackexchange.com/questions/423186/diff-how-to-ignore-empty-lines/423188#423188

To fix that, you have to combine it with grep:

diff -wB <(grep -vE '^\s*$' test.yml) <(yq -y ".\"content-index\".timestamp=\"2022-01-01T00:00:00Z\"" test.yml)

Nevertheless, it still removes comments:

1,2d0
< # This file is automatically generated
< #

Here is a solution for that: https://unix.stackexchange.com/questions/17040/how-to-diff-files-ignoring-comments-lines-starting-with/17044#17044

So the complete one-liner is:

diff -wB <(grep -vE '^\s*(#|$)' test.yml) <(yq -y ".\"content-index\".timestamp=\"2022-01-01T00:00:00Z\"" test.yml) | patch -o - test.yml 2>/dev/null

Here 2>/dev/null suppresses patch warnings such as:

Hunk #1 succeeded at 6 (offset 4 lines).

To avoid them in real code, you can use the -s flag instead:

... | patch -s -o ...
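
The grep filter from the one-liner above can be tried on its own to see which lines it hides from diff. A minimal sketch: strip_yaml_noise is a hypothetical name, and `[[:space:]]` is used in place of `\s`, which is a GNU extension.

```shell
# Hypothetical helper: print a YAML file without blank lines and without
# lines that contain only a comment. End-of-line comments are kept.
strip_yaml_noise() {
  grep -vE '^[[:space:]]*(#|$)' "$1"
}

# Usage, as the left-hand side of the diff in the one-liner (bash only,
# because of the <(...) process substitution):
#   diff -wB <(strip_yaml_noise test.yml) <(yq -y "$filter" test.yml)
```

Note that the pattern also drops lines starting with # inside multi-line block scalars, so it is only safe for files that do not contain such text.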

Update:

CAUTION:

This is the previous implementation. It has an issue when a line is added to the yaml file, and it is left here only as an example. See the Update 2 section for a more reliable implementation.

There is a better implementation as a shell script, written for a GitHub Actions composite action.

GitHub Composite action: https://github.com/andry81-devops/gh-action--accum-content

Bash scripts (previous implementation):

Implementation: https://github.com/andry81-devops/gh-workflow/blob/2a60c95747ab741ca377f616c124545dd2a9331e/bash/github/init-yq-workflow.sh
Example of usage: https://github.com/andry81-devops/gh-workflow/blob/30a09eea05efbfb4567d9d56b482947d78fb40e5/bash/cache/accum-content.sh

The implementation can use either of 2 yq implementations:

Search for: yq_edit, yq_diff, yq_patch functions

Update 2:

There is another discussion with some more reliable workarounds:
yq write strips completely blank lines from the output : https://github.com/mikefarah/yq/issues/515

Bash scripts (new implementation):

Implementation: https://github.com/andry81-devops/gh-workflow/tree/HEAD/bash/github/init-yq-workflow.sh
Example of usage: https://github.com/andry81-devops/gh-workflow/tree/HEAD/bash/cache/accum-content.sh

# Usage example:
#
>yq_edit "<prefix-name>" "<suffix-name>" "<input-yaml>" "$TEMP_DIR/<output-yaml-edited>" \
  <list-of-yq-eval-strings> && \
  yq_diff "$TEMP_DIR/<output-yaml-edited>" "<input-yaml>" "$TEMP_DIR/<output-diff-edited>" && \
  yq_restore_edited_uniform_diff "$TEMP_DIR/<output-diff-edited>" "$TEMP_DIR/<output-diff-edited-restored>" && \
  yq_patch "$TEMP_DIR/<output-yaml-edited>" "$TEMP_DIR/<output-diff-edited-restored>" "$TEMP_DIR/<output-yaml-edited-restored>" "<output-yaml>"
#
# , where:
#
#   <prefix-name> - prefix name part for files in the temporary directory
#   <suffix-name> - suffix name part for files in the temporary directory
#
#   <input-yaml>  - input yaml file path
#   <output-yaml> - output yaml file path
#
#   <output-yaml-edited>          - output file name of edited yaml
#   <output-diff-edited>          - output file name of difference file generated from edited yaml
#   <output-diff-edited-restored> - output file name of restored difference file generated from original difference file
#   <output-yaml-edited-restored> - output file name of restored yaml file stored as intermediate temporary file

Example with test.yml from above:

export GH_WORKFLOW_ROOT='<path-to-gh-workflow-root>' # https://github.com/andry81-devops/gh-workflow

source "$GH_WORKFLOW_ROOT/bash/github/init-yq-workflow.sh"

[[ -d "./temp" ]] || mkdir "./temp"

export TEMP_DIR="./temp"

yq_edit 'content-index' 'edit' "test.yml" "$TEMP_DIR/test-edited.yml" \
  ".\"content-index\".timestamp=\"2022-01-01T00:00:00Z\"" && \
  yq_diff "$TEMP_DIR/test-edited.yml" "test.yml" "$TEMP_DIR/test-edited.diff" && \
  yq_restore_edited_uniform_diff "$TEMP_DIR/test-edited.diff" "$TEMP_DIR/test-edited-restored.diff" && \
  yq_patch "$TEMP_DIR/test-edited.yml" "$TEMP_DIR/test-edited-restored.diff" "$TEMP_DIR/test.yml" "test-patched.yml" || exit $?

PROs:

  • Can restore blank lines together with standalone comment lines: # ...
  • Can restore line end comments: key: value # ...
  • Can detect line removal, change, and addition together.

CONs:

  • Because it relies on guessing logic, it may leave artefacts or make invalid corrections.
  • Does not restore end-of-line comments on lines where the yaml data has changed.
Cortez answered 28/4, 2022 at 9:56 Comment(0)
V
1

In my use case I was running a yq image inside a Docker container whose base image has no diff command installed, and I only wanted an in-place update. So instead I first replaced the empty lines with a placeholder tag:

sed -i '/^$/s// #BLANK_LINE/' ./$filename

Then I executed the yq operation and replaced the placeholders back with empty lines:

sed -i "s/ *#BLANK_LINE//g" ./$filename

It worked like a charm.

Overall combined command:

sed -i '/^$/s// #BLANK_LINE/' ./$filename;yq {{operation}}; sed -i  "s/ *#BLANK_LINE//g" ./$filename 
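
The same three steps can be packaged as a function so the placeholders are always removed again, even when the edit fails. This is a sketch under assumptions: edit_keeping_blank_lines is a hypothetical name, temporary files are used instead of `sed -i` so it does not depend on GNU sed, and the editing command is expected to keep comment lines and to modify the file in place.

```shell
# Hypothetical helper (not part of yq): run an in-place editing command on a
# file while its blank lines are protected by a placeholder comment.
# Usage: edit_keeping_blank_lines FILE COMMAND [ARGS...]
# COMMAND is invoked as: COMMAND ARGS... FILE
edit_keeping_blank_lines() {
  local file="$1" tmp status
  shift
  tmp=$(mktemp) || return 1

  # 1. Turn every empty line into a placeholder comment.
  sed 's/^$/ #BLANK_LINE/' "$file" > "$tmp" && cat "$tmp" > "$file" ||
    { rm -f "$tmp"; return 1; }

  # 2. Run the caller's command and remember its exit status.
  "$@" "$file"
  status=$?

  # 3. Turn the placeholders back into empty lines, even if step 2 failed.
  sed 's/ *#BLANK_LINE//g' "$file" > "$tmp" && cat "$tmp" > "$file"

  rm -f "$tmp"
  return "$status"
}

# Example, assuming a yq with an in-place flag (-i):
#   edit_keeping_blank_lines config.yml yq -i '.env.recording = false'
```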
Visionary answered 19/10, 2023 at 17:54 Comment(2)
I think this would not keep comments, but comments can of course be put under a placeholder as well. @Visionary, please extend your answer to cover commented lines, then I'd upvote it. – Flieger
I think newer versions, starting from 4.2X, already take care of comments. yq version 4.34, which I used, does not remove comments; it only removes empty lines, and that can be handled with sed and the in-place flag -i. – Visionary
