This is an improvement on the "How to prevent yq removing comments and empty lines?" comment.
In my case diff -B and diff -wB were not enough: the result still does not keep blank lines, and diff -B keeps generating the entire file difference as a single chunk instead of many small chunks.
Here is an example of the input (test.yml):
# This file is automatically generated
#

content-index:

  timestamp: 1970-01-01T00:00:00Z

  entries:

    - dirs:

        - dir: dir-1/dir-2

          files:

            - file: file-1.dat
              md5-hash:
              timestamp: 1970-01-01T00:00:00Z

            - file: file-2.dat
              md5-hash:
              timestamp:

            - file: file-3.dat
              md5-hash:
              timestamp:

        - dir: dir-1/dir-2/dir-3

          files:

            - file: file-1.dat
              md5-hash:
              timestamp:

            - file: file-2.dat
              md5-hash:
              timestamp:
If you try to edit a field and generate the difference file:
diff -B test.yml <(yq -y ".\"content-index\".timestamp=\"2022-01-01T00:00:00Z\"" test.yml)
It still removes blank lines:
5,7c2
<
< timestamp: 1970-01-01T00:00:00Z
<
---
> timestamp: '2022-01-01T00:00:00Z'
It also adds null everywhere in place of an empty field and rewrites the remaining timestamp fields (which means you have to quote them as '...' to retain them as is):
17,19c8,9
< md5-hash:
< timestamp: 1970-01-01T00:00:00Z
<
---
> md5-hash: null
> timestamp: '1970-01-01T00:00:00+00:00'
The -wB flags change the difference file from a single chunk into multiple chunks, but blank lines are still removed.
Here is a mention of that diff issue: https://unix.stackexchange.com/questions/423186/diff-how-to-ignore-empty-lines/423188#423188
To fix that, you have to combine it with grep:
diff -wB <(grep -vE '^\s*$' test.yml) <(yq -y ".\"content-index\".timestamp=\"2022-01-01T00:00:00Z\"" test.yml)
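The grep filter is needed because diff -B only ignores a change when every line in it is blank; a hunk that mixes blank and non-blank lines is still reported in full. Below is a minimal sketch of the filter on its own. The helper name strip_blank_lines and the sample content are made up for illustration, and [[:space:]] is used as the POSIX spelling of the \s shorthand that GNU grep accepts.

```shell
# Print a file without empty or whitespace-only lines.
# [[:space:]] is the POSIX character class; GNU grep also accepts \s.
strip_blank_lines() {
  grep -vE '^[[:space:]]*$' "$1"
}

# Hypothetical sample with one empty and one whitespace-only line.
sample=$(mktemp)
printf 'a:\n\n  b: 1\n   \n  c: 2\n' > "$sample"
strip_blank_lines "$sample"
rm -f "$sample"
```

Only the three non-blank lines are printed, so the left side of the diff no longer contains blank lines that yq would have dropped anyway.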
Nevertheless, it still removes comments:
1,2d0
< # This file is automatically generated
< #
Here is a solution for that: https://unix.stackexchange.com/questions/17040/how-to-diff-files-ignoring-comments-lines-starting-with/17044#17044
So the complete one-liner is:
diff -wB <(grep -vE '^\s*(#|$)' test.yml) <(yq -y ".\"content-index\".timestamp=\"2022-01-01T00:00:00Z\"" test.yml) | patch -o - test.yml 2>/dev/null
Here 2>/dev/null is used to suppress patch warnings like:
Hunk #1 succeeded at 6 (offset 4 lines).
To avoid it in real code, you can use the -s flag instead:
... | patch -s -o ...
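For repeated use, the one-liner can be wrapped in a small function. The sketch below is only an illustration: the name yq_edit_keep_layout is hypothetical, temporary files replace the process substitutions so that it also runs in a plain POSIX shell, and it assumes the same tools as the command above (GNU grep for \s, a yq that accepts -y, and a patch that supports -s and -o -).

```shell
# Hypothetical helper: apply a yq expression to a YAML file and print the
# result with the original comments and blank lines patched back in.
# The body runs in a subshell, so variables and the trap stay local to it.
yq_edit_keep_layout() (
  file=$1
  expr=$2
  tmp=$(mktemp -d) || exit 1
  trap 'rm -rf "$tmp"' EXIT

  # Left side of the diff: the original without comment-only and blank lines.
  grep -vE '^\s*(#|$)' "$file" > "$tmp/stripped.yml"
  # Right side of the diff: the edited YAML as written by yq.
  yq -y "$expr" "$file" > "$tmp/edited.yml" || exit 1

  # Apply only the data changes to the untouched original file.
  diff -wB "$tmp/stripped.yml" "$tmp/edited.yml" | patch -s -o - "$file"
)

# Usage with test.yml from above:
#   yq_edit_keep_layout test.yml \
#     '."content-index".timestamp="2022-01-01T00:00:00Z"' > test-patched.yml
```

Because the diff is a normal diff without context lines, patch locates each hunk by searching for the removed lines, which is why it reports offsets. Hunks that only add lines have nothing to search for, so they may be placed at the wrong position.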
Update:
CAUTION:
This is the previous implementation. It has an issue with line additions to the YAML file and is left here only as an example. For a more reliable implementation, see the Update 2 section.
There is a better implementation, written as shell scripts for a GitHub Actions pipeline composite action.
GitHub Composite action: https://github.com/andry81-devops/gh-action--accum-content
Bash scripts (previous implementation):
Implementation: https://github.com/andry81-devops/gh-workflow/blob/2a60c95747ab741ca377f616c124545dd2a9331e/bash/github/init-yq-workflow.sh
Example of usage: https://github.com/andry81-devops/gh-workflow/blob/30a09eea05efbfb4567d9d56b482947d78fb40e5/bash/cache/accum-content.sh
The implementation can use either of 2 yq implementations.
Search for the yq_edit, yq_diff and yq_patch functions.
Update 2:
There is another discussion with some more reliable workarounds:
yq write strips completely blank lines from the output: https://github.com/mikefarah/yq/issues/515
Bash scripts (new implementation):
Implementation: https://github.com/andry81-devops/gh-workflow/tree/HEAD/bash/github/init-yq-workflow.sh
Example of usage: https://github.com/andry81-devops/gh-workflow/tree/HEAD/bash/cache/accum-content.sh
# Usage example:
#
>yq_edit "<prefix-name>" "<suffix-name>" "<input-yaml>" "$TEMP_DIR/<output-yaml-edited>" \
<list-of-yq-eval-strings> && \
yq_diff "$TEMP_DIR/<output-yaml-edited>" "<input-yaml>" "$TEMP_DIR/<output-diff-edited>" && \
yq_restore_edited_uniform_diff "$TEMP_DIR/<output-diff-edited>" "$TEMP_DIR/<output-diff-edited-restored>" && \
yq_patch "$TEMP_DIR/<output-yaml-edited>" "$TEMP_DIR/<output-diff-edited-restored>" "$TEMP_DIR/<output-yaml-edited-restored>" "<output-yaml>"
#
# , where:
#
# <prefix-name> - prefix name part for files in the temporary directory
# <suffix-name> - suffix name part for files in the temporary directory
#
# <input-yaml> - input yaml file path
# <output-yaml> - output yaml file path
#
# <output-yaml-edited> - output file name of edited yaml
# <output-diff-edited> - output file name of difference file generated from edited yaml
# <output-diff-edited-restored> - output file name of restored difference file generated from original difference file
# <output-yaml-edited-restored> - output file name of restored yaml file stored as intermediate temporary file
Example with test.yml from above:
export GH_WORKFLOW_ROOT='<path-to-gh-workflow-root>' # https://github.com/andry81-devops/gh-workflow
source "$GH_WORKFLOW_ROOT/bash/github/init-yq-workflow.sh"
[[ -d "./temp" ]] || mkdir "./temp"
export TEMP_DIR="./temp"
yq_edit 'content-index' 'edit' "test.yml" "$TEMP_DIR/test-edited.yml" \
".\"content-index\".timestamp=\"2022-01-01T00:00:00Z\"" && \
yq_diff "$TEMP_DIR/test-edited.yml" "test.yml" "$TEMP_DIR/test-edited.diff" && \
yq_restore_edited_uniform_diff "$TEMP_DIR/test-edited.diff" "$TEMP_DIR/test-edited-restored.diff" && \
yq_patch "$TEMP_DIR/test-edited.yml" "$TEMP_DIR/test-edited-restored.diff" "$TEMP_DIR/test.yml" "test-patched.yml" || exit $?
PROs:
- Can restore blank lines together with standalone comment lines:
# ...
- Can restore line-end comments:
key: value # ...
- Can detect line removal, change and addition all together.
CONs:
- Because it relies on guessing logic, it may leave artefacts or make invalid corrections.
- Does not restore line-end comments on lines where the YAML data has changed.
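Since artefacts are possible, it may be worth checking a patched file against the original before using it. The following is a rough sketch of one such check, not part of the scripts above; the helper name count_changed_lines is hypothetical. It counts the lines that a plain diff reports as removed or added.

```shell
# Hypothetical helper: count the lines that differ between two files.
count_changed_lines() {
  # In normal diff output removed lines start with "<" and added lines
  # with ">". diff exits with 1 when the files differ, so only its output
  # is used here, not its exit status.
  diff "$1" "$2" | grep -c '^[<>]'
}

# Usage with the files from the example above:
#   count_changed_lines test.yml test-patched.yml
```

For an edit that changes a single line, the expected count is 2 (one line removed, one added). A larger number suggests that something besides the intended field was modified and the result should be reviewed by hand.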
yq has not added this feature yet - github.com/mikefarah/yq/issues/19 – Waiversed
is super universal tool for that. – Flieger