How to compare yaml files regardless of ordering differences?
I need to compare YAML files that are generated by two different processes and are ordered differently, and detect whether they are logically the same, ideally in Python.

yaml file 1:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80

yaml file 2:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 3
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        ports:
        - containerPort: 80
        image: nginx:1.14.2

What's the best way to generate useful diffs of the YAML at the logical level rather than the literal text level? In the toy example above, the two YAML files should be considered equivalent.

Edrei answered 22/7, 2021 at 16:54 Comment(0)

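The best solution I was able to arrive at is the script below. It loads both files into Python dicts, which compare equal regardless of key order, and uses dictdiffer to list any differences.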

"""
python path_to_dir/compare_yaml.py path_to_dir/file1.yaml path_to_dir/file2.yaml
"""
import argparse
import yaml
import dictdiffer

parser = argparse.ArgumentParser(description='Convert two yaml files to dict and compare equality. Allows comparison of differently ordered keys.')
parser.add_argument('file_paths', type=str, nargs=2,
                    help='Full paths to yaml documents')
args = parser.parse_args()

print(f"File Path 1: {args.file_paths[0]}")
print(f"File Path 2: {args.file_paths[1]}")

with open(args.file_paths[0],'r') as rdr:
    data1=rdr.read()

with open(args.file_paths[1],'r') as rdr:
    data2=rdr.read()

data1_dict = yaml.load(data1,Loader=yaml.FullLoader)
data2_dict = yaml.load(data2,Loader=yaml.FullLoader)

if data1_dict == data2_dict:
    print("No difference detected")
else:
    print("Differences detected:")
    for diff in list(dictdiffer.diff(data1_dict, data2_dict)):
        print(diff)

If run against the example in the question as is:

python .../compare_yaml.py .../yaml1.yaml .../yaml2.yaml
File Path 1: .../yaml1.yaml
File Path 2: .../yaml2.yaml
No difference detected

If a value is changed in one of the files, the output looks like this:

python .../compare_yaml.py .../yaml1.yaml .../yaml2.yaml
File Path 1: .../yaml1.yaml
File Path 2: .../yaml2.yaml
Differences detected:
('change', ['spec', 'template', 'spec', 'containers', 0, 'name'], ('nginx', 'ngin'))
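If a text-style diff of the logical content is preferred over dictdiffer tuples, one possible approach is to serialize both parsed documents with sorted keys and diff the resulting lines. This is only a sketch under some assumptions: the helper names canonical_lines and logical_diff are made up for illustration, the small dicts stand in for data1_dict and data2_dict from the script above, and all mapping keys are assumed to be strings. Values that JSON cannot represent (for example dates) are converted with str.

```python
import difflib
import json


def canonical_lines(data):
    """Serialize parsed YAML data with sorted keys so key order cannot affect the text."""
    # default=str is an assumption: it stringifies values JSON cannot encode (e.g. dates).
    return json.dumps(data, indent=2, sort_keys=True, default=str).splitlines()


def logical_diff(data1, data2, name1="file1", name2="file2"):
    """Return unified-diff lines between the canonical forms of two parsed documents."""
    return list(
        difflib.unified_diff(
            canonical_lines(data1),
            canonical_lines(data2),
            fromfile=name1,
            tofile=name2,
            lineterm="",
        )
    )


# Hypothetical stand-ins for data1_dict and data2_dict.
doc_a = {"metadata": {"name": "nginx-deployment", "labels": {"app": "nginx"}}}
doc_b = {"metadata": {"labels": {"app": "nginx"}, "name": "nginx-deployment"}}
doc_c = {"metadata": {"labels": {"app": "nginx"}, "name": "ngin-deployment"}}

print(logical_diff(doc_a, doc_b))             # keys reordered only: prints []
print("\n".join(logical_diff(doc_a, doc_c)))  # one changed value shows up as -/+ lines
```

Like the dict comparison above, this ignores the order of mapping keys but still treats list order as significant.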
Edrei answered 22/7, 2021 at 16:54 Comment(0)

Go utility dyff

homeport/dyff, installable from the Go repository (but also via Homebrew or Snap Store):

/ˈdʏf/ - diff tool for YAML files, and sometimes JSON

Actively maintained, last release v1.8.0 on Jun 5, 2024.

dyff between file1.yaml file2.yaml

For the example files it reports no differences:

     _        __  __
   _| |_   _ / _|/ _|  between file1.yaml
 / _' | | | | |_| |_       and file2.yaml
| (_| | |_| |  _|  _|
 \__,_|\__, |_| |_|   returned no differences
        |___/

Python utility yamldiff

The CLI utility yamldiff, installable from PyPI:

A package to semantically diff yaml files on the console.

Used with your example files:

yamldiff file1.yaml file2.yaml

By default this prints a diff, because the keys inside the containers list item are ordered differently (with ANSI colors, deletions are highlighted in red and additions in green):

# Modified keys:
metadata:
spec:
    # Modified keys:
    template:
        # Modified keys:
        spec:
            # Modified keys:
            containers:
                - [{name: nginx, image: nginx:1.14.2, ports: ['{containerPort: 80}']}]
                + [{name: nginx, ports: ['{containerPort: 80}'], image: nginx:1.14.2}]

However, it can be run with the option --set-keys containers:name to ignore order when comparing the containers list, using name as the set key.

This prints the expected output:

# Modified keys:
metadata:
spec:
    # Modified keys:
    template:
        # Modified keys:
        spec:
            # Modified keys:
            containers:
                # Modified keys:
                # Matching:
                {name: nginx}

Note: It seems to be unmaintained since version 0.3, released on Jul 11, 2022.
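The set-key idea can also be approximated in plain Python on already-parsed data: replace each list under a chosen parent key with a dict keyed by an identifying field, so that item order no longer matters when comparing. The sketch below is not how yamldiff implements --set-keys; the function name key_lists and the shape of the set_keys mapping are assumptions made for illustration. It also assumes the identifying field is unique within each list, since duplicates would overwrite each other.

```python
def key_lists(data, set_keys):
    """Recursively replace lists under the given parent keys with dicts keyed by a field.

    set_keys maps a parent key to the field that identifies each list item,
    e.g. {"containers": "name"}.
    """
    if isinstance(data, dict):
        result = {}
        for key, value in data.items():
            field = set_keys.get(key)
            if field and isinstance(value, list) and all(
                isinstance(item, dict) and field in item for item in value
            ):
                # Key each item by its identifying field; list order is dropped.
                result[key] = {item[field]: key_lists(item, set_keys) for item in value}
            else:
                result[key] = key_lists(value, set_keys)
        return result
    if isinstance(data, list):
        # Lists without a configured set key keep their order.
        return [key_lists(item, set_keys) for item in data]
    return data


# Hypothetical data: same containers, listed in a different order.
spec_a = {"containers": [{"name": "nginx", "image": "nginx:1.14.2"},
                         {"name": "sidecar", "image": "busybox"}]}
spec_b = {"containers": [{"image": "busybox", "name": "sidecar"},
                         {"name": "nginx", "image": "nginx:1.14.2"}]}

print(spec_a == spec_b)  # False: plain comparison is sensitive to list order
print(key_lists(spec_a, {"containers": "name"}) == key_lists(spec_b, {"containers": "name"}))  # True
```

The normalized structures can then be passed to a plain equality check or to a diff library in place of the raw parsed data.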

Piccalilli answered 15/7 at 20:47 Comment(0)
