How to compare two files as part of unittest, while getting useful output in case of mismatch?

As part of some Python tests using the unittest framework, I need to compare two relatively short text files, where one is a test output file and the other is a reference file.

The immediate approach is:

import filecmp
...
self.assertTrue(filecmp.cmp(tst_path, ref_path, shallow=False))

It works fine if the test passes, but in the event of failure, the output is not much help:

AssertionError: False is not true

Is there a better way of comparing two files as part of the unittest framework, so some useful output is generated in case of mismatch?

Subzero answered 28/2, 2017 at 14:56 Comment(2)
that will depend a LOT on what the files are expected to contain, I guess...Archipenko
@Jblasco: Good point; the files are text files, so I will update the question with that info.Subzero

To get a report of which line differs, along with a printout of that line, use assertListEqual on the file contents, e.g.

self.assertListEqual(
    list(open(tst_path)),
    list(open(ref_path)))
Airglow answered 10/6, 2019 at 20:9 Comment(3)
Under my understanding, this will leave the files open until the garbage-collector notices, which leaves the files locked for too long under Windows. Consider using context managers to limit the time the files are open.Mexico
@Mexico probably something like: with open(...) as tst, open(...) as ref: ... - open them with a with statement and call list() on them; no need for io.open and such. They should close once execution leaves the with block.Handling
Yes, it's not a lot more complicated to include the auto-closing, e.g. with io.open(tst_path) as tst_f, io.open(ref_path) as ref_f: self.assertListEqual(list(tst_f), list(ref_f))Airglow
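
Put together, the comments' suggestion looks like this (a sketch reusing the tst_path and ref_path names from the question):

# The with statement closes both files as soon as the comparison is done.
with open(tst_path) as tst_f, open(ref_path) as ref_f:
    self.assertListEqual(list(tst_f), list(ref_f))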

All you need to do is add your own message for the error condition (see the assertTrue documentation).

self.assertTrue(filecmp.cmp(...), 'Your error message')

Palladino answered 1/3, 2017 at 3:55 Comment(1)
A reminder for those who care: if the two files are different, it prints 'Your error message' ONLY.Cryotherapy
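
If you go this route, a message that names the files at least tells you where to look (a sketch reusing the tst_path and ref_path names from the question):

self.assertTrue(
    filecmp.cmp(tst_path, ref_path, shallow=False),
    # The custom message is all you see on failure, so make it point at the files.
    f"{tst_path} does not match reference file {ref_path}",
)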

Comparing the files in the form of arrays yields meaningful assert errors:

assert [row for row in open(actual_path)] == [row for row in open(expected_path)]

You could use that each time you need to compare files, or put it in a function. You could also read the files into text strings instead of arrays.

Paule answered 14/2, 2019 at 8:30 Comment(2)
in the event of multiple rows with mismatches, this will only report the first one. Not ideal.Indiscreet
@ClintEastwood You can always join them I guess. Depending on your use case, it might be enough to fail with only one reported line.Paule
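
Following the comments, a sketch that collects every mismatching line before failing (reusing the actual_path and expected_path names from the answer):

actual = list(open(actual_path))
expected = list(open(expected_path))
# Report every differing line instead of stopping at the first mismatch.
mismatches = [
    f"line {i}: {a!r} != {e!r}"
    for i, (a, e) in enumerate(zip(actual, expected), start=1)
    if a != e
]
assert len(actual) == len(expected) and not mismatches, (
    "\n".join(mismatches)
    or f"line counts differ: {len(actual)} vs {len(expected)}"
)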

You can use the built-in difflib module for this.

Use the unified_diff format, which is plain text and will be empty if the contents of the files match. The file contents need to be read into lists first, and the return value of unified_diff is a generator, so we wrap it in a list so we can inspect it. Here's a template you can use:

from difflib import unified_diff
with open("my/expected/file.txt", "r") as f:
  expected_lines = f.readlines()
with open("my/actual/file.txt", "r") as f:
  actual_lines = f.readlines()

diff = list(unified_diff(expected_lines, actual_lines))
assert diff == [], "Unexpected file contents:\n" + "".join(diff)
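
Since the question is about unittest, the same template drops straight into a TestCase. A sketch, keeping the hypothetical file paths from above:

import unittest
from difflib import unified_diff

class FileContentsTest(unittest.TestCase):
    def test_file_matches_reference(self):
        with open("my/expected/file.txt") as f:
            expected_lines = f.readlines()
        with open("my/actual/file.txt") as f:
            actual_lines = f.readlines()
        diff = "".join(unified_diff(expected_lines, actual_lines,
                                    fromfile="expected", tofile="actual"))
        # An empty diff means the files match; on failure the diff is shown in the message.
        self.assertEqual(diff, "", "Unexpected file contents:\n" + diff)

if __name__ == "__main__":
    unittest.main()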

My only complaint here is that I wish I had colors. If you wanted them really badly, you could implement your own diff formatting based on the output of get_grouped_opcodes from the same module.

Umbrage answered 5/8, 2023 at 17:18 Comment(0)

Isn't it better to compare the content of the two files? For example, if they are text files, compare the text of the two files; this will produce a more meaningful error message.
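
A minimal sketch of that idea, reusing the tst_path and ref_path names from the question; assertMultiLineEqual prints a readable diff when the strings differ:

with open(tst_path) as tst_f, open(ref_path) as ref_f:
    # Compare the full text; the failure output includes a line-by-line diff.
    self.assertMultiLineEqual(tst_f.read(), ref_f.read())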

Nebraska answered 28/2, 2017 at 15:4 Comment(1)
The intention is to compare the contents, so I added ', shallow=False' to 'filecmp.cmp' to make that clear.Subzero

In case anyone else is still looking, our company developed an open source Python package to satisfy this need. Hope it helps!

Clone the project

git clone https://github.com/amentumspace/file_unittest

Installation

The package can be installed into any active environment by running:

cd file_unittest/   
pip install .

file_unittest.TestCase

This class extends the unittest.TestCase class to allow the user to run unit tests by writing data to a file and verifying the output has not changed since the last time the unit test was run.

Once the user is satisfied that the output data is correct, these unit tests simply ensure that the data does not change with changes in the code base.

Usage

Rather than inherit from unittest.TestCase the user should derive a class from file_unittest.TestCase.

As in a normal unit test, individual test function names are prefixed with test_.

Any desired output data is written to file by calling self.output.

Example unit test file:

import unittest
import file_unittest

class MyTest(file_unittest.TestCase):
    def test_case(self):
        self.output("hello")

if __name__ == '__main__':
    unittest.main()
  • Unlike the normal unittest.TestCase, the user does not need to make any assertion calls, e.g. self.assertTrue, self.assertFalse, etc.

  • This class will take care of warning about any differences in output.

Default output location

The default location to output the test results will be:

    {derived_class_filepath}/test_results/
      {derived_class_filename}.{derived_class_name}.{test_name}.txt

Missing or different results

If the expected output file is missing, or differences are detected, the output data will be written to the same file, but with a .new suffix:

    {derived_class_filepath}/test_results/
      {derived_class_filename}.{derived_class_name}.{test_name}.new
  • On the first run, the user will need to inspect the .new file for expected results;

  • Otherwise, if differences are detected, the user can diff the .txt and .new files to investigate and resolve any differences;

  • In both cases, once the user is satisfied with the results, the .new files can be renamed with a .txt extension and the .txt files can be checked into the git repo.

Committing test result files to source control

Once the output .txt files have been satisfactorily generated as in the previous step, they should be checked into source control so that they can be used as the benchmark for future runs.

Nonobedience answered 15/2, 2024 at 21:54 Comment(0)
