Comparing two .txt files using difflib in Python
K

6

24

I am trying to compare two text files and output the first string in the comparison file that does not match, but I am having difficulty since I am very new to Python. Can anybody please give me a sample way to use this module?

When I try something like:

result = difflib.SequenceMatcher(None, testFile, comparisonFile)

I get an error saying: object of type 'file' has no len().

Kendra answered 10/6, 2009 at 18:51 Comment(0)
N
35

For starters, you need to pass strings to difflib.SequenceMatcher, not files:

# Like so
difflib.SequenceMatcher(None, str1, str2)

# Or just read the files in
difflib.SequenceMatcher(None, file1.read(), file2.read())

That'll fix your error.

To get the first non-matching string, see the difflib documentation.
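As a minimal sketch of that last step, assuming both files have first been read into lists of lines: SequenceMatcher.get_opcodes() describes how to turn the first sequence into the second, so the first 'replace' or 'insert' opcode points at the first comparison line that has no counterpart. The helper name first_mismatch and the sample lines are hypothetical.

```python
import difflib

def first_mismatch(test_lines, comparison_lines):
    """Return (index, line) for the first comparison line that differs, or None."""
    matcher = difflib.SequenceMatcher(None, test_lines, comparison_lines)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'replace' and 'insert' cover lines that exist in the comparison
        # sequence but have no equal counterpart in the test sequence.
        if tag in ("replace", "insert"):
            return j1, comparison_lines[j1]
    return None

test_lines = ["alpha\n", "beta\n", "gamma\n"]
comparison_lines = ["alpha\n", "BETA\n", "gamma\n"]
print(first_mismatch(test_lines, comparison_lines))  # (1, 'BETA\n')
```

With real files, the two lists would come from f.readlines() on each open file. Lines that exist only in the test file show up as 'delete' opcodes, which this sketch deliberately skips.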

Neve answered 10/6, 2009 at 19:6 Comment(2)
@OP: In addition to the docs, have a look at Doug Hellmann's excellent Python module-of-the-week difflib entry: blog.doughellmann.com/2007/10/pymotw-difflib.htmlBuchheim
@BlackVegetable link to the web archive project and Python Module of the week linkTribunate
U
10

Here is a quick example of comparing the contents of two files using Python difflib...

import difflib

file1 = "myFile1.txt"
file2 = "myFile2.txt"

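# Read both files inside a with block so the handles are closed afterwards
with open(file1) as f1, open(file2) as f2:
    diff = difflib.ndiff(f1.readlines(), f2.readlines())
print(''.join(diff), end='')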
Unquiet answered 14/2, 2014 at 22:18 Comment(2)
How could we avoid to display lines that are the same ? I just want lines that differ to be printed.Symmetrize
@OlivierCervello
import difflib, sys
with open("a") as a:
    a_content = a.readlines()
with open("b") as b:
    b_content = b.readlines()
diff = difflib.unified_diff(a_content, b_content)
print("***** Unified diff ************")
print("Line no"+'\t'+'file1'+'\t'+'file2')
for i, line in enumerate(diff):
    if line.startswith("-"):
        print(i, '\t\t'+line)
    elif line.startswith("+"):
        print(i, '\t\t\t\t\t\t'+line)
Kooima
J
5

Are you sure both files exist?

I just tested it and got a correct result.

To get the results I use something like:

import difflib

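# testFile and comparisonFile are file paths here
with open(testFile) as f1, open(comparisonFile) as f2:
    diff = difflib.ndiff(f1.readlines(), f2.readlines())

# ndiff returns a generator, so iterate over it directly
for line in diff:
    print(line, end='')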

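The first character of each output line indicates how that line differs: for example, '+' means the line was added in the second file, '-' means it is only in the first file, and a leading space means the line is common to both.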
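Those prefixes can be used to pull out only the lines that differ. The following is a rough sketch, assuming the inputs are lists of lines such as readlines() returns; the function names first_added_line and changed_only are made up for illustration.

```python
import difflib

def first_added_line(test_lines, comparison_lines):
    """Return the first line found only in comparison_lines, or None."""
    for line in difflib.ndiff(test_lines, comparison_lines):
        # ndiff prefixes: "- " only in the first input, "+ " only in the
        # second, "  " common to both, "? " intraline hint lines.
        if line.startswith("+ "):
            return line[2:]
    return None

def changed_only(test_lines, comparison_lines):
    """Return just the ndiff lines that differ, dropping common lines."""
    diff = difflib.ndiff(test_lines, comparison_lines)
    return [line for line in diff if line.startswith(("- ", "+ "))]

print(first_added_line(["one\n", "two\n"], ["one\n", "2\n"]))
```

If first_added_line returns None, the comparison file contains no line that is missing from the test file.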

Judithjuditha answered 10/6, 2009 at 19:3 Comment(1)
oops, you're right silly mistake. But I'm still not sure how to get the data I need out of result. How do I even know if they differ or not? How can I get the first string that differs? Sorry lots of questions :(Kendra
J
3

It sounds like you may not need difflib at all. If you're comparing line by line, try something like this:

with open("test.txt") as f:
    test_lines = f.readlines()
with open("correct.txt") as f:
    correct_lines = f.readlines()

for test, correct in zip(test_lines, correct_lines):
    if test != correct:
        print("Oh no! Expected %r; got %r." % (correct, test))
        break
else:
    # Only reached when no mismatch was found in the lines both files share
    len_diff = len(test_lines) - len(correct_lines)
    if len_diff > 0:
        print("Test file had too much data.")
    elif len_diff < 0:
        print("Test file had too little data.")
    else:
        print("Everything was correct!")
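zip stops at the shorter input, which is why the loop above needs a separate length check. One possible variation, sketched here under the assumption that reporting a missing line as a mismatch is acceptable, uses itertools.zip_longest instead. The name first_difference is hypothetical, and the arguments can be lists of lines or open file objects.

```python
from itertools import zip_longest

def first_difference(test_lines, correct_lines):
    """Return (line_number, expected, got) for the first mismatch, else None."""
    pairs = zip_longest(test_lines, correct_lines)
    for number, (test, correct) in enumerate(pairs, start=1):
        # zip_longest pads the shorter input with None, so a missing
        # line is reported as a mismatch instead of being dropped.
        if test != correct:
            return number, correct, test
    return None

print(first_difference(["a\n", "b\n"], ["a\n", "x\n"]))  # (2, 'x\n', 'b\n')
```

Files with the same number of lines but different content are caught by the `test != correct` check, so the length handling only matters when one file runs out first.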
Jemmy answered 10/6, 2009 at 19:39 Comment(2)
you don't need readlines there, zip can do with file handlers tooChappy
won't this break if the files have the same amount of lines but different content?Kendra
A
0

Another, simpler way to check line by line whether two text files are the same:

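fname1 = 'text1.txt'
fname2 = 'text2.txt'

# Read both files; the with block closes them even if an error occurs
with open(fname1) as f1, open(fname2) as f2:
    lines1 = f1.readlines()
    lines2 = f2.readlines()

# zip stops at the shorter file, so a short second file cannot raise IndexError
for line1, line2 in zip(lines1, lines2):
    if line1 != line2:
        print(line1)
        break
else:
    if len(lines1) == len(lines2):
        print("both are equal")
    else:
        print("files differ in length")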

Alternatively, Python's standard library includes the filecmp module, which you can use:

import filecmp

fname1 = 'text1.txt'
fname2 = 'text2.txt'

print(filecmp.cmp(fname1, fname2))
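One detail worth knowing: filecmp.cmp takes a shallow parameter that defaults to True, in which case files whose os.stat() signatures (file type, size, modification time) match are reported as equal without their contents being read. A small sketch that forces a content comparison follows; the wrapper name same_contents is only illustrative.

```python
import filecmp

def same_contents(path_a, path_b):
    """Compare the files' contents rather than only their os.stat() signatures."""
    # shallow=False makes filecmp read and compare the file contents
    # whenever the sizes match, instead of trusting the stat signature.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Note that this only says whether the files are identical; it does not tell you which line differs.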

:)

Aarau answered 21/5, 2019 at 10:35 Comment(0)
C
-1
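# -*- coding: utf-8 -*-
"""
Report the lines of each file that have no case-insensitive match
anywhere in the other file (surrounding whitespace is ignored).
"""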

def compare_lines_in_files(file1_path, file2_path):
    try:
        with open(file1_path, 'r', encoding='utf-8') as file1, open(file2_path, 'r', encoding='utf-8') as file2:
            lines_file1 = file1.readlines()
            lines_file2 = file2.readlines()

            mismatched_lines = []

            # Compare each line in file1 to all lines in file2
            for line_num, line1 in enumerate(lines_file1, start=1):
                line1 = line1.strip()  # Remove leading/trailing whitespace
                found_match = False

                for line_num2, line2 in enumerate(lines_file2, start=1):
                    line2 = line2.strip()  # Remove leading/trailing whitespace

                    # Perform a case-insensitive comparison
                    if line1.lower() == line2.lower():
                        found_match = True
                        break

                if not found_match:
                    mismatched_lines.append(f"Line {line_num} in File 1: '{line1}' has no match in File 2")

            # Compare each line in file2 to all lines in file1 (vice versa)
            for line_num2, line2 in enumerate(lines_file2, start=1):
                line2 = line2.strip()  # Remove leading/trailing whitespace
                found_match = False

                for line_num, line1 in enumerate(lines_file1, start=1):
                    line1 = line1.strip()  # Remove leading/trailing whitespace

                    # Perform a case-insensitive comparison
                    if line2.lower() == line1.lower():
                        found_match = True
                        break

                if not found_match:
                    mismatched_lines.append(f"Line {line_num2} in File 2: '{line2}' has no match in File 1")

            return mismatched_lines

    except FileNotFoundError:
        print("One or both files not found.")
        return []

# Paths to the two text files you want to compare
file1_path = r'C:\Python Space\T1.txt'
file2_path = r'C:\Python Space\T2.txt'

mismatched_lines = compare_lines_in_files(file1_path, file2_path)

if mismatched_lines:
    print("Differences between the files:")
    for line in mismatched_lines:
        print(line)
else:
    print("No differences found between the files.")
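The function above compares every line of one file against every line of the other, ignoring case and surrounding whitespace, and reports the lines that have no match anywhere in the other file. A more compact sketch of the same idea, assuming both files fit in memory, looks lines up in sets instead of nested loops. The name unmatched_lines is hypothetical, and it takes lists of lines rather than paths.

```python
def unmatched_lines(lines_a, lines_b):
    """Return ([(line_no, text), ...] only in a, [(line_no, text), ...] only in b)."""
    def normalize(line):
        # Same normalization as the answer: strip whitespace, ignore case
        return line.strip().lower()

    seen_a = {normalize(line) for line in lines_a}
    seen_b = {normalize(line) for line in lines_b}

    only_a = [(n, line.strip()) for n, line in enumerate(lines_a, start=1)
              if normalize(line) not in seen_b]
    only_b = [(n, line.strip()) for n, line in enumerate(lines_b, start=1)
              if normalize(line) not in seen_a]
    return only_a, only_b

print(unmatched_lines(["Foo\n", "bar\n"], [" foo \n", "baz\n"]))
```

Because position is ignored, this reports lines that are missing from the other file, not the first position at which the two files diverge.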
Counterglow answered 2/9, 2023 at 14:52 Comment(2)
Perhaps help us by writing how this code works and how it helps achieving the result.Catnip
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.Organdy
