How to compare two csv files in windows
Asked Answered
T

6

21

I need to compare two csv files in windows7. How can I proceed to achieve this. I want to see the differences in both the files , like we can use tkdiff in Linux.

Transverse answered 4/11, 2015 at 13:38 Comment(4)
yup , I tried Google but did not find any satisfactory answers , That's why I asked here :PTransverse
You can also use notepad++ 's compare plugin.Lasko
Comparing 2 csvs is actually quite hard as you have to escape each field correctly and also you are likely to want to compare each line based on a key and not line by line which generic text comparisons do.Averir
zsv has a compare feature that is fast, open-source and cross-platform (disclaimer: I'm one of the authors): github.com/liquidaty/zsv/blob/main/docs/compare.mdBosquet
S
27

Suggestion:

  • Press Windows+R shortcut to open windows' Run prompt
  • Type in cmd and press Enter to open a DOS terminal cmd window
  • Change the current path by running the command cd C:\path\to\your\directory to reach the location of the two CSV files

Tip: To paste a copied path from clipboard into DOS terminal cmd window, you can either (1) right-click on terminal window, or (2) press Shift+Insert.

  • Finally, to compare the two files, run fc filename1.csv filename2.csv > outfile.txt (fc stands for "file compare").
    The command will also log the result of comparison into a text file outfile.txt located in the same folder. If outfile.txt doesn't exist, it will be created automatically.
Secondhand answered 4/11, 2015 at 13:50 Comment(1)
fc is a horrific file comparison tool. Try it yourself: Create two identical csv files and save them, and then add a new first line to one of them, and see what happens.Droll
K
8

Here is another option which I found very useful, as mentioned here:

findstr /v /g:"file1.csv" "file2.csv"

Where the /v switch returns the differences and /g: gets the search strings from file1.csv. You can use findstr /? for more help.

You can also print the differences to a file using:

findstr /v /g:"file1.csv" "file2.csv > diffs.csv"

As an aside, I found findstr far more accurate and the output more readable than fc.


UPDATE
This works nicely with 'smaller' files. You might get an out of memory error on larger files. In this case, I've had to turn to Python and dataframes. Just a friendly heads up ...

Knuckle answered 9/8, 2017 at 12:50 Comment(0)
M
3

I did this today.

Lets say we have 2 csv files X and Y

X having columns a, b, c
Y having column a, b, c

The rows are not in same order and are disperesed througout the csv files.

I imported both of them in my excel sheet. I sorted them first by column c and then by column b and then by column a. You can go in any order you like.

Compare the sorted files through notepad++'s compare plugin/Beyond Compare.

Mala answered 12/2, 2018 at 6:35 Comment(0)
K
1

I had this issue and for fun I tried writing some powershell code to generate a "per column" diff. It is embarrassingly slow, but it solved my problem nicer (very wide CSV file, so regular diff was really hard to interpret). Call Compare-Csv with the files you need

$ErrorActionPreference = "Stop"

function Compare-Csv
(
    [Parameter(Mandatory)] [string] $ReferenceFile,
    [Parameter(Mandatory)] [string] $DifferenceFile,
    [string[]] $ReferenceIdentifiers = $null,
    [char] $Delimiter = ';'
)
{
    $referenceData = Import-Csv -ErrorAction 'Stop' -Delimiter $Delimiter $ReferenceFile
    $differenceData = Import-Csv -ErrorAction 'Stop' -Delimiter $Delimiter $DifferenceFile
    $referenceDataHeaders = [string[]] ($referenceData | Select-Object -First 1 | Get-Member -MemberType Properties | Select-Object -ExpandProperty Name)
    $differenceDataHeaders = [string[]] ($differenceData | Select-Object -First 1 | Get-Member -MemberType Properties | Select-Object -ExpandProperty Name)
    $supersetHeaders = ($referenceDataHeaders + $differenceDataHeaders) | Select-Object -Unique

    $empty = @()
    $fileDifferences = @()
    $maxLength = ($referenceData.Length, $differenceData.Length | Measure-Object -Maximum).Maximum
    for($i = 0; $i -lt $maxLength; $i++)
    {
        $ref = $empty;
        if($i -lt $referenceData.Length)
        {
            $ref = $referenceData[$i]
        }

        $diff = $empty;
        if($i -lt $differenceData.Length)
        {
            $diff = $differenceData[$i]
        }

        $rowDifferences = $null
        foreach($header in $supersetHeaders)
        {
            $compare = Compare-Object -ReferenceObject $ref -DifferenceObject $diff -Property $header
            if($compare)
            {
                if(-not $rowDifferences)
                {
                    $rowDifferences = @{}
                    if($ReferenceIdentifiers)
                    {
                        $identifer = ($ref | Select-Object -Property $ReferenceIdentifiers).PSObject.Properties.Value
                        $rowDifferences.Add('ReferenceIdentifiers', $identifer)
                    }
                }

                $rowDifferences.Add($header, $compare)
            }
        }

        if($rowDifferences)
        {
            $fileDifferences + $rowDifferences
        }
    }

    return $fileDifferences
}

$differences = Compare-Csv -ReferenceFile 'Ref.csv' -DifferenceFile 'Diff.csv' -ReferenceIdentifiers @('ARRAY OF HEADER NAMES USED TAKEN FROM REFERENCE FILE THAT CAN BE USED TO IDENTIFY THE ROW')
foreach($difference in $differences)
{
    $out = $difference.ReferenceIdentifiers + ": " + ($difference | Select-Object -ExcludeProperty ReferenceIdentifiers | Format-List | Out-String -NoNewline)
    Write-Host ""
    Write-Host $out
}
Kaenel answered 7/10, 2021 at 20:16 Comment(0)
V
0

You can also do the same thing with Powershell, which can be useful if you're in a read-only folder.

compare-object -ReferenceObject (Get-Content filename1.CSV) -DifferenceObject (Get-content filename2.CSV)

You can launch Powershell directly in your folder without navigating via cd, by typing powershell in the folder's address bar.

Vista answered 6/5, 2021 at 2:40 Comment(0)
S
0

Notepad++ will give you a quick and dirty solution with the Compare plugin:

Comparing two CSV files with Notepad++

How to do it:

  • Plugins -> Plugins Admin... -> Find "Compare" and click "Install".
  • Open both CSV files.
  • Select the first file, then Plugins -> Compare -> Set as First Compare.
  • Select the second file, then Plugins -> Compare -> Compare.

Voila!

If you want to compare two Excel files you can export the worksheets as CSV and do the above.

I did this to compare two HUGE CSV files, both around 400,000 rows and it worked like a charm.

Stank answered 26/7, 2024 at 18:13 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.