I need to compare two csv files in windows7. How can I proceed to achieve this. I want to see the differences in both the files , like we can use tkdiff in Linux.
Suggestion:
- Press Windows+R shortcut to open windows' Run prompt
- Type in
cmd
and press Enter to open aDOS terminalcmd window - Change the current path by running the command
cd C:\path\to\your\directory
to reach the location of the two CSV files
Tip: To paste a copied path from clipboard into DOS terminal cmd window, you can either (1) right-click on terminal window, or (2) press Shift+Insert.
- Finally, to compare the two files, run
fc filename1.csv filename2.csv > outfile.txt
(fc
stands for "file compare").
The command will also log the result of comparison into a text fileoutfile.txt
located in the same folder. Ifoutfile.txt
doesn't exist, it will be created automatically.
fc
is a horrific file comparison tool. Try it yourself: Create two identical csv files and save them, and then add a new first line to one of them, and see what happens. –
Droll Here is another option which I found very useful, as mentioned here:
findstr /v /g:"file1.csv" "file2.csv"
Where the /v
switch returns the differences and /g:
gets the search strings from file1.csv. You can use findstr /?
for more help.
You can also print the differences to a file using:
findstr /v /g:"file1.csv" "file2.csv > diffs.csv"
As an aside, I found findstr
far more accurate and the output more readable than fc
.
UPDATE
This works nicely with 'smaller' files. You might get an out of memory
error on larger files. In this case, I've had to turn to Python and dataframes. Just a friendly heads up ...
I did this today.
Lets say we have 2 csv files X and Y
X having columns a, b, c
Y having column a, b, c
The rows are not in same order and are disperesed througout the csv files.
I imported both of them in my excel sheet. I sorted them first by column c and then by column b and then by column a. You can go in any order you like.
Compare the sorted files through notepad++'s compare plugin/Beyond Compare.
I had this issue and for fun I tried writing some powershell code to generate a "per column" diff. It is embarrassingly slow, but it solved my problem nicer (very wide CSV file, so regular diff was really hard to interpret). Call Compare-Csv
with the files you need
$ErrorActionPreference = "Stop"
function Compare-Csv
(
[Parameter(Mandatory)] [string] $ReferenceFile,
[Parameter(Mandatory)] [string] $DifferenceFile,
[string[]] $ReferenceIdentifiers = $null,
[char] $Delimiter = ';'
)
{
$referenceData = Import-Csv -ErrorAction 'Stop' -Delimiter $Delimiter $ReferenceFile
$differenceData = Import-Csv -ErrorAction 'Stop' -Delimiter $Delimiter $DifferenceFile
$referenceDataHeaders = [string[]] ($referenceData | Select-Object -First 1 | Get-Member -MemberType Properties | Select-Object -ExpandProperty Name)
$differenceDataHeaders = [string[]] ($differenceData | Select-Object -First 1 | Get-Member -MemberType Properties | Select-Object -ExpandProperty Name)
$supersetHeaders = ($referenceDataHeaders + $differenceDataHeaders) | Select-Object -Unique
$empty = @()
$fileDifferences = @()
$maxLength = ($referenceData.Length, $differenceData.Length | Measure-Object -Maximum).Maximum
for($i = 0; $i -lt $maxLength; $i++)
{
$ref = $empty;
if($i -lt $referenceData.Length)
{
$ref = $referenceData[$i]
}
$diff = $empty;
if($i -lt $differenceData.Length)
{
$diff = $differenceData[$i]
}
$rowDifferences = $null
foreach($header in $supersetHeaders)
{
$compare = Compare-Object -ReferenceObject $ref -DifferenceObject $diff -Property $header
if($compare)
{
if(-not $rowDifferences)
{
$rowDifferences = @{}
if($ReferenceIdentifiers)
{
$identifer = ($ref | Select-Object -Property $ReferenceIdentifiers).PSObject.Properties.Value
$rowDifferences.Add('ReferenceIdentifiers', $identifer)
}
}
$rowDifferences.Add($header, $compare)
}
}
if($rowDifferences)
{
$fileDifferences + $rowDifferences
}
}
return $fileDifferences
}
$differences = Compare-Csv -ReferenceFile 'Ref.csv' -DifferenceFile 'Diff.csv' -ReferenceIdentifiers @('ARRAY OF HEADER NAMES USED TAKEN FROM REFERENCE FILE THAT CAN BE USED TO IDENTIFY THE ROW')
foreach($difference in $differences)
{
$out = $difference.ReferenceIdentifiers + ": " + ($difference | Select-Object -ExcludeProperty ReferenceIdentifiers | Format-List | Out-String -NoNewline)
Write-Host ""
Write-Host $out
}
You can also do the same thing with Powershell, which can be useful if you're in a read-only folder.
compare-object -ReferenceObject (Get-Content filename1.CSV) -DifferenceObject (Get-content filename2.CSV)
You can launch Powershell directly in your folder without navigating via cd, by typing powershell in the folder's address bar.
Notepad++ will give you a quick and dirty solution with the Compare plugin:
How to do it:
- Plugins -> Plugins Admin... -> Find "Compare" and click "Install".
- Open both CSV files.
- Select the first file, then Plugins -> Compare -> Set as First Compare.
- Select the second file, then Plugins -> Compare -> Compare.
Voila!
If you want to compare two Excel files you can export the worksheets as CSV and do the above.
I did this to compare two HUGE CSV files, both around 400,000 rows and it worked like a charm.
© 2022 - 2025 — McMap. All rights reserved.
zsv
has acompare
feature that is fast, open-source and cross-platform (disclaimer: I'm one of the authors): github.com/liquidaty/zsv/blob/main/docs/compare.md – Bosquet