powershell binary file comparison
J

5

13

All, there is an application that generates export dumps. I need to write a script that compares the previous day's dump against the latest one, and if they differ I have to do some basic manipulation of the moving-and-deleting sort. I have tried to find a suitable way of doing this, and the method I tried was:

$var_com = diff (get-content D:\local\prodexport2 -encoding Byte) (get-content D:\local\prodexport2 -encoding Byte)

I tried the Compare-Object cmdlet as well. I notice very high memory usage, and eventually I get a System.OutOfMemoryException after a few minutes. Has one of you done something similar? Some thoughts, please. There was a thread that mentioned a hash comparison, which I have no idea how to go about. Thanks in advance, folks. Osp

Janice answered 14/11, 2013 at 23:47 Comment(4)
Do you need to know which bytes are different, or just that today's file is not the same as yesterday's?Plan
I just need to know if they are different. As you have said, I need to know whether the files are the same or not.Janice
Have a look at the answers here. It's marked C# but since it's .NET, it can be ported to PowerShell syntax. The easiest thing to do is compare file sizes first - if those are different, you already have your answer.Viol
If you use the -Raw parameter of Get-Content without any -Encoding, the comparison goes much faster and is easier.Babylonia
P
8

Another method is to compare the MD5 hashes of the files:

$Filepath1 = 'c:\testfiles\testfile.txt'
$Filepath2 = 'c:\testfiles\testfile1.txt'

$hashes =
    foreach ($Filepath in $Filepath1, $Filepath2)
    {
        $MD5 = [Security.Cryptography.HashAlgorithm]::Create( "MD5" )
        $stream = ([IO.StreamReader]"$Filepath").BaseStream
        -join ($MD5.ComputeHash($stream) |
            ForEach { "{0:x2}" -f $_ })
        $stream.Close()
    }

if ($hashes[0] -eq $hashes[1])
    {'Files Match'}
Plan answered 15/11, 2013 at 1:19 Comment(3)
Thanks for this. It took away the long time it used to take for the comparison.Janice
I tried using this code with relative paths (so in Powershell cd somewhere and then $FilePath1 = 'testfile.txt') but the StreamReader doesn't pick up Powershell's change of folder and thinks it is relative to my home folder instead. The fix is to use $Filepath1 = Get-Item 'testfile.txt' instead and then Powershell passes the correct absolute path to StreamReader.Rugging
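A minimal sketch of that fix (assuming testfile.txt exists in the current PowerShell location): resolving to an absolute path first, for example via Get-Item's FullName, removes the dependence on .NET's notion of the current directory:

$Filepath1 = (Get-Item 'testfile.txt').FullName   # absolute path, independent of .NET's current directory
$stream = ([IO.StreamReader]$Filepath1).BaseStream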
Powershell's Get-FileHash function is (now) available, and does the same thing more simply.Floatplane
I
27

With PowerShell 4 you can use native cmdlets to do this:

function CompareFiles {
    param(
    [string]$Filepath1,
    [string]$Filepath2
    )
    if ((Get-FileHash $Filepath1).Hash -eq (Get-FileHash $Filepath2).Hash) {
        Write-Host 'Files Match' -ForegroundColor Green
    } else {
        Write-Host 'Files do not match' -ForegroundColor Red
    }
}

PS C:\> CompareFiles .\20131104.csv .\20131104-copy.csv

Files Match

PS C:\> CompareFiles .\20131104.csv .\20131107.csv

Files do not match

You could easily modify the above function to return a $true or $false value if you want to use this programmatically on a large scale.


EDIT

After seeing this answer, I just wanted to supply a larger-scale version that simply returns true or false:

function CompareFiles 
{
    param
    (
        [parameter(
            Mandatory = $true,
            HelpMessage = "Specifies the 1st file to compare. Make sure it's an absolute path with the file name and its extension."
        )]
        [string]
        $file1,

        [parameter(
            Mandatory = $true,
            HelpMessage = "Specifies the 2nd file to compare. Make sure it's an absolute path with the file name and its extension."
        )]
        [string]
        $file2
    )

    ( Get-FileHash $file1 ).Hash -eq ( Get-FileHash $file2 ).Hash
}
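
For instance, a minimal sketch of how this boolean version might drive the scenario from the question (the archive folder and the Move-Item step are hypothetical, not part of this answer):

if (-not (CompareFiles 'D:\local\prodexport1' 'D:\local\prodexport2')) {
    # Hypothetical follow-up: the dumps differ, so archive the previous one and keep the new one.
    Move-Item 'D:\local\prodexport1' 'D:\local\archive' -Force
}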
Incendiarism answered 15/7, 2014 at 18:8 Comment(0)
G
13

You could use fc.exe. It comes with Windows. Here's how you would use it:

fc.exe /b d:\local\prodexport2 d:\local\prodexport1 > $null
if (!$?) {
    "The files are different"
}
Guido answered 15/11, 2013 at 0:28 Comment(3)
I might be inclined to not use the if (!$?) and replace it with if ($LastExitCode -eq 0). Check out https://mcmap.net/q/16008/-lastexitcode-0-but-false-in-powershell-redirecting-stderr-to-stdout-gives-nativecommanderror and all the answers.Lette
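A minimal sketch of that variant (fc.exe exits with 0 when the files are identical and a non-zero code otherwise):

fc.exe /b d:\local\prodexport2 d:\local\prodexport1 > $null
if ($LASTEXITCODE -ne 0) {
    "The files are different"
}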
This is extremely slow for different files, because it prints all differences (to null). It seems fc does not support suppressing its output. One can use 'fc /a /b ' which might try to output less, but it didn't make a big difference for me.Theophrastus
Just out of curiosity, does it help to assign to $null, e.g. $null = fc.exe ...?Guido
D
8

A while back I wrote an article on a buffered comparison routine to compare two files with PowerShell:

function FilesAreEqual {
    param(
        [System.IO.FileInfo] $first,
        [System.IO.FileInfo] $second,
        [uint32] $bufferSize = 524288)

    # Files of different length cannot be equal.
    if ($first.Length -ne $second.Length) { return $false }

    if ($bufferSize -eq 0) { $bufferSize = 524288 }

    $fs1 = $first.OpenRead()
    $fs2 = $second.OpenRead()

    $one = New-Object byte[] $bufferSize
    $two = New-Object byte[] $bufferSize
    $equal = $true

    do {
        $bytesRead = $fs1.Read($one, 0, $bufferSize)
        $fs2.Read($two, 0, $bufferSize) | Out-Null

        # Stop as soon as a buffer differs.
        if (-not [System.Linq.Enumerable]::SequenceEqual($one, $two)) {
            $equal = $false
        }

    } while ($equal -and $bytesRead -eq $bufferSize)

    $fs1.Close()
    $fs2.Close()

    return $equal
}

You can use it by:

FilesAreEqual c:\temp\test.html c:\temp\test.html

A hash (like MD5) needs to traverse the entire file to do the hash calculation. This script returns as soon as it sees a difference in the buffer. It compares the buffers using LINQ, which is faster than a native PowerShell loop.
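
One of the comments below notes that Stream.Read is allowed to return fewer bytes than requested even before the end of the stream; a minimal sketch of the guard it suggests (variable names are illustrative) would capture and compare both read counts inside the loop:

$bytesRead1 = $fs1.Read($one, 0, $bufferSize)
$bytesRead2 = $fs2.Read($two, 0, $bufferSize)
if ($bytesRead1 -ne $bytesRead2) {
    throw "Streams returned different byte counts ($bytesRead1 vs $bytesRead2)."
}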

Doable answered 2/4, 2014 at 2:55 Comment(9)
How would your routine compare with the @ericnils answer with respect to performance? When using it inside a function that could get called from a foreach that contains however many files of varying sizes, is yours more optimized than the 4.0 Get-FileHash?Lette
@CodeMaverick, it should be for exactly the reason he stated. it doesn't have to read both entire files unless they are the same. It's the ideal solutionShelf
I suggest setting $BYTES_TO_READ to some higher value than 8. On my system reading 8 Bytes per iteration was extremely slow. I don't know what the best value is, but increasing the buffer size to 32768 (32 KB) certainly made the file compare a lot snappier.Keening
I realized that changing $BYTES_TO_READ is not enough, because inside the loop the BitConverter calls only compare the first 8 Bytes (= one Int64) of the buffer. After some deliberation I settled for a second, inner loop that iterates over the byte arrays and individually compares every byte. This is reasonably fast, and it's especially much faster than the ultra-slow compare-object cmdlet.Keening
Unfortunately as herzbube notes, the current implementation gives completely wrong answers because only 8 bytes out of every 32768 are actually compared.Objectionable
Very interesting, is the version with the int64 problem solved?Idelia
Added a buffer, as reading by buffer is way faster. Updated the original blog article as well.Doable
Apologies for necroing, but I think there’s a (theoretical, if not practical) bug in the use of Read - the docs say “An implementation is free to return fewer bytes than requested even if the end of the stream has not been reached.” (see learn.microsoft.com/en-us/dotnet/api/…) so $fs1.Read(…) and $fs2.Read(…) could read different byte counts. It doesn’t seem to ever actually happen in practice, but it’s possible. An assert for, e.g. $bytesRead1 -eq $bytesRead2 inside the loop would at least protect against this…Confraternity
@KeesC.Bakker - Nine years later, I've used your code as the basis for a function that I tested against other methods of binary file comparison in PowerShell, and found your method to be the fastest: #76896489Guthrie
H
2
if ( (Get-FileHash c:\testfiles\testfile1.txt).Hash -eq (Get-FileHash c:\testfiles\testfile2.txt).Hash ) {
   Write-Output "Files match"
} else {
   Write-Output "Files do not match"
}
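
A small variation on the same idea: Get-FileHash defaults to SHA256, but it also accepts an -Algorithm parameter (e.g. MD5 or SHA1) if a different hash is preferred:

if ( (Get-FileHash c:\testfiles\testfile1.txt -Algorithm MD5).Hash -eq (Get-FileHash c:\testfiles\testfile2.txt -Algorithm MD5).Hash ) {
   Write-Output "Files match"
}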
Homogenous answered 31/1, 2020 at 8:32 Comment(1)
Hi and welcome to Stack Overflow, and thank you for answering. While this code might answer the question, can you consider adding some explanation of the problem you solved and how you solved it? This will help future readers understand your answer better and learn from it.Orelee
