Comparing folders and content with PowerShell

I have two different folders with xml files. One folder (folder2) contains updated and new xml files compared to the other (folder1). I need to know which files in folder2 are new/updated compared to folder1 and copy them to a third folder (folder3). What's the best way to accomplish this in PowerShell?

Kristlekristo answered 29/6, 2011 at 19:54 Comment(7)
You want to compare based on modified date or do you want to compare file contents? – Summers
I want to compare based on file contents. – Kristlekristo
And how do you want to handle files that don't exist in one or the other folder? – Summers
If files do not exist in folder1 but do exist in folder2, then those files are new and I want to copy them to folder3. If files exist in folder1 but do not exist in folder2, then those are obsolete files, so I do not want to copy them but do want to log them. – Kristlekristo
And how do you know a file is new based on contents? If the file is in both folder1 and folder2 but the contents are different, then take the one from folder2, otherwise ignore it if they're the same? – Alphitomancy
I will know that a file is new only based on the fact that it is in folder2 and not in folder1, as folder2 is an updated version of folder1. And yes, if the same file is in both folder1 and folder2, but the content of that file is different in folder2, then copy it, otherwise ignore it if the files are the same. – Kristlekristo
How do you want to handle files with duplicate contents, but different file names? – Downstairs

OK, I'm not going to code the whole thing for you (what's the fun in that?) but I'll get you started.

First, there are two ways to do the content comparison. The lazy/mostly right way, which is comparing the length of the files; and the accurate but more involved way, which is comparing a hash of the contents of each file.
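For reference, the accurate hash-based way can be sketched as below. This is not part of the original answer: the function names Get-HashedFile and Get-NewOrChangedFile are hypothetical, the sketch assumes PowerShell 4.0 or later (for Get-FileHash) and flat folders without sub-directories, and it compares on file name plus a SHA256 hash of the contents.

```powershell
# Hypothetical helpers, not part of the original answer.
# Assumes PowerShell 4.0+ (Get-FileHash) and flat folders (no recursion).

# Returns one object per file in $Path with its name and SHA256 content hash.
function Get-HashedFile {
    param([Parameter(Mandatory)][string]$Path)

    Get-ChildItem -LiteralPath $Path -File | ForEach-Object {
        [pscustomobject]@{
            Name     = $_.Name
            Hash     = (Get-FileHash -LiteralPath $_.FullName -Algorithm SHA256).Hash
            FullName = $_.FullName
        }
    }
}

# Returns the files in $NewPath that are new, or whose contents differ
# from the same-named file in $OldPath.
function Get-NewOrChangedFile {
    param(
        [Parameter(Mandatory)][string]$OldPath,
        [Parameter(Mandatory)][string]$NewPath
    )

    $old = @(Get-HashedFile -Path $OldPath)
    $new = @(Get-HashedFile -Path $NewPath)

    # "=>" marks entries found only on the $NewPath side: a new name,
    # or an existing name whose content hash changed.
    Compare-Object -ReferenceObject $old -DifferenceObject $new -Property Name, Hash -PassThru |
        Where-Object { $_.SideIndicator -eq '=>' }
}
```

With the paths from the question, usage might be `Get-NewOrChangedFile -OldPath C:\Folder1 -NewPath C:\Folder2 | ForEach-Object { Copy-Item -LiteralPath $_.FullName -Destination C:\Folder3 -Force }`. The objects are copied via their FullName property because they are custom objects, not file objects, so they cannot be piped straight into Copy-Item.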

For simplicity's sake, let's do the easy way and compare file sizes.

Basically, you want two objects that represent the source and target folders:

$Folder1 = Get-ChildItem "C:\Folder1"
$Folder2 = Get-ChildItem "C:\Folder2"

Then you can use Compare-Object to see which items are different...

Compare-Object $Folder1 $Folder2 -Property Name, Length

which will list for you everything that is different by comparing only name and length of the file objects in each collection.

You can pipe that to a Where-Object filter to pick the items that differ on the left side (the reference object, $Folder1)...

Compare-Object $Folder1 $Folder2 -Property Name, Length | Where-Object {$_.SideIndicator -eq "<="}

And then pipe that to a ForEach-Object to copy where you want:

Compare-Object $Folder1 $Folder2 -Property Name, Length | Where-Object {$_.SideIndicator -eq "<="} | ForEach-Object {
    Copy-Item "C:\Folder1\$($_.Name)" -Destination "C:\Folder3" -Force
}
Summers answered 29/6, 2011 at 20:34 Comment(6)
Thanks, JNK! I had to reverse the side indicator to the other side (=>) but it seems to work for me: Compare-Object $Folder1 $Folder2 -Property Name, Length | Where-Object {$_.SideIndicator -eq "=>"} | ForEach-Object {Copy-Item "$Folder2Path\$($_.name)" -Destination $Folder3 -Force} – Kristlekristo
@keith - you could also reverse the order of the folders if you wanted. If you want to do separate logic for the other files (the ones that exist in folder2 but not folder1) then you can just use -Property Name and reverse the side indicator, then log them or whatever you want. – Summers
The script above seems to work fine with a flat folder structure, but doesn't seem to work when dealing with folders with sub-directories with files. Any ideas? – Kristlekristo
You can make a special case using Get-ChildItem "C:\Folder2" | Where-Object {$_.Length -eq $null} to get the folders only. You can then pipe that elsewhere. If you need to do a lot of this it may make sense to make it a function that you feed folders, then you can make it recursive to deal with subfolders automagically. – Summers
@keith - if you need further expansion/help, don't hesitate to open a new question. You can link back here for a starting point. – Summers
@Kristlekristo you may be able to add -Recurse to the Get-ChildItem calls. – Aphasic

Recursive Directory Diff Using MD5 Hashing (Compares Content)

Here is a pure PowerShell v4+ recursive file diff (no dependencies; Get-FileHash requires PowerShell 4.0) that calculates an MD5 hash of each file's contents in both directories (left/right). It can optionally export CSVs along with a summary text file; by default it outputs results to stdout. You can either drop the rdiff.ps1 file into your path or copy its contents into your script.

USAGE: rdiff path/to/left,path/to/right [-s path/to/summary/dir]

Here is the gist. I recommend using the version from the gist, as it may gain additional features over time. Feel free to send pull requests.

#########################################################################
### USAGE: rdiff path/to/left,path/to/right [-s path/to/summary/dir]  ###
### ADD LOCATION OF THIS SCRIPT TO PATH                               ###
#########################################################################
[CmdletBinding()]
param (
  [parameter(HelpMessage="Stores the execution working directory.")]
  [string]$ExecutionDirectory=$PWD,

  [parameter(Position=0,HelpMessage="Compare two directories recursively for differences.")]
  [alias("c")]
  [string[]]$Compare,

  [parameter(HelpMessage="Export a summary to path.")]
  [alias("s")]
  [string]$ExportSummary
)

### FUNCTION DEFINITIONS ###

# SETS WORKING DIRECTORY FOR .NET #
function SetWorkDir($PathName, $TestPath) {
  $AbsPath = NormalizePath $PathName $TestPath
  Set-Location $AbsPath
  [System.IO.Directory]::SetCurrentDirectory($AbsPath)
}

# RESTORES THE EXECUTION WORKING DIRECTORY AND EXITS #
function SafeExit() {
  SetWorkDir /path/to/execution/directory $ExecutionDirectory
  Exit
}

function Print {
  [CmdletBinding()]
  param (
    [parameter(Mandatory=$TRUE,Position=0,HelpMessage="Message to print.")]
    [string]$Message,

    [parameter(HelpMessage="Specifies a success.")]
    [alias("s")]
    [switch]$SuccessFlag,

    [parameter(HelpMessage="Specifies a warning.")]
    [alias("w")]
    [switch]$WarningFlag,

    [parameter(HelpMessage="Specifies an error.")]
    [alias("e")]
    [switch]$ErrorFlag,

    [parameter(HelpMessage="Specifies a fatal error.")]
    [alias("f")]
    [switch]$FatalFlag,

    [parameter(HelpMessage="Specifies an info message.")]
    [alias("i")]
    [switch]$InfoFlag = !$SuccessFlag -and !$WarningFlag -and !$ErrorFlag -and !$FatalFlag,

    [parameter(HelpMessage="Specifies blank lines to print before.")]
    [alias("b")]
    [int]$LinesBefore=0,

    [parameter(HelpMessage="Specifies blank lines to print after.")]
    [alias("a")]
    [int]$LinesAfter=0,

    [parameter(HelpMessage="Specifies if program should exit.")]
    [alias("x")]
    [switch]$ExitAfter
  )
  PROCESS {
    if($LinesBefore -gt 0) {
      foreach($i in 1..$LinesBefore) { Write-Host "" }
    }
    if($InfoFlag) { Write-Host "$Message" }
    if($SuccessFlag) { Write-Host "$Message" -ForegroundColor "Green" }
    # "Orange" is not a valid ConsoleColor, so use Yellow for warnings
    if($WarningFlag) { Write-Host "$Message" -ForegroundColor "Yellow" }
    if($ErrorFlag) { Write-Host "$Message" -ForegroundColor "Red" }
    if($FatalFlag) { Write-Host "$Message" -ForegroundColor "Red" -BackgroundColor "Black" }
    if($LinesAfter -gt 0) {
      foreach($i in 1..$LinesAfter) { Write-Host "" }
    }
    if($ExitAfter) { SafeExit }
  }
}

# VALIDATES STRING MIGHT BE A PATH #
function ValidatePath($PathName, $TestPath) {
  If([string]::IsNullOrWhiteSpace($TestPath)) {
    Print -x -f "$PathName is not a path"
  }
}

# NORMALIZES RELATIVE OR ABSOLUTE PATH TO ABSOLUTE PATH #
function NormalizePath($PathName, $TestPath) {
  ValidatePath "$PathName" "$TestPath"
  $TestPath = [System.IO.Path]::Combine((pwd).Path, $TestPath)
  $NormalizedPath = [System.IO.Path]::GetFullPath($TestPath)
  return $NormalizedPath
}


# VALIDATES STRING MIGHT BE A PATH AND RETURNS ABSOLUTE PATH #
function ResolvePath($PathName, $TestPath) {
  ValidatePath "$PathName" "$TestPath"
  $ResolvedPath = NormalizePath $PathName $TestPath
  return $ResolvedPath
}

# VALIDATES STRING RESOLVES TO A PATH AND RETURNS ABSOLUTE PATH #
function RequirePath($PathName, $TestPath, $PathType) {
  ValidatePath $PathName $TestPath
  If(!(Test-Path $TestPath -PathType $PathType)) {
    Print -x -f "$PathName ($TestPath) does not exist as a $PathType"
  }
  $ResolvedPath = Resolve-Path $TestPath
  return $ResolvedPath
}

# Like mkdir -p -> creates a directory recursively if it doesn't exist #
function MakeDirP {
  [CmdletBinding()]
  param (
    [parameter(Mandatory=$TRUE,Position=0,HelpMessage="Path to create.")]
    [string]$Path
  )
  PROCESS {
    New-Item -path $Path -itemtype Directory -force | Out-Null
  }
}

# GETS ALL FILES IN A PATH RECURSIVELY #
function GetFiles {
  [CmdletBinding()]
  param (
    [parameter(Mandatory=$TRUE,Position=0,HelpMessage="Path to get files for.")]
    [string]$Path
  )
  PROCESS {
    ls $Path -r | where { !$_.PSIsContainer }
  }
}

# GETS ALL FILES WITH CALCULATED HASH PROPERTY RELATIVE TO A ROOT DIRECTORY RECURSIVELY #
# RETURNS LIST OF @{RelativePath, Hash, FullName}
function GetFilesWithHash {
  [CmdletBinding()]
  param (
    [parameter(Mandatory=$TRUE,Position=0,HelpMessage="Path to get files for.")]
    [string]$Path,

    [parameter(HelpMessage="The hash algorithm to use.")]
    [string]$Algorithm="MD5"
  )
  PROCESS {
    $OriginalPath = $PWD
    SetWorkDir path/to/diff $Path
    GetFiles $Path | select @{N="RelativePath";E={$_.FullName | Resolve-Path -Relative}},
                            @{N="Hash";E={(Get-FileHash $_.FullName -Algorithm $Algorithm | select Hash).Hash}},
                            FullName
    SetWorkDir path/to/original $OriginalPath
  }
}

# COMPARE TWO DIRECTORIES RECURSIVELY #
# RETURNS Compare-Object OUTPUT WITH @{RelativePath, Hash, SideIndicator}
function DiffDirectories {
  [CmdletBinding()]
  param (
    [parameter(Mandatory=$TRUE,Position=0,HelpMessage="Directory to compare left.")]
    [alias("l")]
    [string]$LeftPath,

    [parameter(Mandatory=$TRUE,Position=1,HelpMessage="Directory to compare right.")]
    [alias("r")]
    [string]$RightPath
  )
  PROCESS {
    $LeftHash = GetFilesWithHash $LeftPath
    $RightHash = GetFilesWithHash $RightPath
    diff -ReferenceObject $LeftHash -DifferenceObject $RightHash -Property RelativePath,Hash
  }
}

### END FUNCTION DEFINITIONS ###

### PROGRAM LOGIC ###

if($Compare.length -ne 2) {
  Print -x "Compare requires passing exactly 2 path parameters separated by comma, you passed $($Compare.length)." -f
}
Print "Comparing $($Compare[0]) to $($Compare[1])..." -a 1
$LeftPath   = RequirePath path/to/left $Compare[0] container
$RightPath  = RequirePath path/to/right $Compare[1] container
$Diff       = DiffDirectories $LeftPath $RightPath
$LeftDiff   = $Diff | where {$_.SideIndicator -eq "<="} | select RelativePath,Hash
$RightDiff   = $Diff | where {$_.SideIndicator -eq "=>"} | select RelativePath,Hash
if($ExportSummary) {
  $ExportSummary = ResolvePath path/to/summary/dir $ExportSummary
  MakeDirP $ExportSummary
  $SummaryPath = Join-Path $ExportSummary summary.txt
  $LeftCsvPath = Join-Path $ExportSummary left.csv
  $RightCsvPath = Join-Path $ExportSummary right.csv

  $LeftMeasure = $LeftDiff | measure
  $RightMeasure = $RightDiff | measure

  "== DIFF SUMMARY ==" > $SummaryPath
  "" >> $SummaryPath
  "-- DIRECTORIES --" >> $SummaryPath
  "`tLEFT -> $LeftPath" >> $SummaryPath
  "`tRIGHT -> $RightPath" >> $SummaryPath
  "" >> $SummaryPath
  "-- DIFF COUNT --" >> $SummaryPath
  "`tLEFT -> $($LeftMeasure.Count)" >> $SummaryPath
  "`tRIGHT -> $($RightMeasure.Count)" >> $SummaryPath
  "" >> $SummaryPath
  $Diff | Format-Table >> $SummaryPath

  $LeftDiff | Export-Csv $LeftCsvPath -f
  $RightDiff | Export-Csv $RightCsvPath -f
}
$Diff
SafeExit
Artemus answered 24/9, 2015 at 20:0 Comment(0)

Further to @JNK's answer, you might want to ensure that you are always working with files rather than the less-intuitive output from Compare-Object. You just need to use the -PassThru switch...

$Folder1Path = "C:\Folder1"
$Folder2Path = "C:\Folder2"
$Folder3Path = "C:\Folder3\"

$Folder1 = Get-ChildItem $Folder1Path
$Folder2 = Get-ChildItem $Folder2Path

# Get all differences, i.e. from both "sides"
$AllDiffs = Compare-Object $Folder1 $Folder2 -Property Name,Length -PassThru

# Filter for new/updated files from Folder2 (compare against its path, not the file list)
$Changes = $AllDiffs | Where-Object {$_.Directory.FullName -eq $Folder2Path}

# Copy to Folder3
$Changes | Copy-Item -Destination $Folder3Path

This at least means you don't have to worry about which way the SideIndicator arrow points!

Also, bear in mind that you might want to compare on LastWriteTime as well.

Sub-folders

Looping through the sub-folders recursively is a little more complicated, because you will probably need to strip the respective root folder paths off the FullName field before comparing the lists.

You could do this by adding a new ScriptProperty to your Folder1 and Folder2 lists:

$Folder1 | Add-Member -MemberType ScriptProperty -Name "RelativePath" `
  -Value {$this.FullName -replace [Regex]::Escape("C:\Folder1"),""}

$Folder2 | Add-Member -MemberType ScriptProperty -Name "RelativePath" `
  -Value {$this.FullName -replace [Regex]::Escape("C:\Folder2"),""}

You should then be able to use RelativePath as a property when comparing the two objects and also use that to join on to "C:\Folder3" when copying to keep the folder structure in place.
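Putting those pieces together, a recursive version might look roughly like the sketch below. It is an illustration under stated assumptions, not the answer's own code: Get-RelativeFile and Copy-NewOrChangedFile are hypothetical names, it needs PowerShell 3.0+ for the -File switch, it still compares on Length (as above) rather than content, and it builds RelativePath as a plain calculated property instead of a ScriptProperty.

```powershell
# Hypothetical helpers, not part of the original answer (PowerShell 3.0+).

# Lists every file under $Root with its path relative to $Root.
function Get-RelativeFile {
    param([Parameter(Mandatory)][string]$Root)

    # Resolve to a full file-system path and drop any trailing separator.
    $full = (Resolve-Path -LiteralPath $Root).ProviderPath.TrimEnd([char[]]'\/')
    Get-ChildItem -LiteralPath $full -Recurse -File | ForEach-Object {
        [pscustomobject]@{
            RelativePath = $_.FullName.Substring($full.Length + 1)
            Length       = $_.Length
            FullName     = $_.FullName
        }
    }
}

# Copies files under $NewRoot that are missing from $OldRoot, or whose size
# differs, to the same relative location under $DestRoot.
# Outputs the relative path of each copied file.
function Copy-NewOrChangedFile {
    param(
        [Parameter(Mandatory)][string]$OldRoot,
        [Parameter(Mandatory)][string]$NewRoot,
        [Parameter(Mandatory)][string]$DestRoot
    )

    $old = @(Get-RelativeFile -Root $OldRoot)
    $new = @(Get-RelativeFile -Root $NewRoot)

    Compare-Object $old $new -Property RelativePath, Length -PassThru |
        Where-Object { $_.SideIndicator -eq '=>' } |
        ForEach-Object {
            $target = Join-Path $DestRoot $_.RelativePath
            # Create the parent folder first so the sub-folder structure is kept.
            New-Item -ItemType Directory -Path (Split-Path -Path $target -Parent) -Force | Out-Null
            Copy-Item -LiteralPath $_.FullName -Destination $target -Force
            $_.RelativePath
        }
}
```

For example, `Copy-NewOrChangedFile -OldRoot C:\Folder1 -NewRoot C:\Folder2 -DestRoot C:\Folder3` would copy C:\Folder2\sub\a.xml to C:\Folder3\sub\a.xml when it is new or has changed size. Adding Hash or LastWriteTime to the -Property list would tighten the comparison.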

Suez answered 29/3, 2016 at 17:53 Comment(1)
$Folder2 = "C:\Folder3\" this could be $Folder3 = "C:\Folder3\" – Jaquenette

Here's an approach which will find files which are missing or differ in content.

First, a quick-and-dirty one-liner (see caveat below).

dir -r | rvpa -Relative |%{ if (Test-Path $right\$_) { if (Test-Path -Type Leaf $_) { if ( diff (cat $_) (cat $right\$_ ) ) { $_ } } } else { $_ } }

Run the above in one of the directories, with $right set to (or replaced with) the path to the other directory. Things missing from $right, or which differ in content, will be reported. No output means no differences found. CAVEAT: Things existing in $right but missing from the left will not be found/reported.

This doesn't bother calculating hashes; it just compares the file contents directly. Hashing makes sense when you want to reference something in another context (later date, on another machine, etc.), but when we're comparing things directly, it adds nothing but overhead. (It's also theoretically possible for two files to have the same hash, although that's basically impossible to happen by accident. Deliberate attack, on the other hand...)
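One caveat when comparing directly with Compare-Object on Get-Content output: Compare-Object treats its two inputs as collections, so lines that were merely reordered may not be reported, and Get-Content returns $null for an empty file, which Compare-Object will refuse to bind. A byte-for-byte alternative is sketched below; Test-SameContent is a hypothetical name, and since it reads both files fully into memory it suits files like the XML in the question rather than very large ones.

```powershell
# Hypothetical helper, not part of the original answer.
# Returns $true when the two files contain exactly the same bytes.
function Test-SameContent {
    param(
        [Parameter(Mandatory)][string]$PathA,
        [Parameter(Mandatory)][string]$PathB
    )

    # Cheap check first: files of different length cannot be identical.
    $a = Get-Item -LiteralPath $PathA
    $b = Get-Item -LiteralPath $PathB
    if ($a.Length -ne $b.Length) { return $false }

    # Same length: compare byte by byte. Full paths are passed to .NET
    # because its working directory may differ from PowerShell's location.
    $bytesA = [System.IO.File]::ReadAllBytes($a.FullName)
    $bytesB = [System.IO.File]::ReadAllBytes($b.FullName)
    for ($i = 0; $i -lt $bytesA.Length; $i++) {
        if ($bytesA[$i] -ne $bytesB[$i]) { return $false }
    }
    return $true
}
```

In the script below, the Compare-Object line could be swapped for something like `if (-not (Test-SameContent -PathA $left\$rel -PathB $right\$rel))`, which also handles empty files and reordered lines.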

Here's a more proper script, which handles more corner cases and errors.

[CmdletBinding()]
Param(
    [Parameter(Mandatory=$true,Position=0)][string]$Left,
    [Parameter(Mandatory=$True,Position=1)][string]$Right
    )

# throw errors on undefined variables
Set-StrictMode -Version 1

# stop immediately on error
$ErrorActionPreference = [System.Management.Automation.ActionPreference]::Stop

# init counters
$Items = $MissingRight = $MissingLeft = $Contentdiff = 0

# make sure the given parameters are valid paths
$left  = Resolve-Path $left
$right = Resolve-Path $right

# make sure the given parameters are directories
if (-Not (Test-Path -Type Container $left))  { throw "not a container: $left"  }
if (-Not (Test-Path -Type Container $right)) { throw "not a container: $right" }

# Starting from $left as relative root, walk the tree and compare to $right.
Push-Location $left

try {
    Get-ChildItem -Recurse | Resolve-Path -Relative | ForEach-Object {
        $rel = $_
        
        $Items++
        
        # make sure counterpart exists on the other side
        if (-not (Test-Path $right\$rel)) {
            Write-Output "missing from right: $rel"
            $MissingRight++
            return
            }
    
        # compare contents for files (directories just have to exist)
        if (Test-Path -Type Leaf $rel) {
            if ( Compare-Object (Get-Content $left\$rel) (Get-Content $right\$rel) ) {
                Write-Output "content differs   : $rel"
                $ContentDiff++
                }
            }
        }
    }
finally {
    Pop-Location
    }

# Check items in $right for counterparts in $left.
# Something missing from $left of course won't be found when walking $left.
# Don't need to check content again here.

Push-Location $right

try {
    Get-ChildItem -Recurse | Resolve-Path -Relative | ForEach-Object {
        $rel = $_
        
        if (-not (Test-Path $left\$rel)) {
            Write-Output "missing from left : $rel"
            $MissingLeft++
            return
            }
        }
    }
finally {
    Pop-Location
    }

Write-Verbose "$Items items, $ContentDiff differed, $MissingLeft missing from left, $MissingRight from right"
Dives answered 23/12, 2020 at 22:26 Comment(0)

Do this:

compare (Get-ChildItem D:\MyFolder\NewFolder) (Get-ChildItem \\RemoteServer\MyFolder\NewFolder)

And even recursively:

compare (Get-ChildItem -r D:\MyFolder\NewFolder) (Get-ChildItem -r \\RemoteServer\MyFolder\NewFolder)

And it's hard to forget :)

Porthole answered 26/9, 2018 at 23:38 Comment(1)
I don't think Compare-Object automatically compares all the properties without specifying them. – Eagre

Handy version using script parameter

Simple file-level comparison

Call it like PS > .\DirDiff.ps1 -a .\Old\ -b .\New\

Param(
  [string]$a,
  [string]$b
)

$fsa = Get-ChildItem -Recurse -path $a
$fsb = Get-ChildItem -Recurse -path $b
Compare-Object -ReferenceObject $fsa -DifferenceObject $fsb

Possible output:

InputObject                  SideIndicator
-----------                  -------------
appsettings.Development.json <=
appsettings.Testing.json     <=
Server.pdb                   =>
ServerClientLibrary.pdb      =>
Birdcage answered 18/1, 2018 at 16:45 Comment(1)
This doesn't answer the question; it doesn't compare file contents. – Entomologize

gci -path 'C:\Folder' -recurse |where{$_.PSIsContainer}

-Recurse explores all subtrees below the given root path, and the .PSIsContainer property is the one to test to get folders only. Use where{!$_.PSIsContainer} to get just files.

Comptom answered 2/5, 2013 at 17:15 Comment(1)
Looks like this may be an answer to a question posed in the comments above, but it doesn't give enough context. – Suez

© 2022 - 2024 — McMap. All rights reserved.