PowerShell Copy-Item but only copy changed files
T

5

32

I am trying to recurse through a directory and copy it from A to B. That can be done with the following:

Copy-Item C:\MyTest C:\MyTest2 -Recurse

However, I want to copy only new files (ones that exist in the source but not in the destination) and files that have changed, where "changed" is decided by a CRC check and not by a date/time stamp.

# param must be the first statement in the script.
# $file is the path of a single file to hash (not a directory).
param
(
$file
)

$algo = [System.Security.Cryptography.HashAlgorithm]::Create("MD5")
$stream = New-Object System.IO.FileStream($file, [System.IO.FileMode]::Open)

$md5StringBuilder = New-Object System.Text.StringBuilder
$algo.ComputeHash($stream) | `
% { [void] $md5StringBuilder.Append($_.ToString("x2")) }
$md5StringBuilder.ToString()

$stream.Dispose() 

This code gives me a checksum (an MD5 hash) for a specific file. I am just not sure how to put the two scripts together to give me what I need. I also don't know whether the check above is actually the correct way of doing this.

Does anyone have any insight?

Tamica answered 24/3, 2009 at 15:0 Comment(1)
My first question would be: have you looked at just using Robocopy? You are really reinventing a very well-designed wheel here. - Karakorum
S
40

Both of those are solid answers for PowerShell, but it would probably be far easier to just use Robocopy (the Microsoft-supplied robust file copy utility).

robocopy "C:\SourceDir" "C:\DestDir" /MIR

would accomplish the same thing.

Scoles answered 25/3, 2011 at 13:32 Comment(4)
robocopy doesn't compare content as far as I can tell. It relies on sizes and date stamps. - Bilingual
I'm surprised that nobody has commented on the fact that this does not do what the OP asked for. This will copy changed files, but it will also delete files in the destination that are not in the source. That is quite a dangerous side effect. /E should be enough. - Phonogram
I'm wrong: /E will skip existing files, but apparently won't look to see if they have changed. - Phonogram
robocopy /MIR does mirroring, which means that it will delete every file in the destination that doesn't exist in the source folder. This is different from what the original question asked for. - Lecher
G
10

Here are some guidelines on how you can make your script more maintainable.

Convert the original script into a filter.

filter HasChanged { 
    # In a filter, $_ is the current pipeline item (here, a source file)

    # if the MD5 hash of $_ is not found at the destination
    # then return $_
}

Then simply filter all files that are updated and copy them.

# Note that "Copy-Item" here does not preserve original directory structure
# Every updated file gets copied right under "C:\MyTest2"
ls C:\MyTest -Recurse | HasChanged | Copy-Item -Path {$_} C:\MyTest2

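Or you can create another function that generates the subdirectory:

# GenerateSubDirectory is a placeholder for a function you would write yourself.
# PowerShell calls it as (GenerateSubDirectory ...), not GenerateSubDirectory(...).
ls C:\MyTest -Recurse | HasChanged | % { Copy-Item $_.FullName (GenerateSubDirectory ...) }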
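
For what it's worth, here is one way the filter idea could be fleshed out. This is only a sketch: it assumes PowerShell 4.0 or later (for Get-FileHash), absolute root paths, and the default SHA256 hash in place of MD5. Copy-WithStructure and the -SourceRoot / -DestRoot parameters are hypothetical names made up for this illustration; they are not part of the answer above.

```powershell
# Sketch: passes a file down the pipeline only when the destination copy
# is missing or its content hash differs. Assumes absolute root paths.
filter HasChanged {
    param([string]$SourceRoot, [string]$DestRoot)
    if ($_.PSIsContainer) { return }   # skip directories
    $relative = $_.FullName.Substring($SourceRoot.Length).TrimStart('\', '/')
    $target   = Join-Path $DestRoot $relative
    if (-not (Test-Path -LiteralPath $target)) { return $_ }
    $srcHash = (Get-FileHash -LiteralPath $_.FullName).Hash
    $dstHash = (Get-FileHash -LiteralPath $target).Hash
    if ($srcHash -ne $dstHash) { $_ }
}

# Hypothetical helper: copies one file to the same relative location
# under $DestRoot, creating the subdirectory first when it is missing.
function Copy-WithStructure {
    param(
        [Parameter(ValueFromPipeline = $true)]$File,
        [string]$SourceRoot,
        [string]$DestRoot
    )
    process {
        $relative = $File.FullName.Substring($SourceRoot.Length).TrimStart('\', '/')
        $target   = Join-Path $DestRoot $relative
        New-Item -ItemType Directory -Path (Split-Path $target) -Force | Out-Null
        Copy-Item -LiteralPath $File.FullName -Destination $target -Force
    }
}
```

With the paths from the question, usage would look something like `ls C:\MyTest -Recurse | HasChanged -SourceRoot C:\MyTest -DestRoot C:\MyTest2 | Copy-WithStructure -SourceRoot C:\MyTest -DestRoot C:\MyTest2`. Hashing both sides reads every file in full, so this trades speed for certainty compared with a size/date check.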
Glycerol answered 24/3, 2009 at 16:18 Comment(0)
T
6

I found a solution, but I'm not sure it is the best from a performance perspective:

# Param must be the first statement in the script, so the defaults live here
Param($Source = "c:\scripts", $Destination = "c:\test")
###################################################
###################################################
function Get-FileMD5 {
    Param([string]$file)
    $mode = [System.IO.FileMode]("open")
    $access = [System.IO.FileAccess]("Read")
    $md5 = New-Object System.Security.Cryptography.MD5CryptoServiceProvider
    $fs = New-Object System.IO.FileStream($file,$mode,$access)
    $Hash = $md5.ComputeHash($fs)
    $fs.Close()
    [string]$Hash = $Hash
    Return $Hash
}
function Copy-LatestFile{
    Param($File1,$File2,[switch]$whatif)
    $File1Date = get-Item $File1 | foreach-Object{$_.LastWriteTimeUTC}
    $File2Date = get-Item $File2 | foreach-Object{$_.LastWriteTimeUTC}
    if($File1Date -gt $File2Date)
    {
        Write-Host "$File1 is Newer... Copying..."
        if($whatif){Copy-Item -path $File1 -dest $File2 -force -whatif}
        else{Copy-Item -path $File1 -dest $File2 -force}
    }
    else
    {
        #Don't want to copy this in my case..but good to know
        #Write-Host "$File2 is Newer... Copying..."
        #if($whatif){Copy-Item -path $File2 -dest $File1 -force -whatif}
        #else{Copy-Item -path $File2 -dest $File1 -force}
    }
    Write-Host
}

# Getting Files/Folders from Source and Destination
$SrcEntries = Get-ChildItem $Source -Recurse
$DesEntries = Get-ChildItem $Destination -Recurse

# Parsing the folders and Files from Collections
$Srcfolders = $SrcEntries | Where-Object{$_.PSIsContainer}
$SrcFiles = $SrcEntries | Where-Object{!$_.PSIsContainer}
$Desfolders = $DesEntries | Where-Object{$_.PSIsContainer}
$DesFiles = $DesEntries | Where-Object{!$_.PSIsContainer}

# Checking for Folders that are in Source, but not in Destination
foreach($folder in $Srcfolders)
{
    $SrcFolderPath = $source -replace "\\","\\" -replace "\:","\:"
    $DesFolder = $folder.Fullname -replace $SrcFolderPath,$Destination
    if(!(test-path $DesFolder))
    {
        Write-Host "Folder $DesFolder Missing. Creating it!"
        new-Item $DesFolder -type Directory | out-Null
    }
}

# Checking for Folders that are in Destination, but not in Source
foreach($folder in $Desfolders)
{
    $DesFilePath = $Destination -replace "\\","\\" -replace "\:","\:"
    $SrcFolder = $folder.Fullname -replace $DesFilePath,$Source
    if(!(test-path $SrcFolder))
    {
        Write-Host "Folder $SrcFolder Missing. Creating it!"
        new-Item $SrcFolder -type Directory | out-Null
    }
}

# Checking for Files that are in the Source, but not in Destination
foreach($entry in $SrcFiles)
{
    $SrcFullname = $entry.fullname
    $SrcName = $entry.Name
    $SrcFilePath = $Source -replace "\\","\\" -replace "\:","\:"
    $DesFile = $SrcFullname -replace $SrcFilePath,$Destination
    if(test-Path $Desfile)
    {
        $SrcMD5 = Get-FileMD5 $SrcFullname
        $DesMD5 = Get-FileMD5 $DesFile
        If(Compare-Object $srcMD5 $desMD5)
        {
            Write-Host "The Files MD5's are Different... Checking Write Dates"
            Write-Host $SrcMD5
            Write-Host $DesMD5
            Copy-LatestFile $SrcFullname $DesFile
        }
    }
    else
    {
        Write-Host "$Desfile Missing... Copying from $SrcFullname"
        copy-Item -path $SrcFullName -dest $DesFile -force
    }
}

# Checking for Files that are in the Destination, but not in Source
foreach($entry in $DesFiles)
{
    $DesFullname = $entry.fullname
    $DesName = $entry.Name
    $DesFilePath = $Destination -replace "\\","\\" -replace "\:","\:"
    $SrcFile = $DesFullname -replace $DesFilePath,$Source
    if(!(test-Path $SrcFile))
    {
        Write-Host "$SrcFile Missing... Copying from $DesFullname"
        copy-Item -path $DesFullname -dest $SrcFile -force
    }
}
Tamica answered 24/3, 2009 at 16:4 Comment(1)
In 2019, we now have Get-FileHash. - Gravitation
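
Picking up on that comment: on PowerShell 4.0 or later, the hand-written Get-FileMD5 helper could probably be reduced to a thin wrapper around Get-FileHash. The sketch below keeps the helper's name so the rest of the script would not need to change; Test-SameContent is a hypothetical convenience function added only for illustration.

```powershell
# Sketch: drop-in replacement for the hand-written Get-FileMD5 helper.
function Get-FileMD5 {
    param([string]$file)
    # Get-FileHash opens, reads, and closes the file itself and
    # returns the hash as an uppercase hex string in .Hash
    (Get-FileHash -LiteralPath $file -Algorithm MD5).Hash
}

# Hypothetical wrapper: $true when both files have identical content.
function Test-SameContent {
    param([string]$PathA, [string]$PathB)
    (Get-FileMD5 $PathA) -eq (Get-FileMD5 $PathB)
}
```

Because the hash comes back as a plain string, an ordinary `-eq` / `-ne` comparison is enough and Compare-Object is no longer needed for this step.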
S
1

It is a bit long-winded but it does the job. It could be extended to look at the archive attribute (---a---) of the file. Anyway, it might be a reasonable starting point for someone.

Function GetFileSHA ($file) {
    return [Array](Get-FileHash $file -Algorithm SHA256);
}
$SourceDir = "C:\temp\1";
$TargetDir = "C:\temp\2";
#-File skips directories, which Get-FileHash cannot hash
$SourceFiles = Get-ChildItem -Recurse -File $SourceDir;
$TargetFiles = Get-ChildItem -Recurse -File $TargetDir;

#Source Table
$dt = New-Object System.Data.DataTable;
$dt.TableName = "SrcFiles";
$dtcol1 = New-Object system.Data.DataColumn fileId,([System.Int32]); $dt.columns.add($dtcol1);
$dtcol1.AllowDBNull = $false;
$dtcol1.AutoIncrement = $true;
$dtcol1.AutoIncrementSeed = 0;
$dtcol1.Unique = $true;
$dt.PrimaryKey = $dtcol1;
$dtcol2 = New-Object system.Data.DataColumn fileName,([string]); $dt.columns.add($dtcol2);
$dtcol3 = New-Object system.Data.DataColumn filePath,([string]); $dt.columns.add($dtcol3);
$dtcol3.Unique = $true;
$dtcol4 = New-Object system.Data.DataColumn fileHash,([string]); $dt.columns.add($dtcol4);
$dtcol5 = New-Object system.Data.DataColumn fileDate,([System.DateTime]); $dt.columns.add($dtcol5);

#Target Table
$dt2 = New-Object System.Data.DataTable;
$dt2.TableName = "TrgFiles";
$dt2col1 = New-Object system.Data.DataColumn fileId,([System.Int32]); $dt2.columns.add($dt2col1);
$dt2col1.AllowDBNull = $false;
$dt2col1.AutoIncrement = $true;
$dt2col1.AutoIncrementSeed = 0;
$dt2col1.Unique = $true;
$dt2.PrimaryKey = $dt2col1;
$dt2col2 = New-Object system.Data.DataColumn fileName,([string]); $dt2.columns.add($dt2col2);
$dt2col3 = New-Object system.Data.DataColumn filePath,([string]); $dt2.columns.add($dt2col3);
$dt2col3.Unique = $true;
$dt2col4 = New-Object system.Data.DataColumn fileHash,([string]); $dt2.columns.add($dt2col4);
$dt2col5 = New-Object system.Data.DataColumn fileDate,([System.DateTime]); $dt2.columns.add($dt2col5);

#Store file hashes and other attributes into DataTable for comparison
ForEach ($src_file in $SourceFiles){
    $this_file = GetFileSHA $src_file.FullName;
    $row = $dt.NewRow();
    $row.fileName=($src_file).PSChildName;
    $row.filePath=($src_file).FullName;
    $row.fileHash=($this_file).Hash;
    $row.fileDate=($src_file).LastWriteTimeUtc;
    $dt.Rows.Add($row); 
}
ForEach ($trg_file in $TargetFiles){
    $this_file = GetFileSHA $trg_file.FullName;
    $row = $dt2.NewRow();
    $row.fileName=($trg_file).PSChildName;
    $row.filePath=($trg_file).FullName;
    $row.fileHash=($this_file).Hash;
    $row.fileDate=($trg_file).LastWriteTimeUtc;
    $dt2.Rows.Add($row);    
}

#Compare and copy if newer/changed
ForEach ($file in $dt){
    $search_dt2 = ($dt2 | Select-Object "fileName", "filePath", "fileDate", "fileHash" | Where-Object {$_.fileName -eq $file.fileName})
    if ($file.fileHash -eq $search_dt2.fileHash){
        $result=1;
        Write-Host "File Hashes are a match - checking LastWrite status just to be safe...";
    } else {
        $result=2;
        Write-Host "File Hashes are not a match - checking LastWrite status to see if Target is newer than source...";
    }
    if ($result -eq 1 -and ($file.fileDate -eq $search_dt2.fileDate)){
        $result=1;
        Write-Host "LastWrite status is a match. File will be skipped from copy.";
    } elseif ($result -eq 1 -and $search_dt2.fileDate -gt $file.fileDate) {
        $result=1;
        Write-Host "LastWrite status of the target is newer. File will be skipped from copy.";
    } elseif ($result -ne 1 -and $search_dt2.fileDate -gt $file.fileDate) {
        $result=1;
        Write-Host "LastWrite status of the target is newer than the source. File will be skipped from copy.";
    } else {
        $result=3;
        Write-Host "File in target is older than the source and is scheduled for copy...";
    }
    #Work out the target path from the source path; $search_dt2 is empty when the file is new
    $targetPath = Join-Path $TargetDir $file.filePath.Substring($SourceDir.Length);
    if (-not (Test-Path $targetPath)){
        $result=4;
        Write-Host "File does not exist in the target folder - file is scheduled for copy...";
    }
    
    #DO the action based on above logic
    if($result -ne 1){
        Copy-Item -Path $file.filePath -Destination $targetPath -Force -Verbose
    }
    Write-Host "Code:[$result]";
}
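
As a possible simplification of the script above, the two DataTables could be swapped for hashtables keyed by relative path, which also avoids matching files by name alone. This is a rough sketch and not the original author's method; Get-HashIndex and Get-ChangedPath are hypothetical names, and both directories are assumed to be given as absolute paths.

```powershell
# Builds a lookup of relative path -> SHA256 hash for every file under $Root.
function Get-HashIndex {
    param([string]$Root)
    $index = @{}
    Get-ChildItem -LiteralPath $Root -Recurse -File | ForEach-Object {
        $relative = $_.FullName.Substring($Root.Length).TrimStart('\', '/')
        $index[$relative] = (Get-FileHash -LiteralPath $_.FullName -Algorithm SHA256).Hash
    }
    return $index
}

# Returns the relative paths that are missing from the target
# or whose content hash differs from the source.
function Get-ChangedPath {
    param([string]$SourceDir, [string]$TargetDir)
    $src = Get-HashIndex $SourceDir
    $trg = Get-HashIndex $TargetDir
    $src.Keys | Where-Object { -not $trg.ContainsKey($_) -or $trg[$_] -ne $src[$_] }
}
```

Each path returned by `Get-ChangedPath C:\temp\1 C:\temp\2` could then be joined onto both roots and handed to Copy-Item. The LastWrite comparison from the original script is left out here on purpose, so the sketch copies on content difference only.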
Security answered 23/6, 2023 at 9:17 Comment(0)
F
0

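Try the below. It will find any files that don't exist in the destination, or whose checksum differs from the destination copy, and copy them. The script as posted does not filter by date; limiting it to files updated within the past 7 days would need an extra LastWriteTime check.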

$sourcePath = "C:\MyTest"
$destinationPath = "C:\MyTest2"

function Get-FileCRC32 {
    param (
        [string]$filePath
    )

    # .NET's built-in crypto providers do not include a CRC32 algorithm, so the
    # CryptoConfig lookup by that name fails. An MD5 hash from Get-FileHash
    # (PowerShell 4.0+) is swapped in here as the checksum. The function name is
    # kept so the pipeline below is unchanged. Get-FileHash reads the raw bytes,
    # so binary files are handled correctly.
    return (Get-FileHash -LiteralPath $filePath -Algorithm MD5).Hash
}

Get-ChildItem -Path $sourcePath -Recurse -File -Exclude *.log | 
    ForEach-Object {
        # Map the source file to the same relative path under the destination
        $target = Join-Path $destinationPath $_.FullName.Substring($sourcePath.Length)
        if (-not (Test-Path -LiteralPath $target) -or
            (Get-FileCRC32 $_.FullName) -ne (Get-FileCRC32 $target)) {
            # Create the destination folder if needed, then copy the file into it
            New-Item -ItemType Directory -Path (Split-Path $target) -Force | Out-Null
            Copy-Item -LiteralPath $_.FullName -Destination $target -Force -Verbose -Confirm:$false
        }
    }
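
If the 7-day window mentioned in this answer is wanted, one way to add it might be a small pipeline stage placed before the checksum comparison. This is a minimal sketch; Select-RecentFile and its -Days parameter are hypothetical names introduced here, not something from the answer.

```powershell
# Sketch: passes through only files whose LastWriteTime falls
# within the last $Days days (7 by default).
function Select-RecentFile {
    param(
        [Parameter(ValueFromPipeline = $true)]$File,
        [int]$Days = 7
    )
    begin   { $cutoff = (Get-Date).AddDays(-$Days) }
    process { if ($File.LastWriteTime -ge $cutoff) { $File } }
}
```

It would slot in directly after Get-ChildItem, for example `Get-ChildItem -Path $sourcePath -Recurse -File | Select-RecentFile -Days 7 | ...`. Keep in mind that a date filter brings back the reliance on timestamps that the question wanted to avoid, so it is best treated as an optional speed-up.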
Frager answered 26/1, 2024 at 23:32 Comment(0)