The most efficient way to delete millions of files based on modified date, in Windows

Goal: Use a script to run through 5 million to 10 million XML files, evaluate each file's modified date, and delete the file if it is older than 90 days. The script would be run daily.

Problem: Using PowerShell's Get-ChildItem -Recurse causes the script to lock up and fail to delete any files. I assume this is because of the way Get-ChildItem needs to build the whole array before taking any action on any file.

Solution?: After lots of research I found that [System.IO.Directory]::EnumerateFiles can take action on items in the collection before the collection is completely built, so that should make things more efficient (https://msdn.microsoft.com/library/dd383458%28v=vs.100%29.aspx). After more testing I found that foreach ($1 in $2) is more efficient than $1 | % {}. Before I run this new code and potentially crash this server again, is there any adjustment anyone can suggest for a more efficient way to script this?

For testing I created 15,000 0.02 KB txt files with random data in them, spread across 15,000 directories, and ran the code below. I used 90 seconds instead of 90 days on the $date variable just for the test; it took 6 seconds to delete all the txt files.

$getfiles = [System.IO.Directory]::EnumerateFiles("C:\temp", "*.txt", "AllDirectories")
$date = ([System.DateTime]::Now).AddSeconds(-90)
foreach ($2 in $getfiles) {
    if ([System.IO.File]::GetLastWriteTime($2) -le $date) {
        [System.IO.File]::Delete($2)
    } #if
} #foreach
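
For what it's worth, the two styles can be timed against each other on the disposable test tree before touching production. A sketch using Measure-Command; the Delete call is left out so the timing run is repeatable and only measures enumeration plus the date check:

$date = ([System.DateTime]::Now).AddSeconds(-90)

# Style 1: foreach statement over the lazy enumerator
(Measure-Command {
    foreach ($f in [System.IO.Directory]::EnumerateFiles("C:\temp", "*.txt", "AllDirectories")) {
        if ([System.IO.File]::GetLastWriteTime($f) -le $date) { } # delete would go here
    }
}).TotalSeconds

# Style 2: pipeline with ForEach-Object (%) for comparison
(Measure-Command {
    [System.IO.Directory]::EnumerateFiles("C:\temp", "*.txt", "AllDirectories") | % {
        if ([System.IO.File]::GetLastWriteTime($_) -le $date) { } # delete would go here
    }
}).TotalSeconds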
Mundy answered 13/2, 2016 at 23:41 Comment(2)
If you haven't done so already, it might help to disable 8.3 filename generation. Read all warnings first. Make a backup or two. Use at your own risk.Doradorado
As long as you simply save the output from EnumerateFiles to a variable, you're not getting any benefits from the IEnumerable, as PS will wait for the line to finish before continuing (it's not an async method). You need to use it directly in a loop, pipeline or something similar.Icy
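
For reference on the 8.3 suggestion above, short-name generation can be queried and disabled per volume with fsutil (a sketch; read the warnings first, and note that already-existing short names are not removed by this):

# Show whether 8.3 name creation is enabled on C:
fsutil 8dot3name query C:
# Disable 8.3 name creation on C: (requires an elevated prompt)
fsutil 8dot3name set C: 1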

A PowerShell one-liner that processes up to 100,000 files and deletes the ones that are 90 or more days old:

[IO.Directory]::EnumerateFiles("C:\FOLDER_WITH_FILES_TO_DELETE") |
    select -first 100000 |
    where { [IO.File]::GetLastWriteTime($_) -lt (Get-Date).AddDays(-90) } |
    foreach { rm $_ }

or with progress shown:

[IO.Directory]::EnumerateFiles("C:\FOLDER_WITH_FILES_TO_DELETE") |
    select -first 100000 |
    where { [IO.File]::GetLastWriteTime($_) -lt (Get-Date).AddDays(-90) } |
    foreach { $c = 0 } { Write-Progress -Activity "Delete Files" -CurrentOperation $_ -PercentComplete ((++$c/100000)*100); rm $_ }

This works on folders that have a very large number of files. Thanks to my co-worker Doug!

Jubilant answered 6/6, 2017 at 1:20 Comment(1)
I like the idea of capping the number of files processed, but it may not be necessary, seeing as [System.IO.Directory]::EnumerateFiles can process files while the collection is still being built. Also, calling (Get-Date).AddDays(-90) for each file is not efficient. That should be a static variable.Mundy
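
Applying that comment's suggestion, the cutoff date can be computed once before the pipeline starts instead of per file (a reworked sketch of the one-liner above):

# Compute the cutoff once rather than for every file
$cutoff = (Get-Date).AddDays(-90)
[IO.Directory]::EnumerateFiles("C:\FOLDER_WITH_FILES_TO_DELETE") |
    where { [IO.File]::GetLastWriteTime($_) -lt $cutoff } |
    foreach { rm $_ }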

You may be able to tweak it a little by filtering the $getfiles array completely before starting to delete files.

In PowerShell 4.0 and newer you can do this without using the pipeline (which indeed does add some overhead), by using the .Where({}) extension method:

$date  = (Get-Date).AddDays(-90)
$files = [System.IO.Directory]::EnumerateFiles("C:\temp", "*.txt", "AllDirectories").Where({[System.IO.File]::GetLastWriteTime($_) -le $date})
foreach($file in $files)
{
    [System.IO.File]::Delete($file)
}

Since you don't seem to care about it anyway, a final minuscule optimization may be had by waiving error handling completely and calling the Windows API directly:

$Kernel32Util = Add-Type -MemberDefinition @'
[DllImport("kernel32", CharSet = CharSet.Unicode, SetLastError = true)]
[return: MarshalAs(UnmanagedType.Bool)]
public static extern bool DeleteFile(string filePath);
'@ -Name 'Kernel32Util' -Namespace 'NativeCode' -PassThru

And then do the same as above with your new external function wrapper instead of [File]::Delete():

foreach($file in $files)
{
    [void]$Kernel32Util::DeleteFile($file)
}

At this point though, I would probably take a step back and ask the question:

"Am I using the right tool for the job?"

My (personal) answer would be: "Probably not" - time to write a small utility in a compiled language (C#, F#, VB.NET) instead.

PowerShell is super powerful and useful, but at the cost of performance - that's not a bad thing - it's just something worth taking into account when deciding on what tool to use for a specific task :)
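
One possible middle ground, for what it's worth, is compiling the hot loop from inside the script with Add-Type, so deployment stays a single .ps1 file. A minimal sketch, reusing the test folder from the question; the FileCleaner type and its method are made up for illustration:

$source = @'
using System;
using System.IO;

public static class FileCleaner
{
    // Hypothetical helper: deletes files under root older than cutoffDays,
    // returns how many files were deleted.
    public static int DeleteOldFiles(string root, int cutoffDays)
    {
        DateTime cutoff = DateTime.Now.AddDays(-cutoffDays);
        int deleted = 0;
        foreach (string file in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
        {
            if (File.GetLastWriteTime(file) <= cutoff)
            {
                File.Delete(file);
                deleted++;
            }
        }
        return deleted;
    }
}
'@
Add-Type -TypeDefinition $source
[FileCleaner]::DeleteOldFiles("C:\temp", 90)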

Anamariaanamnesis answered 14/2, 2016 at 0:53 Comment(1)
I did actually get a friend to write a Windows Forms application to do the same thing with additional logging of each file deleted, and it's way slower. Not sure what code he used to build the array and delete the files, but I did mention the need for efficiency.Mundy

I ended up with several slightly different scripts for different versions of PowerShell:

#If PowerShell version is 4 or newer
$date = ([System.DateTime]::Now).AddDays(-30)
foreach ($2 in ([System.IO.Directory]::EnumerateFiles("D:\Folder to cleanup", "*.*", "AllDirectories").Where({[System.IO.File]::GetLastWriteTime($_) -le $date}))) {
    [System.IO.File]::Delete($2)
} #foreach

#If PowerShell version is 3.x (EnumerateFiles is available, .Where() is not)
$date = ([System.DateTime]::Now).AddDays(-30)
foreach ($2 in ([System.IO.Directory]::EnumerateFiles("D:\Folder to cleanup", "*.*", "AllDirectories"))) {
    if ([System.IO.File]::GetLastWriteTime($2) -le $date) {
        [System.IO.File]::Delete($2)
    } #if
} #foreach

#If PowerShell version is 2.0
$date = ([System.DateTime]::Now).AddDays(-30)
foreach ($2 in ([System.IO.Directory]::GetFiles("D:\Folder to cleanup", "*.*", "AllDirectories"))) {
    if ([System.IO.File]::GetLastWriteTime($2) -le $date) {
        [System.IO.File]::Delete($2)
    } #if
} #foreach
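
If one script has to cover all three hosts, the branch can be picked at runtime (a sketch; $PSVersionTable.PSVersion is available from PowerShell 2.0 onward):

$date = ([System.DateTime]::Now).AddDays(-30)
$root = "D:\Folder to cleanup"
$major = $PSVersionTable.PSVersion.Major
if ($major -ge 4) {
    # .Where() extension method is available
    foreach ($2 in ([System.IO.Directory]::EnumerateFiles($root, "*.*", "AllDirectories").Where({[System.IO.File]::GetLastWriteTime($_) -le $date}))) {
        [System.IO.File]::Delete($2)
    }
} elseif ($major -eq 3) {
    # EnumerateFiles is available, .Where() is not
    foreach ($2 in ([System.IO.Directory]::EnumerateFiles($root, "*.*", "AllDirectories"))) {
        if ([System.IO.File]::GetLastWriteTime($2) -le $date) { [System.IO.File]::Delete($2) }
    }
} else {
    # PowerShell 2.0 typically runs on a CLR that lacks EnumerateFiles
    foreach ($2 in ([System.IO.Directory]::GetFiles($root, "*.*", "AllDirectories"))) {
        if ([System.IO.File]::GetLastWriteTime($2) -le $date) { [System.IO.File]::Delete($2) }
    }
}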
Mundy answered 24/7, 2018 at 22:13 Comment(0)
