Improve the efficiency of my PowerShell script
The below code searches for 400+ numbers from a list.txt file to see if any of them exist within the files in the specified folder path.

The script is very slow: it had not completed after 25 minutes of running. The folder being searched is 507 MB (532,369,408 bytes) and contains 1,119 files and 480 folders. Any help improving the speed and efficiency of the search is greatly appreciated.

$searchWords = (gc 'C:\temp\list.txt') -split ','
$results = @()
Foreach ($sw in $searchWords)
{
    $files = gci -path 'C:\Users\david.craven\Dropbox\Asset Tagging\_SJC Warehouse_\_Project Completed_\2018\A*' -filter "*$sw*" -recurse

    foreach ($file in $files)
    {
        $object = New-Object System.Object
    $object | Add-Member -Type NoteProperty -Name SearchWord -Value $sw
    $object | Add-Member -Type NoteProperty -Name FoundFile -Value $file.FullName
        $results += $object
    }

}

$results | Export-Csv C:\temp\output.csv -NoTypeInformation
Jolinejoliotcurie answered 1/11, 2018 at 23:0 Comment(6)
Are you trying to look for $sw in the file contents? The question sounds like you do, but the script only looks at file names. – Laughing
You read all 1,100 files in their entirety looking for each of 400 words! Can this crazy language maybe search for any of, say, 10 words at a time? Then you'd only need 40 passes over 1,100 files and it would be 10 times faster. Do you have to keep searching a document if you find a number, or can you exit on first match? Does this crazy language allow parallelisation? Can you use Linux instead of this thing? – Illdisposed
Take a look at Select-String, which can use regular expressions for more efficient matching. Also, it might be more efficient to get all the filenames first and then check them in memory, rather than making multiple calls to Get-ChildItem. Finally, try using the PSCustomObject method rather than New-Object/Add-Member, as the pipeline might be slowing things down. – Edward
@MarkSetchell Of course. Select-String is the analog of grep in PowerShell, and it can search multiple patterns as well as regex. – Fatal
If you have a working piece of code from your project and are looking for open-ended feedback in the areas of best practices and design-pattern usage, security issues, *performance*, or correctness in unanticipated cases, then Code Review SE is the right place to ask. Can someone please move this question? I am not able to. – Ashtray
+= kills puppies. – Kuvasz

The following should speed up your task substantially:

If the intent is truly to look for the search words in the file names:

$searchWords = (Get-Content 'C:\temp\list.txt') -split ','
$path = 'C:\Users\david.craven\Dropbox\Facebook Asset Tagging\_SJC Warehouse_\_Project Completed_\2018\A*'

Get-ChildItem -File -Path $path -Recurse -PipelineVariable file |
  Select-Object -ExpandProperty Name |
    Select-String -SimpleMatch -Pattern $searchWords |
      Select-Object @{n='SearchWord'; e='Pattern'},
                    @{n='FoundFile'; e={$file.FullName}} |
        Export-Csv C:\temp\output.csv -NoTypeInformation

If the intent is to look for the search words in the files' contents:

$searchWords = (Get-Content 'C:\temp\list.txt') -split ','
$path = 'C:\Users\david.craven\Dropbox\Facebook Asset Tagging\_SJC Warehouse_\_Project Completed_\2018\A*'

Get-ChildItem -File -Path $path -Recurse |
  Select-String -List -SimpleMatch -Pattern $searchWords |
    Select-Object @{n='SearchWord'; e='Pattern'},
                  @{n='FoundFile'; e='Path'} |
      Export-Csv C:\temp\output.csv -NoTypeInformation

The keys to performance improvement:

  • Perform the search with a single command, by passing all search words to Select-String. Note: -List limits matching to 1 match per file (by any of the given patterns).

  • Instead of constructing custom objects in a script block with New-Object and Add-Member, let Select-Object construct the objects for you directly in the pipeline, using calculated properties.

  • Instead of building an intermediate array iteratively with += - which behind the scenes recreates the array every time - use a single pipeline to pipe the result objects directly to Export-Csv.
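As an aside (a sketch on my part, not from the original answer, reusing the $sw and $file variable names from the question): if results ever do need to be accumulated outside a pipeline, a generic list avoids the per-append array re-creation that += incurs.

```powershell
# Sketch: System.Collections.Generic.List grows in place instead of
# re-creating the whole array on every append the way += does.
$results = [System.Collections.Generic.List[object]]::new()
foreach ($file in $files) {
    $results.Add([PSCustomObject]@{
        SearchWord = $sw
        FoundFile  = $file.FullName
    })
}
```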

Clie answered 2/11, 2018 at 2:13 Comment(8)
Nice! I always forget about -PipelineVariable! – Pathway
Thanks, @MattMcNabb. It's a handy feature, but the need for it doesn't arise too often, so it's hard to remember. – Clie
Thanks @MattMcNabb for that great explanation. I am seeing the below error, unfortunately: Select-String : Cannot bind argument to parameter 'Pattern' because it is an empty string. At C:\Users\david.craven\Downloads\test.ps1:5 char:39 + Select-String -SimpleMatch -Pattern $searchWords | + ~~~~~~~~~~~~ + CategoryInfo : InvalidData: (:) [Select-String], ParameterBindingValidationException + FullyQualifiedErrorId : ParameterArgumentValidationErrorEmptyStringNotAllowed,Microsoft.PowerShell.Commands.SelectStringCommand – Jolinejoliotcurie
@dcraven: That suggests that $searchWords is empty rather than containing your search words. – Clie
@Clie: I am not sure how, as it has over 500 different values, e.g. FOC2223NHZB, FOC2223NHZ4, FOC2214N235, FOC2223NJ01, – Jolinejoliotcurie
@dcraven: Maybe a different variable name – typo? You can recreate the problem with 'input' | Select-String $NoSuchVariable vs. 'input' | Select-String 'in', 'put' – Clie
As the 2nd script would output multiple occurrences in a file without distinguishing between them, I'd add another calculated property to the Select-Object: @{n='Line';e={"{0,5}:{1}" -f $_.LineNumber,$_.Line}} – or otherwise add the -Unique switch parameter. (+1) – Delwyn
Good point about multiple matches, @LotPings. For simplicity I decided to add -List to Select-String, which limits matching to at most 1 occurrence. – Clie

So there are definitely some basic things in the PowerShell code you posted that can be improved, but it may still not be super fast. Based on the sample you gave us, I'll assume you're looking to match the file names against a list of words. You're looping through the list of words (400 iterations), and in each loop you're looping through all 1,119 files. That's a total of 447,600 iterations!

Assuming you can't reduce the number of iterations in the loop, let's start by making each iteration faster. The Add-Member cmdlet is going to be really slow, so switch that approach up by casting a hashtable to the [PSCustomObject] type accelerator:

[PSCustomObject]@{
    SearchWord = $Word
    File       = $File.FullName
}

Also, there is no reason to pre-create an array object and then add each file to it. You can simply capture the output of the foreach loop in a variable:

$Results = Foreach ($Word in $Words)
{
...

So a faster loop might look like this:

$Words = Get-Content -Path $WordList
$Files = Get-ChildItem -Path $Path -Recurse -File

$Results = Foreach ($Word in $Words)
{    
    foreach ($File in $Files)
    {
        if ($File.BaseName -match $Word)
        {
            [PSCustomObject]@{
                SearchWord = $Word
                File       = $File.FullName
            }
        }
    }
}

A simpler approach might be to use Where-Object on the files array:

$Results = Foreach ($Word in $Words)
{
    $Files | Where-Object BaseName -match $Word
}

Try both and test out the performance.
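A sketch of how such a comparison might look (Measure-Command times a script block; $Words and $Files are assumed to be populated as in the snippets above):

```powershell
# Time each approach; the smaller TotalSeconds wins.
$nested = Measure-Command {
    foreach ($Word in $Words) {
        foreach ($File in $Files) {
            if ($File.BaseName -match $Word) { $File.FullName }
        }
    }
}
$piped = Measure-Command {
    foreach ($Word in $Words) { $Files | Where-Object BaseName -match $Word }
}
"nested: $($nested.TotalSeconds)s  piped: $($piped.TotalSeconds)s"
```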

Pathway answered 2/11, 2018 at 2:33 Comment(0)

So if speeding up the loop doesn't meet your needs, try removing the loop entirely. You could use regex and join all the words together:

$Words = Get-Content -Path $WordList
$Files = Get-ChildItem -Path $Path -Recurse -File
$WordRegex = $Words -join '|'
$Files | Where basename -match $WordRegex
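One caveat worth noting (my addition, not part of the original answer): -match interprets the joined pattern as a regular expression, so words containing metacharacters such as . or + can mis-match. Escaping each word first keeps the matching literal:

```powershell
# Escape regex metacharacters in each word before joining with '|',
# so each word is matched literally rather than as a regex.
$WordRegex = ($Words | ForEach-Object { [regex]::Escape($_) }) -join '|'
$Files | Where-Object BaseName -match $WordRegex
```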
Pathway answered 2/11, 2018 at 2:37 Comment(0)
