Using continue/return statement inside .ForEach() method - is it better to use foreach ($item in $collection) instead?

It's fairly well documented that foreach processing speed varies depending on the way a foreach loop is carried out (ordered from fastest to slowest):

  1. .ForEach() method
  2. foreach ($item in $collection) {}
  3. $collection | ForEach-Object {}


  • When working with (very) large collections, speed comparisons between #1 and #2 can be significant, and the overhead of piping makes #3 a non-candidate in those cases.
  • #2 offers the continue statement, while #1 does not (see the snippet below)
    • Please correct/comment if this is inaccurate
  • From what I've seen online and in real life, return is how to "continue" when using the .ForEach() method.
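
For clarity, this is the continue usage I'm referring to in #2 (a minimal snippet):

foreach ($item in 1..5) {
    if ($item -eq 3) { continue }  # skip this item and move on to the next
    $item                          # outputs 1, 2, 4, 5
}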



My questions:

  1. When the speed advantage of the .ForEach() method is too big to settle for foreach and you need to continue, what is the proper way to continue when using .ForEach({})?
  2. What are the implications or gotchas you should be aware of when using return inside the .ForEach() method?
Votive answered 15/12, 2023 at 4:50 Comment(4)
Not sure where you are getting your information from, but .ForEach is not faster than foreach; in fact, .ForEach is the worst of those three methods because it doesn't stream output.Marybethmaryellen
powershellmagazine.com/2014/10/22/… ss64.com/ps/foreach-method.html adamtheautomator.com/powershell-foreach/#The_foreach_Method Those are a few sources. I'm not sure that "worst of those three methods" would be accurate since producing output isn't always necessarily the goal of a foreach loop.Aesthetically
" since producing output isn't always necessarily the goal of a foreach loop", well .ForEach always produces output, no matter what as shown in my answer ;)Marybethmaryellen
This is a good question because it shows research, regardless of whether what it presents is correct or not. Actual test results would have substantiated its claims.Should

...and the overhead of piping makes #3 a non-candidate in those cases.

Incorrect: the pipeline is very efficient; it's almost on par with foreach (the fastest way in PowerShell to enumerate a collection). ForEach-Object is the inefficient one because it dot-sources its script block.

.ForEach is almost never a good alternative, as the tests below clearly show. In addition, its output type is always a Collection<T>:

''.ForEach({ }).GetType()

#   Namespace: System.Collections.ObjectModel
#
# Access        Modifiers           Name
# ------        ---------           ----
# public        class               Collection<PSObject>...

.ForEach doesn't stream output, meaning that there is no way to exit early from the loop with Select-Object -First; it also means higher memory consumption.

Measure-Command {
    (0..10).ForEach({ $_; Start-Sleep -Milliseconds 200 }) | Select-Object -First 1
} | ForEach-Object TotalSeconds

# 2.2637483

As for the 2nd question, return is the closest you can get to continue, exiting early from the invocation. There are no gotchas there as long as it is understood that it exits early from the current invocation and moves on to the next item in the collection; however, there is no real way to break the loop using .ForEach.
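
For example, a minimal illustration of return acting like continue:

(1..5).ForEach({
    if ($_ -eq 3) { return }  # exits only this invocation of the script block
    $_
})

# Outputs: 1, 2, 4, 5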

I believe it's already understood, but break and continue should not be used outside of a loop, switch, or trap:

& {
    ''.ForEach({ break })
    'will never get here'
}

'or here'

If you're looking for performance, you should rely on foreach, or on a script block or function with a process block.
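
For example (Write-Each is just an illustrative name here; a filter is shorthand for a function whose body is a single process block):

filter Write-Each { $_ }                      # filter form
function Write-EachFn { process { $_ } }      # equivalent explicit process-block form

0..3 | Write-Each    # each item streams through as it arrives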

$range = [System.Linq.Enumerable]::Range(0, 1mb)
$tests = @{
    'foreach' = {
        foreach ($i in $args[0]) { $i }
    }
    '.ForEach()' = {
        $args[0].ForEach({ $_ })
    }
    'ForEach-Object' = {
        $args[0] | ForEach-Object { $_ }
    }
    'process block' = {
        $args[0] | & { process { $_ } }
    }
}

$tests.GetEnumerator() | ForEach-Object {
    [pscustomobject]@{
        Test = $_.Key
        Time = (Measure-Command { & $_.Value $range }).TotalMilliseconds
    }
} | Sort-Object Time

# Test              Time
# ----              ----
# foreach         103.96
# process block   918.04
# .ForEach()     3614.44
# ForEach-Object 9046.14
Marybethmaryellen answered 15/12, 2023 at 5:15 Comment(2)
Depending on what the inner code is actually doing, … | foreach-object { … } -Parallel could offer better / worse performance characteristics as well, especially for high latency network operations that might dwarf the time spent on internal book-keeping overheads for the different approaches…Accept
@Accept I agree, but I don't think it applies to this question since it's about a sequential loop. Or at least parallel processing was never mentioned by the OP.Marybethmaryellen

As stated in the helpful answer from Santiago Squarzon, the pipeline is indeed very efficient and often underestimated, as one doesn't take into account where the data is actually coming from. For Santiago's performance testing it is conveniently presumed that the data is already in memory, where he "preloads" the sample data (using a deferred-execution statement): [System.Linq.Enumerable]::Range(0, 1mb)
But that isn't required in the same way for a proper PowerShell pipeline.

Let's clear the air with an exaggerated example (which also covers your second question) where the .foreach{} method is clearly defeated by the ForEach-Object cmdlet:

  • This takes almost 1 second:
(0..1mb).ForEach{ if ($_ -eq 1) { return $_ } }
  • Where this takes about 43 milliseconds:
0..1mb | ForEach-Object { if ($_ -eq 1) { $_ } } | Select-Object -First 1 

(note: To properly test the performance, you should start a new terminal session)

You might argue that this isn't a fair competition, because -aside from the fact that I am selecting the first occurrence¹- I load the whole array into memory (using the grouping operator ( )) for the method test.
But that is in fact exactly what this is about; it is comparing apples to oranges, knowing that there is no easy way for the .foreach{} method to do otherwise, whereas cmdlets are optimized for the pipeline, which includes dealing with the input (and output). Needless to say, any meaningful function should at least have some input or output...

Notes

  1. If your targeted search is further into the array/stream (than just $_ -eq 1 in the examples), the performance difference will of course get smaller. But then again, if your data doesn't come from memory (which is usually the case initially) but from a slower source such as a file or a (remote) database, you will see that the full pipeline solution performs as well (or better) even if you iterate through the whole range.

Bottom line:

When performance testing the pipeline, you need to compare your whole solution.

I challenge anybody who blindly states that a .foreach{} method or foreach ( ... ) statement is generally faster than the ForEach-Object cmdlet to write (and performance test) a substitute for a common pipeline such as:

Get-Content .\MyList.txt |
    ForEach-Object {
        if ($_ -eq 'something') { $_ }
    } | Select-Object -First 1

Or excluding your second question:

Import-Csv .\Input.csv |
    ForEach-Object {
        if ($_.value -eq 'something') { $_.value = 'something else' }
        $_  # emit the (possibly modified) record so it reaches Export-Csv
    } | Export-Csv .\Output.csv

Because even though it is often suggested to be faster:

  • Any .foreach{} method or foreach ( ... ) statement substitute will likely perform worse for these examples (or the difference is marginal); a sketch of such a substitute follows after this list

  • Any .foreach{} method or foreach ( ... ) statement substitute will need more code, which is likely more difficult to read (develop and maintain), taking the note from the PowerShell scripting performance considerations document into consideration:

Note
Many of the techniques described here aren't idiomatic PowerShell and may reduce the readability of a PowerShell script. Script authors are advised to use idiomatic PowerShell unless performance dictates otherwise.

  • Any .foreach{} method or foreach ( ... ) statement substitute will likely consume more memory, which could by itself lead to performance issues if you reach the limits of your computer's physical memory

  • And (as mentioned by mclayton) as an extra bonus, once you have set up a proper pipeline, it is a minor step (in PowerShell 7, just a single -Parallel parameter) to possibly further boost your pipeline with parallel processing
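
To illustrate the points above, a foreach ( ... ) statement substitute for the first Get-Content pipeline might look like this (just a sketch; note that the Get-Content output is collected in full before the loop even starts, unlike the streaming pipeline version):

$first = foreach ($line in Get-Content .\MyList.txt) {
    if ($line -eq 'something') {
        $line
        break   # extra plumbing needed to stop at the first match
    }
}
$first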

Ligate answered 15/12, 2023 at 15:24 Comment(1)
I was reading one of those articles until I saw "This method provides faster performance than its older counterparts (the foreach statement and the ForEach-Object cmdlet)..."; it is unfortunate how people get misinformed by articles and trust them without testing for themselves.Marybethmaryellen

To complement the existing, helpful answers by addressing your specific questions:

is it better to use foreach ($item in $collection) instead? [from the title]

Yes: foreach by far performs best among the PowerShell enumeration techniques and is the only one that directly supports stopping the enumeration on demand, with break.
See below for a detailed discussion of performance as well as syntax considerations.

...what is the proper way to continue when using .ForEach({})?

The proper way is to use return:

  • In script blocks ({ ... }), such as ones passed to the .ForEach() method or to the ForEach-Object cmdlet, return exits that block only.

  • In the context of enumerating commands such as .ForEach() / the ForEach-Object cmdlet, this means that processing continues with the next input object, i.e. it resumes the ongoing enumeration.

This makes it analogous to the continue keyword, which only applies to a select few language statements, such as foreach - see this answer for details.
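
A quick illustration with ForEach-Object (the same applies to a script block passed to .ForEach()):

# `return` exits only the script block for the current input object;
# processing resumes with the next one - analogous to `continue`:
1..5 | ForEach-Object { if ($_ -eq 3) { return }; $_ }   # -> 1, 2, 4, 5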

  • Note that even though language statements use { ... } blocks too, they aren't stand-alone script-block objects - the { and } purely act as delimiters for the enclosed code, so that a return in such a block exits the enclosing function or script file (or stand-alone script block) instead; a quick illustration of the difference:

    # Prints only 1 - the `return` exits the enclosing script block,
    # not (just) the `foreach` statement.
    & { foreach ($i in 1..2) { $i; return }; 'never get here' }
    

However, I suspect you are instead looking for an analog to the break keyword, which stops processing further input on demand, i.e. stops the ongoing enumeration.

  • As with continue, break only meaningfully works with a select few language statements - used outside such a context, PowerShell looks up the call stack for an enclosing loop / switch statement anywhere and breaks out of that; in the absence of one, the entire script (runspace) is exited.

  • As of PowerShell (Core) 7.4.0, there is NO break analog for .ForEach() or ForEach-Object / the pipeline in general.

    • There is NO workaround for .ForEach():

      • The only way to collect output from such a call is to let it run to completion, at which point it returns a collection of output objects.

      • Hypothetically, in order for .ForEach() to support on-demand stoppage of the enumeration in the future, the script block would need a way to signal to the method that further enumeration should be stopped; this cannot be done via output from the script block - because that is used for results.

      • Therefore, using a foreach statement is your only well-performing alternative - in fact, as has been noted, it is faster than .ForEach() (see the foreach comparison after the workaround examples below).

    • Suboptimal workarounds for ForEach-Object / the pipeline in general:

      • Use throw in combination with try / catch

      • Alternatively, given the behavior of continue and break discussed above, you can use a dummy loop that you can break out of on demand.

      • If you combine these workarounds with the workaround for ForEach-Object's inefficient implementation[1] mentioned in Santiago's answer, you get decent pipeline performance that even surpasses .ForEach() (but is still about an order of magnitude slower than a foreach statement)

      • Two simple examples - both only process and collect the first input object:

        # try / catch + throw
        $result = 
          try {  
            # Use `throw` to abort the pipeline.
            # Any objects emitted before that are still collected.
            1..100 | . { process { $_; throw } }
          } catch {}
        
        
        # Dummy loop + break
        $result = 
          do {
            # Use break to break out of the dummy loop.
            # Any objects emitted before that are still collected.
            1..100 | . { process { $_; break } }
          } while ($false)
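
      • For comparison, the foreach statement needs no such workaround - break stops the enumeration directly (and performs best):

        # foreach-statement equivalent of the above - only the first object is processed:
        $result = foreach ($i in 1..100) { $i; break }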
        

Future considerations:

This asymmetry between loop-like statements and the pipeline is unfortunate:

  • Stopping an enumeration on demand can be an important performance optimization.

  • Outside of loop-like statements, this is currently only supported in two very specific scenarios:

    • As iRon points out, .ForEach()'s companion, the intrinsic .Where() method, has an optional second parameter to which 'First' may be passed to stop enumeration once the first match has been found.

    • In the pipeline, Select-Object's -First parameter is capable of stopping the enumeration after a given number of objects have been received.[2]

There's a long-standing feature request that asks for user code to be able to stop upstream cmdlets on demand:

  • GitHub issue #3821

  • No decision as to how to implement this has been made as of this writing, but the two basic choices are:

    • Provide a new cmdlet.
    • Provide a new keyword (note that applying break and continue to script blocks in pipelines too is not an option, as it would break backward compatibility).
  • Neither approach is a good fit for also bringing support for on-demand enumeration stoppage to the .ForEach() method, although a keyword-based approach, if named abstractly, would work better conceptually, e.g. breakenum or, to avoid confusion with break, stopenum


Performance considerations:

Given Santiago's pipeline-performance optimization based on a process block, the performance ranking is actually as follows, based on an input collection that is already in memory, in full:

  1. The foreach statement - the fastest by far.

  2. The pipeline with the process block workaround, to compensate for the inefficient implementation of ForEach-Object, roughly one order of magnitude slower.

  3. The .ForEach() method, roughly two orders of magnitude slower.

  4. The ForEach-Object cmdlet, roughly two orders of magnitude slower, and around 50% slower than .ForEach().

Note:

  • The rough relative performance qualifiers are based on experiments with processing 1 million input objects and capturing the output in a variable, in PowerShell (Core) v7.4.0 (built on .NET 8), on both macOS and Windows. (In Windows PowerShell, foreach is noticeably slower than in PowerShell (Core), though still about 5 times faster than the process block solution).

  • There is one case in which .ForEach() is slower than ForEach-Object, but it is atypical: if you output results directly to the host (console).


Memory-usage / output-timing considerations - collect-in-full vs. streaming behavior:

  • ForEach-Object:

    • As a cmdlet, it integrates naturally with PowerShell's pipeline and its streaming behavior: ForEach-Object receives input objects via the pipeline one by one and emits its output objects one by one, with a downstream cmdlet receiving these objects as soon as they're being emitted - see this answer for background information.
  • .ForEach() (and .Where()):

    • On input:

      • They invariably collect their input in full, up front, even when operating on a lazy .NET enumerable (a small sketch follows at the end of this section).
    • On output:

      • They invariably build up a collection of results method-internally first, and only then produce output, namely that collection[3] as a whole; see also the next section.
  • The foreach statement:

    • On input:

      • It too collects its input in full, up front, except when operating on an expression returning a lazy .NET enumerable, which notably includes - uniquely among PowerShell's operators - .., the range operator; e.g. (note that 1e7 is short for 10000000, i.e. 10 million):

        # Outputs 1 and finishes almost instantly, because the input
        # is a lazy .NET enumerable that is enumerated as such.
        foreach ($i in [System.Linq.Enumerable]::Range(1, 1e7)) { $i; break }
        
        # Ditto for `..`, the range operator.
        # NOTE: It is the ONLY PowerShell operator that does that.
        foreach ($i in 1..1e7) { $i; break }
        
        # Output from PowerShell *commands* as well as from other operators
        # *is* collected up front:
        
        # Takes a long time, because the Write-Output output is collected first.
        foreach ($i in Write-Output (1..1e7)) { $i; break }
        
        # Ditto, for the results of the -replace operation.
        foreach ($i in 1..1e7 -replace '^', '+') { $i; break }
        
    • On output:

      • It streams its output, as a cmdlet does, but you can only take advantage of that if you invoke it enclosed in a script block, via &, the call operator or ., the dot-sourcing operator; e.g.:

        # Outputs 1 and finishes almost instantly, because the `foreach`
        # output is streamed (and 1..1e7 is enumerated lazily).
        & { foreach ($i in 1..1e7) { $i } } | Select-Object -First 1
        
      • By contrast, enclosure in $(...), the subexpression operator, or @(...), the array-subexpression operator - with any enclosed statement(s) - invariably results in up-front collection of the output, in full; e.g.:

        # Outputs 1 only after a long time, because $(...) caused
        # up-front collection.
        $(foreach ($i in 1..1e7) { $i }) | Select-Object -First 1
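
  • A small illustration of the .ForEach() input-collection behavior noted above (a minimal sketch; timing is indicative only):

    # Even if you only need the first result, .ForEach() first drains the lazy
    # enumerable in full and builds the complete output collection before the
    # [0] index can be applied - contrast this with the near-instant
    # `foreach ... break` examples above:
    ([System.Linq.Enumerable]::Range(1, 1e6)).ForEach({ $_ })[0]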
        

Reasons to use or avoid .ForEach() / .Where():

Given the faster alternatives, are there still benefits to .ForEach() and .Where()?

  • Downsides:

    • Both .ForEach() and .Where() always emit a collection,[3] even if there's only one output object.

      • Given PowerShell's automatic enumeration behavior in the pipeline and its member-access enumeration feature, that will often not matter in practice, though it is more likely to be a problem with .Where() in combination with 'First'; e.g.:

        # Breaks, because the collection returned cannot bind
        # to the [int] parameter; requires [0]
        & { 
          param([int] $Number)
        }  (1, 5, 10).Where({ $_ -ge 5 }, 'First')
        
    • Due to the specific type of the collection,[3] its elements are wrapped in typically invisible [psobject] instances, which is not only unnecessary, but can have side effects.

      • Again, it will often not matter in practice and never should, but these meant-to-be-invisible wrappers do situationally result in different behavior - see GitHub issue #5579.

    • Method syntax - i.e., the need to use (...) around the argument list and to separate arguments with , - is not a natural fit for PowerShell (which uses shell-like invocation syntax), though for users with a programming background that is less likely to be an issue.

  • Upsides:

    • As method calls, they can directly act as or take part in expressions. This allows you to do things such as:

      # Direct use as a command argument.
      Write-Output (1..3).ForEach({ $_ + 1 })
      
      # Direct use as pipeline input.
      (1..3).ForEach({ $_ + 1 }) | Write-Output
      
      • Note:

        • Language statements such as foreach can only act as expressions in an assignment, and only stand-alone, e.g., $results = foreach ($i in 1..3) { $i + 1 } works, but direct use of foreach in the examples above would not.

        • To make language statements work as expressions in general, wrap them in $(...) or @(...) for up-front collection of their outputs, or in & { ... } or . { ... } to stream their output (something that .ForEach() and .Where() cannot do).

    • Additional features: Both .ForEach() and .Where() have features that their cmdlet counterparts, ForEach-Object and Where-Object do not support.

      • .Where() notably supports stopping after the first or only returning the last match, as well as splitting the input collection in two; e.g.:

         # Returns right away, because 'First' stops enumeration after the
         # first match. 'Last' is available too.
         (1..1e6).Where({ $_ -eq 2 }, 'First')
        
         # Returns *two* collections
         $odd, $even = (1..10).Where({ $_ % 2 -eq 1 }, 'Split')
        
        • Bringing these powerful features to the Where-Object cmdlet too is the subject of GitHub issue #13834.
      • .ForEach()'s additional features are less compelling, as expression-mode alternatives exist, but one feature is useful when member-access enumeration isn't available: the ability to efficiently collect property values from all elements of a collection; e.g.:

         # Returns a collection of all file lengths (sizes).
         # Note that (Get-ChildItem -File).Length - i.e. member-access enumeration -
         # is NOT an option here, because the array used to collect the
         # command output *itself* has a .Length property.
         (Get-ChildItem -File).ForEach('Length')
        

[1] See GitHub issue #10982.

[2] It does this via a non-public exception type, which is why user code cannot take advantage of it.

[3] Of type System.Collections.ObjectModel.Collection`1, with [psobject]-typed elements.

Dorcas answered 15/12, 2023 at 21:42 Comment(0)
