"I have to wait for the whole get-childitem command to complete before the pipe handles it."
No: The very point of PowerShell's pipeline is to process objects one by one, as they become available, thereby acting as a memory throttle that keeps memory use constant irrespective of the size of the input collection.
Caveat: Do NOT enclose the command whose output you send through the pipeline in (...), the grouping operator, as that will indeed collect that command's output in full, in memory, first.
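A quick way to see the difference is a minimal sketch like the following, assuming PSv3+. The script block producer and the $streamLog / $groupedLog lists are hypothetical; they only record the order in which items are produced and consumed:

```powershell
# Hypothetical logs recording when each item is produced vs. consumed.
$streamLog  = [System.Collections.Generic.List[string]]::new()
$groupedLog = [System.Collections.Generic.List[string]]::new()

# Streaming: each item is handed to ForEach-Object as soon as it is emitted.
& { foreach ($i in 1..3) { $streamLog.Add("produced $i"); $i } } |
  ForEach-Object { $streamLog.Add("consumed $_") }

# With (...): the producer runs to completion first, so all of its output
# is collected in memory before ForEach-Object sees the first item.
(& { foreach ($i in 1..3) { $groupedLog.Add("produced $i"); $i } }) |
  ForEach-Object { $groupedLog.Add("consumed $_") }

$streamLog -join ', '   # produced 1, consumed 1, produced 2, consumed 2, produced 3, consumed 3
$groupedLog -join ', '  # produced 1, produced 2, produced 3, consumed 1, consumed 2, consumed 3
```

With the grouping operator, all "produced" entries come first, which is exactly the up-front collection you want to avoid with large inputs.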
Cmdlets, as PowerShell's native commands, inherently support this one-by-one streaming.
However, some cmdlets, such as Sort-Object and Group-Object, must collect all input in memory first[1], as a conceptual necessity (e.g., you cannot produce sorted output until you've compared all items). Thanks, Bacon Bits.
Similarly, cmdlets such as ConvertTo-Json, which emit only a single output object, construct that one object from the entirety of the input, collected up front.
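Both kinds of aggregating cmdlets can be observed with a minimal sketch like this one (the producer script block and the $log list are hypothetical, used only to record ordering):

```powershell
# Hypothetical log recording when items are produced vs. consumed.
$log = [System.Collections.Generic.List[string]]::new()

# Sort-Object cannot emit anything until it has received all of its input,
# so every "produced" entry precedes every "consumed" entry.
& { foreach ($i in 3, 1, 2) { $log.Add("produced $i"); $i } } |
  Sort-Object |
  ForEach-Object { $log.Add("consumed $_") }
$log -join ', '  # produced 3, produced 1, produced 2, consumed 1, consumed 2, consumed 3

# ConvertTo-Json turns three separate input objects into a single output string.
$json = 1, 2, 3 | ConvertTo-Json -Compress
$json            # [1,2,3]
```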
Stdout output from external programs, too, is streamed: it is passed through the pipeline line by line, as the lines become available.
You can turn an expression into a streaming command by enclosing it in & { ... }, but that is only useful if the expression hasn't already built the full collection of objects in memory; e.g., & { 1..10000000 } | ... won't gain you anything, but & { for ($i=0; $i -lt 10000000; ++$i) { $i } } | ... would.[2]
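To illustrate with a small count, here is a sketch contrasting the for loop wrapped in & { ... } with the same loop wrapped in $(...), the subexpression operator. $(...) is my stand-in for "something that has already built the full collection in memory"; the $eager / $lazy lists are hypothetical logs:

```powershell
# Hypothetical logs recording when items are produced vs. consumed.
$eager = [System.Collections.Generic.List[string]]::new()
$lazy  = [System.Collections.Generic.List[string]]::new()

# $(...) runs the loop to completion and collects all of its output first.
$(for ($i = 0; $i -lt 3; ++$i) { $eager.Add("produced $i"); $i }) |
  ForEach-Object { $eager.Add("consumed $_") }

# & { ... } turns the same loop into a command whose output streams.
& { for ($i = 0; $i -lt 3; ++$i) { $lazy.Add("produced $i"); $i } } |
  ForEach-Object { $lazy.Add("consumed $_") }

$eager -join ', '  # produced 0, produced 1, produced 2, consumed 0, consumed 1, consumed 2
$lazy -join ', '   # produced 0, consumed 0, produced 1, consumed 1, produced 2, consumed 2
```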
Ultimately, if the source cmdlet / program / expression doesn't itself emit output objects in a streaming fashion (one by one, as they're being produced), you're out of luck.
However, what is indeed missing is the ability to stop pipeline processing on demand, which currently only Select-Object -First can do; see this answer of mine.
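For reference, a minimal sketch of that early stop, assuming PSv3+ (where Select-Object -First stops upstream commands once it has enough items); the producer and the $log list are hypothetical:

```powershell
# Hypothetical log recording which items the producer actually emitted.
$log = [System.Collections.Generic.List[string]]::new()

# A producer that would emit 5 items if nothing stopped it.
$result = & { foreach ($i in 1..5) { $log.Add("produced $i"); $i } } |
  Select-Object -First 2

$result            # 1 and 2
$log -join ', '    # the producer is stopped early, before it reaches 'produced 5'
```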
There's a longstanding feature request on GitHub that asks for a mechanism to stop a pipeline on demand.
As an aside: Using the PSv4+ .Where() method is indeed faster than using the Where-Object cmdlet (whose built-in alias is where), but .Where() invariably requires the collection that it operates on to have been loaded into memory in full beforehand.
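A sketch of the trade-off, with a hypothetical $numbers array and filter: both forms return the same items, but only Where-Object could also consume streamed input, while .Where() operates on the already-materialized array. Timings vary by machine and PowerShell version, so the last line merely displays them:

```powershell
# .Where() needs the whole collection up front, so materialize it first.
$numbers = 1..100000

$viaCmdlet = $numbers | Where-Object { $_ % 1000 -eq 0 }   # streaming cmdlet
$viaMethod = $numbers.Where({ $_ % 1000 -eq 0 })           # in-memory method

# Both yield the same 100 values: 1000, 2000, ..., 100000.
$tCmdlet = Measure-Command { $null = $numbers | Where-Object { $_ % 1000 -eq 0 } }
$tMethod = Measure-Command { $null = $numbers.Where({ $_ % 1000 -eq 0 }) }
'Where-Object: {0:N0} ms; .Where(): {1:N0} ms' -f $tCmdlet.TotalMilliseconds, $tMethod.TotalMilliseconds
```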
However, the .Where() method does have the ability to stop processing the remaining items: passing 'First' as the 2nd argument makes it stop after the first match ('First' is converted to the corresponding value of the [System.Management.Automation.WhereOperatorSelectionMode] enumeration); compare the performance of (1..1e6).Where({$_ -eq 10}) to that of (1..1e6).Where({$_ -eq 10}, 'First').
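One way to run that comparison is with Measure-Command, as in this sketch; $mode simply spells out the enumeration value that the string 'First' converts to, and the exact timings will differ by machine:

```powershell
# Same as passing the string 'First'.
$mode = [System.Management.Automation.WhereOperatorSelectionMode]::First

# Evaluates the filter script block for all 1,000,000 items.
$tAll   = Measure-Command { $null = (1..1e6).Where({ $_ -eq 10 }) }
# Stops evaluating as soon as the first match (10) is found.
$tFirst = Measure-Command { $null = (1..1e6).Where({ $_ -eq 10 }, $mode) }
'all items: {0:N0} ms; stop at first match: {1:N0} ms' -f $tAll.TotalMilliseconds, $tFirst.TotalMilliseconds

$match = (1..1e6).Where({ $_ -eq 10 }, 'First')
$match   # 10
```

Note that the 1..1e6 array is still built in full in both cases; 'First' only saves the remaining script-block evaluations.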
[1] PowerShell does not use temporary files to ease the memory pressure the way the Unix sort utility does, for instance; my guess is that doing so is not really an option in PowerShell: PowerShell's ability to process live objects (rather than static strings) would present significant serialization / deserialization challenges were temporary files to be used.
[2] However, 1..10000000 | ... and & { foreach ($i in 1..10000000) { $i } } | ... would work, i.e., stream: uniquely among PowerShell's operators, .., the range operator, is implemented as a lazy .NET enumerable, which both direct pipeline input and use as the collection operand of a foreach statement can take advantage of.