Is there a way to stream data out faster from a large command?
Let's say I'm using get-childitem c:\*.* -recurse and I am piping it. I have to wait for the whole get-childitem command to complete before the pipe handles it. There are exceptions, such as select -first 2, which magically stops the previous command. Anyhow, is there a way to improve output so it writes right away instead of soaking up a ton of RAM? One idea I have is... (which I know won't work, but it gets the idea across)

[System.IO.File]::ReadLines("$(dir c:\*.* -recurse)")

I know this is a Windows thing, because Linux will work with data as soon as it shows up. But two different worlds, I know.

My biggest concern is RAM usage...

Here is a great example:

(1..10000000) | where {$_ -like "*543*"}

this takes my machine about 100 seconds, whereas

(1..10000000).where({$_ -like "*543*"})

only took 25 seconds.

Undeniable answered 15/8, 2018 at 2:35 Comment(0)
I have to wait for the whole get-childitem command to complete before the pipe handles it.

No: The very point of PowerShell's pipeline is to process objects one by one, as they become available, thereby acting as a memory throttle that keeps memory use constant irrespective of the size of the input collection.

  • Caveat: Do NOT place (...), the grouping operator, around the command whose output you send through the pipeline, as that will indeed collect the command's output in full, in memory, first.

  • Cmdlets, as PowerShell's native commands, inherently support this one-by-one streaming.

    • However, some cmdlets such as Sort-Object and Group-Object must collect all input in memory first[1], as a conceptual necessity (e.g., you cannot produce sorted output until you've compared all items). Thanks, Bacon Bits.

    • Similarly, cmdlets such as ConvertTo-Json, which only emit a single output object, construct that one object from the entirety of the input collected up front.

  • Similarly, stdout output from external programs is passed through line by line, as the lines become available.

  • You can turn an expression into a streaming command by enclosing it in & { ... }, but that is only useful if the expression hasn't already built the full collection of objects in memory; e.g.,
    & { 1..10000000 } | ... won't gain you anything, but
    & { for ($i=0; $i -lt 10000000; ++$i) { $i } } | ... would.[2]

  • Ultimately, if the source cmdlet / program / expression doesn't itself emit output objects in a streaming fashion (one by one, as they're being produced), you're out of luck.
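The grouping-operator caveat above can be seen with the question's own range example. The following is an illustrative sketch; Select-Object -First 1 is used here only to truncate the output once a match is found:

```powershell
# Streams: Where-Object tests each number as the lazy range emits it,
# so memory use stays low and the downstream -First 1 can end things early.
1..10000000 | Where-Object { $_ -like '*543*' } | Select-Object -First 1   # -> 543

# Does NOT stream: the grouping operator (...) forces the full
# ten-million-element array to be built in memory before the
# pipeline sees a single item.
(1..10000000) | Where-Object { $_ -like '*543*' } | Select-Object -First 1 # -> 543
```

Both commands produce the same result, but the second one pays the full memory cost of the materialized array first.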

However, what is indeed missing is the ability to stop pipeline processing on demand - which currently only Select-Object -First can do - see this answer of mine.
There's a longstanding feature request on GitHub that asks for a mechanism to stop a pipeline on demand.
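A minimal sketch of that one existing mechanism, using the range from the question:

```powershell
# Select-Object -First stops the upstream command once it has received
# enough objects; this returns immediately instead of enumerating all
# ten million numbers.
1..10000000 | Select-Object -First 2   # -> 1, 2
```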


As an aside: Using the PSv4+ .Where() method is indeed faster than using the Where-Object cmdlet (whose built-in alias is where), but .Where() invariably requires the collection that it operates on to have been loaded into memory in full beforehand.

However, the .Where() method does have the ability to stop processing remaining items: passing 'First' as the 2nd argument stops after the first match ('First' is a value of the [System.Management.Automation.WhereOperatorSelectionMode] enumeration); compare the performance of
(1..1e6).Where({$_ -eq 10}) to that of
(1..1e6).Where({$_ -eq 10}, 'First')
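To put numbers on that comparison yourself, you can wrap each call in Measure-Command (a rough sketch; absolute timings vary by machine):

```powershell
# Without 'First': .Where() examines all one million elements.
(Measure-Command { (1..1e6).Where({ $_ -eq 10 }) }).TotalMilliseconds

# With 'First': enumeration short-circuits at the first match,
# so only the first 10 elements are examined.
(Measure-Command { (1..1e6).Where({ $_ -eq 10 }, 'First') }).TotalMilliseconds
```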


[1] PowerShell does not use temporary files to ease the memory pressure the way the Unix sort utility does, for instance; my guess is that doing so is not really an option in PowerShell: PowerShell's ability to process live objects (rather than static strings) would present significant serialization / deserialization challenges were temporary files to be used.

[2] However, 1..10000000 | ... and & { foreach ($i in 1..10000000) { $i } } | ... would work: uniquely among PowerShell's operators, the range operator (..) is implemented as a lazy .NET enumerable, which direct pipeline input and the foreach statement can take advantage of.
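To illustrate footnote [2] (Select-Object -First is used here only to truncate the output):

```powershell
# Both forms feed the pipeline lazily, one number at a time:
1..10000000 | Select-Object -First 3                       # direct pipeline input
& { foreach ($i in 1..10000000) { $i } } | Select-Object -First 3

# By contrast, assigning the range to a variable (or wrapping it in the
# grouping operator) materializes the full array in memory first:
$all = (1..10000000)   # allocates all ten million [int]s up front
```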

Richmal answered 15/8, 2018 at 3:16 Comment(3)
You have a very impressive grasp of PowerShell. Do you recommend any books? I'd trust your input over reviews. I want to know the fundamental inner workings (like how parentheses force a full load into RAM), etc., and also gain a very wide grasp of the .NET framework / objects / commands. I'm aware of the mountain ahead. But I write many scripts for large files, and if I can shave 30% off the time with better code, I could save months of time annually. – Undeniable
@RobertCotterman: There's the language spec, though it hasn't been updated since v3: microsoft.com/en-us/download/details.aspx?id=36389 The only book I know (and have only read excerpts of) is manning.com/books/windows-powershell-in-action-third-edition Beyond that, trial and error are your friends (the official docs, unfortunately, aren't, though things are improving since both PowerShell and the docs have gone open source) and, if you're adventurous, studying the source code: github.com/PowerShell/PowerShell – Richmal
For those wondering: there isn't really a right answer to my question; it's more an exploratory way of discovering how PowerShell works. Mklement0 made some amazing points, as he always does. – Undeniable

© 2022 - 2025 — McMap. All rights reserved.