How is the new PowerShell 7 ForEach-Object Parallel implemented?

PowerShell 7 introduced a much needed feature for running pipeline input in parallel.

The documentation for PowerShell 7 does not provide any detail on how this is implemented.

Having used the PoshRSJob and Invoke-Parallel modules before, I'm aware that runspaces were traditionally considered a much more efficient approach to parallel operations in PowerShell than PowerShell jobs. I've read some mixed content indicating that this feature now uses threads rather than runspaces, but I can't find anything more specific.
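For context, the runspace-pool pattern that modules like PoshRSJob and Invoke-Parallel build on looks roughly like the following. This is a minimal sketch, not code taken from either module: a pool of at most three runspaces is shared by several PowerShell instances, each started asynchronously with BeginInvoke() and collected with EndInvoke(). The squaring script is just a placeholder workload.

```powershell
# Minimal runspace-pool sketch (illustrative, not taken from any module).
# The pool holds between 1 and 3 runspaces; each [powershell] instance borrows one.
$pool = [runspacefactory]::CreateRunspacePool(1, 3)
$pool.Open()

$tasks = foreach ($n in 1..5) {
    $ps = [powershell]::Create()
    $ps.RunspacePool = $pool
    # AddArgument binds $n to the script's $x parameter.
    [void]$ps.AddScript('param($x) $x * $x').AddArgument($n)
    # BeginInvoke starts the script on a pool thread and returns a handle at once.
    [pscustomobject]@{ Shell = $ps; Handle = $ps.BeginInvoke() }
}

# EndInvoke blocks until that script finishes, then returns its output.
$results = foreach ($t in $tasks) {
    $t.Shell.EndInvoke($t.Handle)
    $t.Shell.Dispose()
}

$pool.Close()
$pool.Dispose()
$results
```

Because the handles are collected in submission order, the results come back in input order even though the scripts may finish in any order.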

I'd really appreciate some technical insight into:

  1. What is the lifecycle of an execution from a .NET perspective?
  2. Does the new functionality use runspaces or threads? (Or is a runspace just a .NET thread in System.Management.Automation?)
  3. Does this add any complexity to traditional debugging now that we are moving into parallel operations? Historically I had a rough time debugging with runspaces, and I'm not sure what options might have improved.
Hydroplane answered 24/3, 2020 at 19:13 Comment(5)
According to the articles I have seen, it uses runspaces. You need to pass $vars into it (usually with $using:). It loads all the needed modules, functions and so on into each runspace, so it takes time to set up and tear down. I have not seen anything about debugging so far. – Burschenschaft
Separate runspaces, managed via a new internal API (PSTaskPool). The RFC goes into some detail about implementation and constraints. The source code also contains a number of helpful comments. – Grobe
Just in case (who knows), there is also SplitPipeline, which has some unique features (IMHO), e.g. it works well with very large or infinite input. – Geographical
There's also Start-ThreadJob, which I believe was written by the same person. Unlike Start-Job, it doesn't serialize objects. – Jarnagin
The RFC was also pointed out to me, with some great info. I'll review it and post an answer here later if it answers some of this. RFC0044-ForEach-Parallel-Cmdlet – Hydroplane

I found this fantastic blog post, PowerShell ForEach-Object Parallel Feature, by Paul Higinbotham.

The key highlights I took away from it:

Script blocks run in a context called a PowerShell runspace. The runspace context contains all of the defined variables, functions and loaded modules.

As previously mentioned, the new ForEach-Object -Parallel feature uses existing PowerShell functionality to run script blocks concurrently....PowerShell itself imposes conditions on how scripts run concurrently, based on its design and history. Scripts have to run in runspace contexts and only one script thread can run at a time within a runspace. So in order to run multiple scripts simultaneously multiple runspaces must be created.
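A quick way to see this for yourself (PowerShell 7+; a sketch of my own, not from the blog post) is to have each iteration report the Id of the runspace it ran in and its managed thread Id. The iterations report runspace Ids that differ from the caller's, and a caller variable is only reachable through $using:. The $prefix and $callerRunspaceId names are illustrative, and the exact Ids, and how many distinct runspaces appear, depend on the PowerShell version and -ThrottleLimit, so don't rely on specific values.

```powershell
# Requires PowerShell 7+ (ForEach-Object -Parallel).
$prefix = 'item'                                   # illustrative caller variable
$callerRunspaceId = [runspace]::DefaultRunspace.Id

$info = 1..6 | ForEach-Object -ThrottleLimit 3 -Parallel {
    # Caller variables are not visible here unless read through $using:.
    $p = $using:prefix
    [pscustomobject]@{
        Label      = "$p-$_"
        RunspaceId = [runspace]::DefaultRunspace.Id   # runspace running this item
        ThreadId   = [System.Threading.Thread]::CurrentThread.ManagedThreadId
    }
}

"Caller runspace: $callerRunspaceId"
$info | Sort-Object Label | Format-Table Label, RunspaceId, ThreadId
```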

So the post confirms that runspaces are the underlying mechanism, and it gives further information on thread-safe operations and more. Any prior answers or details about runspaces are relevant here, since this is a mature, runspace-based implementation of parallel operations that ships with PowerShell itself. The community has built other runspace-oriented implementations, but this one is built in and has no external module dependencies.
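On the thread-safety point: my understanding is that, because everything runs in the same process, $using: hands each parallel runspace a reference to the caller's object rather than a serialized copy, so anything shared that way must be safe for concurrent use. A minimal sketch assuming that behaviour, using a ConcurrentBag from System.Collections.Concurrent (a plain List[int] in the same position could lose or corrupt items when several threads add at once):

```powershell
# Requires PowerShell 7+. ConcurrentBag tolerates concurrent Add() calls.
$bag = [System.Collections.Concurrent.ConcurrentBag[int]]::new()

1..10 | ForEach-Object -ThrottleLimit 4 -Parallel {
    $shared = $using:bag   # a reference to the caller's $bag, not a copy
    $shared.Add($_ * 2)
}

$total = ($bag | Measure-Object -Sum).Sum
"Items: $($bag.Count), sum: $total"
```

Each of the 10 iterations adds one item, so the caller's bag ends up with 10 items summing to 110.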

Thanks Paul for such a good contribution to the community! 👍

Hydroplane answered 8/5, 2020 at 19:27 Comment(2)
I don't think you can. Runspaces run on separate threads, so I'm not sure you can break into individual runspaces with the debugger. I've never had success trying that, so let me know if you find out otherwise. Sounds like maybe a good separate question. – Hydroplane
There is a Debug-Runspace command, but you might have to break in from another process unless the loop is somehow running in the background. – Jarnagin

Debugging ForEach-Object -Parallel:

I need a second pwsh process to do it. In the first one, run:

foreach-object -parallel { Wait-Debugger;1;2;3 }

Then, in the second window, find the process ID of the other pwsh and attach to it with Enter-PSHostProcess. List its runspaces with Get-Runspace, and debug the one whose Availability is "InBreakpoint". At the debugger prompt, "v" means "step over".

get-process pwsh

 NPM(K)    PM(M)      WS(M)     CPU(s)      Id  SI ProcessName
 ------    -----      -----     ------      --  -- -----------
     64    44.32      82.23       1.70    3912  12 pwsh
     63    40.66      78.03       1.36    6472  12 pwsh

$pid
6472

Enter-PSHostProcess 3912

get-runspace

 Id Name            ComputerName    Type          State         Availability
 -- ----            ------------    ----          -----         ------------
  1 Runspace1       localhost       Local         Opened        Busy
  2 PSTask:1        localhost       Local         Opened        InBreakpoint
  3 RemoteHost      localhost       Local         Opened        Busy

debug-runspace 2
v
v
v

If you run foreach-object -parallel -asjob, you can use get-runspace and debug-runspace in the same window, although you can't see the output when stepping.

foreach-object -parallel { Wait-Debugger;1;2;3 } -asjob
get-runspace

 Id Name            ComputerName    Type          State         Availability
 -- ----            ------------    ----          -----         ------------
  1 Runspace1       localhost       Local         Opened        Available
  2 PSTask:1        localhost       Local         Opened        InBreakpoint

debug-runspace 2
v
v
v
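Because -AsJob returns immediately, you can also inspect the task runspaces from the same session in a script. A minimal sketch: the Start-Sleep calls exist only to keep the tasks alive long enough to look at them, and the 'PSTask*' name filter assumes the naming shown in the Get-Runspace output above. For actual stepping you would put Wait-Debugger back into the script block and attach with Debug-Runspace -Id, as in the transcript; it is left out here so the script runs to completion unattended.

```powershell
# Requires PowerShell 7+. -AsJob returns a job object right away.
$job = 1..3 | ForEach-Object -ThrottleLimit 3 -AsJob -Parallel {
    Start-Sleep -Seconds 2   # keep each task busy long enough to inspect it
    $_
}

Start-Sleep -Milliseconds 500
# The task runspaces appear next to the session's own runspace(s).
Get-Runspace | Where-Object Name -like 'PSTask*' |
    Format-Table Id, Name, Availability

# Collect the output, then clean up the job.
$results = $job | Wait-Job | Receive-Job
$job | Remove-Job
```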

Here's a new debugging video that covers some advanced setups with VS Code: https://www.reddit.com/r/PowerShell/comments/gn0270/advanced_powershell_debugging_techniques/

Jarnagin answered 20/5, 2020 at 2:32 Comment(0)