How to pass $_ ($PSItem) in a ScriptBlock

I'm basically building my own parallel foreach pipeline function, using runspaces.

My problem is: I call my function like this:

somePipeline | MyNewForeachFunction { scriptBlockHere } | pipelineGoesOn...

How can I pass the $_ parameter correctly into the ScriptBlock? It works when the ScriptBlock contains, as its first line,

param($_)

But as you might have noticed, PowerShell's built-in ForEach-Object and Where-Object do not need such a parameter declaration in every ScriptBlock that is passed to them.

Thanks for your answers in advance fjf2002

EDIT:

The goal is: I want comfort for the users of function MyNewForeachFunction - they shouldn't need to write a line param($_) in their script blocks.

Inside MyNewForeachFunction, the ScriptBlock is currently called via

$PSInstance = [powershell]::Create().AddScript($ScriptBlock).AddParameter('_', $_)
$PSInstance.BeginInvoke()

EDIT2:

The point is, how does for example the implementation of the built-in function ForEach-Object achieve that $_ needn't be declared as a parameter in its ScriptBlock parameter, and can I use that functionality, too?

(If the answer is, ForEach-Object is a built-in function and uses some magic I can't use, then this would disqualify the language PowerShell as a whole in my opinion)

EDIT3:

Thanks to mklement0, I could finally build my general foreach loop. Here's the code:

function ForEachParallel {
    [CmdletBinding()]
    Param(
        [Parameter(Mandatory)] [ScriptBlock] $ScriptBlock,
        [Parameter(Mandatory=$false)] [int] $PoolSize = 20,
        [Parameter(ValueFromPipeline)] $PipelineObject
    )

    Begin {
        $RunspacePool = [runspacefactory]::CreateRunspacePool(1, $poolSize)
        $RunspacePool.Open()
        $Runspaces = @()
    }

    Process {
        $PSInstance = [powershell]::Create().
            AddCommand('Set-Variable').AddParameter('Name', '_').AddParameter('Value', $PipelineObject).
            AddCommand('Set-Variable').AddParameter('Name', 'ErrorActionPreference').AddParameter('Value', 'Stop').
            AddScript($ScriptBlock)

        $PSInstance.RunspacePool = $RunspacePool

        $Runspaces += New-Object PSObject -Property @{
            Instance = $PSInstance
            IAResult = $PSInstance.BeginInvoke()
            Argument = $PipelineObject
        }
    }

    End {
        while($True) {
            $completedRunspaces = @($Runspaces | where {$_.IAResult.IsCompleted})

            $completedRunspaces | foreach {
                Write-Output $_.Instance.EndInvoke($_.IAResult)
                $_.Instance.Dispose()
            }

            if($completedRunspaces.Count -eq $Runspaces.Count) {
                break
            }

            $Runspaces = @($Runspaces | where { $completedRunspaces -notcontains $_ })
            Start-Sleep -Milliseconds 250
        }

        $RunspacePool.Close()
        $RunspacePool.Dispose()
    }
}

Code partly from MathiasR.Jessen, Why PowerShell workflow is significantly slower than non-workflow script for XML file analysis

Hoke answered 24/10, 2018 at 17:50 Comment(4)
Either inspect the AST of the scriptblock and inject a param declaration if none exist, or extend PSCmdlet and invoke the scriptblock with the dollarUnderscore parameter set. - Thorstein
The first argument passed to your scriptblock is in $args[0] or if it's taken as pipeline: $input - Babel
@MathiasR.Jessen: Could you be more specific? Do ForEach-Object / Where-Object etc. also do it like this? - Hoke
@mklement0: Thanks, I've added Dispose calls, a sane ErrorActionPreference and I have removed the "barrier" - now completed results get passed down the pipeline before all runspaces have finished. - Hoke

The key is to define $_ as a variable that your script block can see, via a call to Set-Variable.

Here's a simple example:

function MyNewForeachFunction {
  [CmdletBinding()]
  param(
    [Parameter(Mandatory)]
    [scriptblock] $ScriptBlock
    ,
    [Parameter(ValueFromPipeline)]
    $InputObject
  )

  process {
    $PSInstance = [powershell]::Create()

    # Add a call to define $_ based on the current pipeline input object
    $null = $PSInstance.
      AddCommand('Set-Variable').
        AddParameter('Name', '_').
        AddParameter('Value', $InputObject).
      AddScript($ScriptBlock)

    $PSInstance.Invoke()
  }

}

# Invoke with sample values.
1, (Get-Date) | MyNewForeachFunction { "[$_]" }

The above yields something like:

[1]
[10/26/2018 00:17:37]
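
For multi-threading, the same Set-Variable technique can be combined with BeginInvoke()/EndInvoke(), as the question's final code does. The following is only a minimal sketch under a few assumptions: $work is a hypothetical user script block, each instance gets its own default runspace (no runspace pool), and results are collected in submission order rather than as they complete.

```powershell
# Hypothetical user script block that relies on $_ without declaring param($_)
$work = { $_ * 2 }

# Start one PowerShell instance per input item; $_ is pre-defined via Set-Variable
$pending = foreach ($item in 1..3) {
    $ps = [powershell]::Create()
    $null = $ps.
        AddCommand('Set-Variable').
            AddParameter('Name', '_').
            AddParameter('Value', $item).
        AddScript($work.ToString())

    # Keep the instance together with its async handle so it can be collected later
    [pscustomobject]@{ Instance = $ps; Handle = $ps.BeginInvoke() }
}

# Collect the results in submission order and always dispose of the instances
$results = foreach ($p in $pending) {
    try { $p.Instance.EndInvoke($p.Handle) }
    finally { $p.Instance.Dispose() }
}

$results
```

Waiting on each handle in turn keeps the sketch short; a polling loop like the one in the question would let completed results flow down the pipeline sooner.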
Mahican answered 26/10, 2018 at 4:20 Comment(0)

What I think you're looking for (and what I was looking for) is to support a "delay-bind" script block, supported in PowerShell 5.1+. The documentation tells a bit about what's required, but doesn't currently provide any user-script examples.

Tough Technical Option: Manual Implementation

The gist of the documentation is that PowerShell will implicitly detect that your function can accept a delay-bind script block if it defines an explicitly typed pipeline parameter (either by Value or by PropertyName), as long as it's not of type [scriptblock] or type [object].

function Test-DelayedBinding {
     param(
         # this is our typed pipeline parameter
         # per doc this cannot be of type [scriptblock] or [object],
         # but testing shows that type [object] may be permitted
         [AllowEmptyString()]
         [Parameter(ValueFromPipeline)][string[]]$String,
         # this is our scriptblock parameter
         [Parameter(Position=0)][scriptblock]$Filter
     )

     Process {
         foreach($s in $String) {
             if (&$filter $s) {
                 Write-Output $s
             }
         }
     }
 }


# sample invocation
> 'foo', 'fi', 'foofoo', 'fib' | Test-DelayedBinding { return $_ -match 'foo' }
foo
foofoo

Note that the delay-bind is subject to the following limitations:

  • delay-bind will only be applied if input is piped into the function
  • scoping and closure do not get applied in the same way as for built-in delay-bind cmdlets

The frustrating part is that there is no way to explicitly specify that delay-bind should be used, and errors resulting from incorrectly structuring your function may be non-obvious.

Easier Alternative: Using Built-In Cmdlets

PowerShell provides built-in cmdlets that implement delayed binding for both iteration/transformation (ForEach-Object) and filtering (Where-Object), which covers most situations where you'd want to use delay-bind.

Use these to easily build a custom delay-bind function without the above listed limitations:

function Test-WhereBasedFilter {
    param(
        [Parameter(ValueFromPipeline)]
        [object[]]
        $Object,

        [Parameter(Mandatory,Position=0)]
        [scriptblock]
        $Filter
    )

    process {
        foreach ($o in $object) {
            $o | Where-Object $Filter | Write-Output
        }
    }
}

# sample invocation
> 'foo', 'fi', 'foofoo', '', $null, 'fib' | Test-WhereBasedFilter { return $_ -match 'foo' }
foo
foofoo


function Test-ForBasedIterator {
    param(
        [Parameter(ValueFromPipeline)]
        [object[]]
        $Object,

        [Parameter(Mandatory,Position=0)]
        [scriptblock]
        $ScriptBlock
    )

    process {
        foreach ($o in $object) {
            $o | ForEach-Object $ScriptBlock | Write-Output
        }
    }
}

# sample invocation
> 'foo', 'fi', 'foofoo', '', $null, 'fib' | Test-ForBasedIterator { "  $_  foo!" }
  foo  foo!
  fi  foo!
  foofoo  foo!
    foo!
  fib  foo!

Building a custom delay-bind around built-ins is quicker and adds functionality:

  • input can be passed as a standard parameter as well as through pipeline
  • object inputs work fine
  • scoping and closures are handled as per the built-in you're using (which is more likely to be the user's expectation)
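
To illustrate the first point in the list above, here is a small sketch that repeats the Test-WhereBasedFilter function from this answer and calls it both ways. The variable names $direct and $piped are only illustrative.

```powershell
# Same wrapper as above: delegates the $_ handling to Where-Object
function Test-WhereBasedFilter {
    param(
        [Parameter(ValueFromPipeline)]
        [object[]]
        $Object,

        [Parameter(Mandatory, Position = 0)]
        [scriptblock]
        $Filter
    )

    process {
        foreach ($o in $Object) {
            $o | Where-Object $Filter
        }
    }
}

# Input passed as a regular named parameter
$direct = Test-WhereBasedFilter -Object 'foo', 'fib' { $_ -match 'foo' }

# Input passed through the pipeline
$piped = 'foo', 'fib' | Test-WhereBasedFilter { $_ -match 'foo' }

"direct: $direct"
"piped:  $piped"
```

Both calls should let only 'foo' through, because in each case the item is piped into Where-Object inside the function, which is what defines $_ for the filter.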
Cowell answered 4/8, 2021 at 4:58 Comment(2)
This should probably be the accepted answer now. This worked for meRotow
@Rotow - I've updated the answer to include an easier/better option I've discovered since original post, and to clarify further. Also converted pipeline param to be non-mandatory so it can accept null/empty.Cowell

You can use the ScriptBlock.InvokeWithContext method to pass the input object as $_ ($PSItem) to your PowerShell instances. Also, given the last edit to your question, you should definitely call .Ast.GetScriptBlock() on the scriptblock argument to strip out its runspace affinity; otherwise you will run into issues, either crashing your session or deadlocks. See GitHub issue #4003 for in-depth details.

If you're looking for a more advanced version of your function, see this answer or a more advanced version in the GitHub repo which does not use a runspacepool.

function MyNewForeachFunction {
    [CmdletBinding()]
    Param(
        [Parameter(ValueFromPipeline)]
        [psobject] $PipelineObject,

        [Parameter(Mandatory, Position = 0)]
        [scriptblock] $ScriptBlock
    )

    process {
        try {
            # `.Ast.GetScriptBlock()` Needed to avoid runspace affinity issues!
            $ps = [powershell]::Create().AddScript({
                param([scriptblock] $sb, [psobject] $inp)

                $sb.InvokeWithContext($null, [psvariable]::new('_', $inp))
            }).AddParameters(@{
                sb  = $ScriptBlock.Ast.GetScriptBlock()
                inp = $PipelineObject
            })
            
            # using `.Invoke()` for demo purposes, would use `.BeginInvoke()`
            # instead for multi-threading
            $ps.Invoke()

            if ($ps.HadErrors) {
                foreach ($e in $ps.Streams.Error) {
                    $PSCmdlet.WriteError($e)
                }
            }
        }
        finally {
            if ($ps) {
                $ps.Dispose()
            }
        }
    }
}

0..10 | MyNewForeachFunction { $_ }
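
Stripped of the runspace plumbing, the core of this approach can be tried directly in the current session. A minimal sketch, where $userBlock is a hypothetical script block standing in for whatever the caller passes:

```powershell
# Hypothetical caller-supplied script block that uses $_ without declaring it
$userBlock = { "[$_]" }

# First argument: functions to define (none here).
# Second argument: variables to define for the invocation, here $_ = 42.
$result = $userBlock.InvokeWithContext($null, [psvariable]::new('_', 42))

$result   # [42]
```

[psvariable]::new(...) requires PowerShell 5 or later; on older versions New-Object psvariable '_', 42 would be the equivalent.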
Corduroys answered 18/7, 2023 at 23:12 Comment(0)

Maybe this can help. I'd normally run auto-generated jobs in parallel this way:

Get-Job | Remove-Job

foreach ($param in @(3,4,5)) {

 Start-Job  -ScriptBlock {param($lag); sleep $lag; Write-Output "slept for $lag seconds" } -ArgumentList @($param)

}

Get-Job | Wait-Job | Receive-Job

If I understand you correctly, you are trying to get rid of param() inside the scriptblock. You could try wrapping that scriptblock in another one. Below is the workaround for my sample:

Get-Job | Remove-Job

#scriptblock with no parameter
$job = { sleep $lag; Write-Output "slept for $lag seconds" }

foreach ($param in @(3,4,5)) {

 Start-Job  -ScriptBlock {param($param, $job)
  $lag = $param
  $script = [string]$job
  Invoke-Command -ScriptBlock ([Scriptblock]::Create($script))
 } -ArgumentList @($param, $job)

}

Get-Job | Wait-Job | Receive-Job
Clang answered 24/10, 2018 at 19:34 Comment(0)
# I was looking for an easy way to do this in a scripted function,
# and the below worked for me in PSVersion 5.1.17134.590

function Test-ScriptBlock {
    param(
        [string]$Value,
        [ScriptBlock]$FilterScript={$_}
    )
    $_ = $Value
    & $FilterScript
}
Test-ScriptBlock -Value 'unimportant/long/path/to/foo.bar' -FilterScript { [Regex]::Replace($_,'unimportant/','') }
Cheka answered 2/5, 2019 at 20:59 Comment(0)
