I am having a difficult time understanding the most efficient way to process large datasets/arrays in PowerShell. I have arrays with several million items that I need to process and group. The list varies in size; it could be 3.5 million items or 10 million items.
Example: with 3.5 million items, they group in sets of four, like the following:
Items 0,1,2,3 form one group, items 4,5,6,7 form the next group, and so on.
I have tried processing the array on a single thread by looping through the list and assigning each group to a PSCustomObject, which works, but it takes 45-50+ minutes to complete.
I have also attempted to break the array up into smaller arrays, but that makes the process run even longer.
$i = 0
$d_array = @()
$item_array   # large flat dataset; every 4 consecutive items form one record
While ($i -lt $item_array.Length) {
    $o  = "Test"
    $oo = "Test"
    # pull the next 4 items off the flat array
    $n  = $item_array[$i]; $i++
    $id = $item_array[$i]; $i++
    $ir = $item_array[$i]; $i++
    $cs = $item_array[$i]; $i++
    $items = [PSCustomObject]@{
        'field1' = $o
        'field2' = $oo
        'field3' = $n
        'field4' = $id
        'field5' = $ir
        'field6' = $cs
    }
    $d_array += $items   # append the record to the result array
}
I would imagine that applying a job scheduler to run multiple jobs in parallel would cut the processing time down significantly, but I wanted to get others' takes on a quick, effective way to tackle this.
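As a rough sketch of what I mean, something like the following is what I'm imagining with ForEach-Object -Parallel (PowerShell 7+); the chunk size, throttle limit, and field values here are just placeholders, not code I actually run:

$chunkSize = 400000                                  # placeholder; must stay a multiple of 4
$chunks = for ($s = 0; $s -lt $item_array.Length; $s += $chunkSize) {
    $e = [Math]::Min($s + $chunkSize, $item_array.Length) - 1
    ,($item_array[$s..$e])                           # leading comma keeps each chunk as a single array element
}

$d_array = $chunks | ForEach-Object -Parallel {
    $chunk = $_
    # walk the chunk four items at a time and emit one record per group
    for ($i = 0; $i -lt $chunk.Length; $i += 4) {
        [PSCustomObject]@{
            'field1' = 'Test'
            'field2' = 'Test'
            'field3' = $chunk[$i]
            'field4' = $chunk[$i + 1]
            'field5' = $chunk[$i + 2]
            'field6' = $chunk[$i + 3]
        }
    }
} -ThrottleLimit 8                                   # placeholder thread count

In this sketch the per-group objects are just written to the pipeline and collected into $d_array at the end rather than appended inside the loop, but I don't know if this is the right direction for datasets this size.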