It is easier to collect the results in an array instead of a List<Foo>. Assuming that the List<byte[]> is named source, you can do this:
Foo[] output = new Foo[source.Count];
ParallelOptions options = new() { MaxDegreeOfParallelism = Environment.ProcessorCount };
Parallel.ForEach(source, options, (byteArray, state, index) =>
{
    output[index] = Deserialize(byteArray);
});
Notice the absence of any kind of synchronization (lock etc).
The above code works because updating an array concurrently is allowed, as long as each thread updates an exclusive subset of its indices¹. After the completion of the Parallel.ForEach operation, the current thread will see the output array filled with fully initialized Foo instances, without the need to manually insert a memory barrier. The TPL automatically includes memory barriers at the end of task executions (citation), and Parallel.ForEach is based on Tasks internally (hence the TPL acronym).
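The disjoint-index idea can also be sketched with Parallel.For, which hands each iteration its index directly. This is only a minimal, self-contained illustration: the string-returning Deserialize and the sample input are hypothetical stand-ins, not the real Foo type or data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the real Deserialize: a "Foo" is just a string here.
static string Deserialize(byte[] bytes) => Convert.ToBase64String(bytes);

// Illustrative input only: 1000 small byte arrays.
List<byte[]> source = Enumerable.Range(0, 1000)
    .Select(i => BitConverter.GetBytes(i))
    .ToList();

string[] output = new string[source.Count];
ParallelOptions options = new() { MaxDegreeOfParallelism = Environment.ProcessorCount };

// Each iteration writes only to its own slot, so no lock is used.
Parallel.For(0, source.Count, options, i =>
{
    output[i] = Deserialize(source[i]);
});

// Compare against a plain sequential pass over the same input.
bool sameAsSequential = output.SequenceEqual(source.Select(Deserialize));
Console.WriteLine(sameAsSequential);
```

The source list is only read inside the loop, so the sole shared writes are the per-index array assignments.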
Collecting the results directly in a List<Foo> is more involved, because the List<T> collection is explicitly documented as not being thread-safe for concurrent write operations. You can update it safely using the lock statement, as shown below:
List<Foo> output = new(source.Count);
for (int i = 0; i < source.Count; i++) output.Add(default);
Parallel.ForEach(source, options, (byteArray, state, index) =>
{
    Foo foo = Deserialize(byteArray);
    lock (output) output[(int)index] = foo;
});
Notice that the lock protects only the updating of the output list. The Deserialize call is not synchronized, otherwise the purpose of parallelization would be defeated.
Starting from .NET 8 it is possible to pre-fill a List<T> with uninitialized T values without a loop, using the advanced CollectionsMarshal.SetCount API:
// Requires: using System.Runtime.InteropServices;
List<Foo> output = new(source.Count); // Set the Capacity
CollectionsMarshal.SetCount(output, source.Count); // Set the Count
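Put together with the locked update, the SetCount approach might look roughly like the sketch below. It assumes .NET 8 or later. The string-returning Deserialize and the sample input are hypothetical placeholders for the real Foo type and data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.InteropServices;
using System.Threading.Tasks;

// Hypothetical stand-in for the real Deserialize: a "Foo" is just a string here.
static string Deserialize(byte[] bytes) => Convert.ToBase64String(bytes);

// Illustrative input only: 1000 small byte arrays.
List<byte[]> source = Enumerable.Range(0, 1000)
    .Select(i => BitConverter.GetBytes(i))
    .ToList();

ParallelOptions options = new() { MaxDegreeOfParallelism = Environment.ProcessorCount };

List<string> output = new(source.Count);            // Set the Capacity
CollectionsMarshal.SetCount(output, source.Count);  // Set the Count (.NET 8+)

Parallel.ForEach(source, options, (byteArray, state, index) =>
{
    string foo = Deserialize(byteArray);      // Not synchronized, runs in parallel
    lock (output) output[(int)index] = foo;   // Only the list update is locked
});

// Compare against a plain sequential pass over the same input.
bool sameAsSequential = output.SequenceEqual(source.Select(Deserialize));
Console.WriteLine(sameAsSequential);
```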
Alternative: It is even simpler if you are willing to switch from Parallel.ForEach to PLINQ. With a PLINQ query you can collect the results of a parallel operation without relying on side effects. Just use the ToList or ToArray operator at the end of the query:
List<Foo> output = source
    .AsParallel()
    .AsOrdered()
    .WithDegreeOfParallelism(Environment.ProcessorCount)
    .Select(byteArray => Deserialize(byteArray))
    .ToList();
Don't forget to include the AsOrdered operator, otherwise the order will not be preserved.
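One way to convince yourself that the ordered PLINQ query keeps the input order is a small self-contained check like the one below. The int-returning Deserialize and the generated input are assumptions made purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the real Deserialize: decodes a 4-byte int.
static int Deserialize(byte[] bytes) => BitConverter.ToInt32(bytes, 0);

// Illustrative input only: the numbers 0..999, each encoded as a byte array.
List<byte[]> source = Enumerable.Range(0, 1000)
    .Select(i => BitConverter.GetBytes(i))
    .ToList();

List<int> ordered = source
    .AsParallel()
    .AsOrdered() // Results come back in source order
    .WithDegreeOfParallelism(Environment.ProcessorCount)
    .Select(byteArray => Deserialize(byteArray))
    .ToList();

// With AsOrdered, element i of the result corresponds to element i of the source.
bool orderPreserved = ordered.SequenceEqual(Enumerable.Range(0, 1000));
Console.WriteLine(orderPreserved);
```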
¹ Not documented explicitly, but generally agreed.
Select or Map to retain the input order. – Misshape
Parallel.For solves the problem. Post this as answer. I'll vote +1. – Hynes