The Parallel.For loop uses ThreadPool to execute work in a loop by invoking a delegate once per each iteration of a loop.
The general idea of how Parallel.For works can be presented as follows:
public static void MyParallelFor(int inclusiveLowerBound, int exclusiveUpperBound, Action<int> body)
{
// Get the number of processors, initialize the number of remaining
// threads, and set the starting point for the iteration.
int numProcs = Environment.ProcessorCount;
int remainingWorkItems = numProcs;
int nextIteration = inclusiveLowerBound;
using (ManualResetEvent mre = new ManualResetEvent(false))
{
// Create each of the work items.
for (int p = 0; p < numProcs; p++)
{
ThreadPool.QueueUserWorkItem(delegate
{
int index;
while ((index = Interlocked.Increment(ref nextIteration) - 1) < exclusiveUpperBound)
body(index);
if (Interlocked.Decrement(ref remainingWorkItems) == 0)
mre.Set();
});
}
// Wait for all threads to complete.
mre.WaitOne();
}
}
Parallel.For returns ParallelLoopResult value type, which contains details on the completed loop. One of its overloads is as follows:
public static ParallelLoopResult For(int fromInclusive, int toExclusive, Action<int> body);
It's important to realize that parallel execution is not always faster than serial execution. To decide whether to use parallel or not you have to estimate the workload that will do per iteration of a loop. If the actual work being performed by the loop is small relatively to thread synchronization cost, it's better to use ordinary loop.
This is one of example when serial for loop performance is faster that parallel:
static void Main(string[] args)
{
Action<int> action = new Action<int>(SimpleMethod);
// ordinary For loop performance estimation
var sw = Stopwatch.StartNew();
for(int i = 0; i < 1000; i++)
action(i);
Console.WriteLine("{0} sec.", sw.Elapsed.TotalSeconds);
// parallel For loop performance estimation
sw = Stopwatch.StartNew();
Parallel.For(0, 1000, action);
Console.WriteLine("{0} sec.", sw.Elapsed.TotalSeconds);
}
static void SimpleMethod(int index)
{
int d = 1;
int result = index / d;
}
Output:
0.0001963 sec.
0.0346729 sec.