When to use Partitioner class?
Can anyone suggest typical scenarios where Partitioner class introduced in .NET 4.0 can/should be used?

Waldrup answered 27/10, 2010 at 9:44

The Partitioner class is used to make parallel executions more chunky. If you have a lot of very small tasks to run in parallel the overhead of invoking delegates for each may be prohibitive. By using Partitioner, you can rearrange the workload into chunks and have each parallel invocation work on a slightly larger set. The class abstracts this feature and is able to partition based on the actual conditions of the dataset and available cores.

Example: Imagine you want to run a simple calculation like this in parallel.

Parallel.ForEach(Input, (value, loopState, index) => { Result[index] = value*Math.PI; });

That would invoke the delegate once for each entry in Input, adding a bit of overhead to each call. By using Partitioner, we can do something like this:

Parallel.ForEach(Partitioner.Create(0, Input.Length), range => {
   for (var index = range.Item1; index < range.Item2; index++) {
      Result[index] = Input[index]*Math.PI;
   }
});

This reduces the number of delegate invocations, as each invocation works on a larger set. In my experience this can boost performance significantly when parallelizing very simple operations.
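To make the comparison concrete, here is a runnable sketch of both versions side by side. The input array is invented sample data; the variable names are not from the answer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Invented sample data standing in for the answer's Input array.
double[] input = new double[100_000];
for (int i = 0; i < input.Length; i++) input[i] = i;

// Per-element version: one delegate invocation per element.
double[] perElement = new double[input.Length];
Parallel.ForEach(input, (value, loopState, index) =>
{
    perElement[index] = value * Math.PI;
});

// Range-partitioned version: one delegate invocation per range.
// Partitioner.Create(0, length) chooses a range size automatically;
// Partitioner.Create(0, length, rangeSize) forces an explicit size.
double[] chunked = new double[input.Length];
Parallel.ForEach(Partitioner.Create(0, input.Length), range =>
{
    for (int index = range.Item1; index < range.Item2; index++)
        chunked[index] = input[index] * Math.PI;
});

Console.WriteLine(perElement[2] == chunked[2]); // True: same results, fewer invocations
```

Both loops compute identical results; the second simply amortizes the delegate-call overhead over each range.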

Secondhand answered 27/10, 2010 at 9:54
The default rangeSize is one for Partitioner.Create(), so the partitioning is the same for both code examples. Unless you call Partitioner.Create(0, Input.Length, i) with i > 1, it will still use the same number of threads. – Inevitable
@Pingpong, this does not seem to be correct, at least currently. I find the default partitioner creating 24 chunks on my 8-processor machine, which is much smaller than the length of the input. I would like to understand how it determines this default. – Seyler
Brian, can you give a rough estimate of what "a lot of" means? Are 1,000 tasks a problem? Or is 1,000 OK, but 1,000,000 worth using a partitioner? – Metsky

Range partition, as suggested by Brian Rasmussen, is one type of partitioning that should be used when the work is CPU intensive, tends to be small (relative to a virtual method call), many elements must be processed, and is mostly constant when it comes to run time per element.

The other type of partition that should be considered is chunk partitioning. This type of partitioning is also known as a load-balancing algorithm because a worker thread will rarely sit idle while there is more work to do - which is not the case for a range partition.

A chunk partition should be used when the work has some wait states, tends to require more processing per element, or each element can have significantly different work processing times.

One example of this might be reading into memory and processing 100 files with vastly different sizes. A 1 KB file will be processed in much less time than a 1 MB file. If a range partition is used here, some threads could sit idle for some time because they happened to be assigned the smaller files.

Unlike a range partition, there is no way to specify the number of elements to be processed by each task - unless you write your own custom partitioner. Another downside to using a chunk partition is that there may be some contention when it goes back to get another chunk since an exclusive lock is used at that point. So, clearly a chunk partition should not be used for short amounts of CPU intensive work.
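The uneven-files scenario above can be sketched with simulated workloads. The sizes and sleep durations below are invented stand-ins for real file I/O; the `loadBalance: true` argument to Partitioner.Create requests chunk (load-balancing) partitioning.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Simulated "file sizes": every 10th item is 100x larger than the rest.
int[] sizes = Enumerable.Range(0, 100)
                        .Select(i => i % 10 == 0 ? 200 : 2)
                        .ToArray();

long totalBytes = 0;

// loadBalance: true asks for a chunk (load-balancing) partitioner:
// threads that finish their small items go back for more work instead
// of idling while another thread grinds through the large items.
var chunkPartitioner = Partitioner.Create(sizes, loadBalance: true);

Parallel.ForEach(chunkPartitioner, size =>
{
    Thread.Sleep(size / 10);                // stand-in for size-dependent work
    Interlocked.Add(ref totalBytes, size);  // thread-safe accumulation
});

Console.WriteLine(totalBytes); // 2180: every item was processed exactly once
```

With a static range partition over the same data, the thread whose range happened to contain the large items would finish long after the others; the load-balancing partitioner keeps all threads busy until the work runs out.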

The default chunk partitioner starts off with a chunk size of 1 element per chunk. After each thread has processed three 1-element chunks, the chunk size is incremented to 2 elements per chunk. After each thread has processed three 2-element chunks, the chunk size is incremented again to 3 elements per chunk, and so on. At least, this is the way it works according to Dixin Yan (see the Chunk partitioning section of his blog), who works for Microsoft.

By the way, the nice visualizer tool in his blog appears to be the Concurrency Visualizer profiling tool. The docs for this tool claim that it can be used to locate performance bottlenecks, CPU under-utilization, thread contention, cross-core thread migration, synchronization delays, DirectX activity, areas of overlapped I/O, and other information. It provides graphical, tabular, and textual views that show the relationships between the threads in an app and the system as a whole.

Other resources:

MSDN: Custom Partitioners for PLINQ and TPL

Part 5: Parallel Programming - Optimizing PLINQ by Joseph Albahari

Tamtama answered 4/3, 2017 at 8:00

To parallelize an operation on a data source, one of the essential steps is to partition the source into multiple sections that can be accessed concurrently by multiple threads. PLINQ and the Task Parallel Library (TPL) provide default partitioners that work transparently when you write a parallel query or ForEach loop. For more advanced scenarios, you can plug in your own partitioner.
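To make "plug in your own partitioner" concrete, here is a minimal single-element dynamic partitioner, loosely modeled on the approach in the MSDN custom-partitioner article. All class and variable names are invented for illustration; a production partitioner would need more care (buffering, chunking, exception handling).

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var data = Enumerable.Range(1, 1000).ToArray();
long sum = 0;

// Parallel.ForEach accepts any Partitioner<T> that supports dynamic
// partitions, custom ones included.
Parallel.ForEach(new SingleItemPartitioner<int>(data),
                 n => Interlocked.Add(ref sum, n));

Console.WriteLine(sum); // 500500: every element was handed out exactly once

// A toy custom partitioner: hands out one element at a time from a
// shared cursor, so idle threads always find more work (maximum load
// balancing, maximum per-element overhead).
class SingleItemPartitioner<T> : Partitioner<T>
{
    private readonly IList<T> _source;
    public SingleItemPartitioner(IList<T> source) => _source = source;

    public override bool SupportsDynamicPartitions => true;

    public override IList<IEnumerator<T>> GetPartitions(int partitionCount)
    {
        // Static partitions are just enumerators over one shared dynamic set.
        var dynamicPartitions = GetDynamicPartitions();
        var list = new List<IEnumerator<T>>(partitionCount);
        for (int i = 0; i < partitionCount; i++)
            list.Add(dynamicPartitions.GetEnumerator());
        return list;
    }

    public override IEnumerable<T> GetDynamicPartitions()
        => new DynamicPartitions(_source);

    private class DynamicPartitions : IEnumerable<T>
    {
        private readonly IList<T> _source;
        private int _next = -1; // shared cursor; each enumerator claims items atomically

        public DynamicPartitions(IList<T> source) => _source = source;

        public IEnumerator<T> GetEnumerator()
        {
            while (true)
            {
                int i = Interlocked.Increment(ref _next);
                if (i >= _source.Count) yield break;
                yield return _source[i];
            }
        }

        System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
            => GetEnumerator();
    }
}
```

Because every enumerator draws from the same Interlocked cursor, each element is claimed by exactly one thread, which is the core contract a custom partitioner must uphold.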


Ander answered 27/10, 2010 at 9:50
I'd add to that: generally speaking, you don't use it directly. PLINQ does, and you'll probably get away with the default partitioners. – Moment
