PLINQ's Range partitioning vs Chunk partitioning?

In which scenarios will Range partitioning be a better choice than Chunk partitioning? (and vice versa)

I already know that:

  • Chunk partitioning: workers grab small chunks of elements from the input to process; the partitioner starts with small chunks and then increases the chunk size.

  • Range partitioning: preallocates an equal number of elements to each worker.
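
As background, here is a minimal sketch of my own (based on documented PLINQ behavior, not on anything stated in this post): PLINQ picks range partitioning when the source implements IList<T> (arrays, List<T>), and chunk partitioning otherwise, so you can steer the choice by changing the source's type:

using System;
using System.Collections.Generic;
using System.Linq;

class ChoosingPartitioning
{
    static void Main()
    {
        // An IList<T> source (here an array) gets range partitioning.
        int[] array = Enumerable.Range(1, 1_000_000).ToArray();
        double a = array.AsParallel().Sum(i => Math.Sqrt(i));

        // ParallelEnumerable.Range is also range partitioned.
        double b = ParallelEnumerable.Range(1, 1_000_000).Sum(i => Math.Sqrt(i));

        // An iterator exposes only IEnumerable<int>, forcing chunk partitioning.
        double c = Hide(array).AsParallel().Sum(i => Math.Sqrt(i));

        Console.WriteLine($"{a} {b} {c}");
    }

    static IEnumerable<int> Hide(int[] source)
    {
        foreach (var n in source) yield return n;
    }
}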

Also, why might this code (finding prime numbers up to 100,000):

IEnumerable<int> numbers = Enumerable.Range(3, 100000 - 3);
var parallelQuery = from n in numbers.AsParallel()
                    where Enumerable.Range(2, (int)Math.Sqrt(n)).All(i => n % i > 0)
                    select n;

perform poorly with range partitioning?

While this code (finding the sum of the square roots of the first ten million numbers):

ParallelEnumerable.Range(1, 10000000).Sum(i => Math.Sqrt(i))

is a better fit for range partitioning?

Unmoving answered 1/9, 2012 at 15:27

In the first sample, the required time per item depends on n. Searching for the next prime number after 90,000 takes more time than finding the one after 11.

As a result, when divided into equal ranges, the last ranges will have to perform much more work than the first.

In the second sample the time per operation is equal over the entire range, so range partitioning will work well.
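
To make the imbalance concrete, here is a small timing sketch of my own (an illustration, not code from this answer): it runs the two contiguous halves that a two-worker range partitioner would roughly receive, one after the other. Expect the upper half to take noticeably longer, because trial division up to √n does more work for larger n.

using System;
using System.Diagnostics;
using System.Linq;

class ImbalanceDemo
{
    // Same primality test as the query in the question.
    static bool IsPrime(int n) =>
        Enumerable.Range(2, (int)Math.Sqrt(n)).All(i => n % i > 0);

    static void Main()
    {
        // Roughly the lower half of the 3..100000 input.
        var sw = Stopwatch.StartNew();
        int low = Enumerable.Range(3, 50000).Count(IsPrime);
        Console.WriteLine($"lower half: {low} primes in {sw.ElapsedMilliseconds} ms");

        // Roughly the upper half; each candidate needs more trial divisions.
        sw.Restart();
        int high = Enumerable.Range(50003, 50000).Count(IsPrime);
        Console.WriteLine($"upper half: {high} primes in {sw.ElapsedMilliseconds} ms");
    }
}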

Larena answered 1/9, 2012 at 15:41
Hi @Henk, thanks. So is there any rule of thumb for deciding when to use which? – Unmoving
When you expect an equal distribution, use a Range partition. When it's unknown (and there is a risk of an uneven distribution), using chunks will be more adaptive. – Larena
Henk, behind the scenes, what difference does it make whether the distribution is equal or not? In both situations there is work to do... is there anything that is done in Range and not in Chunk (or something else)? – Unmoving
A Range partitioner assumes the workload per item is more or less equal and assigns worker threads accordingly. When that assumption turns out to be wrong, some threads will still be working while others are idle. Work stealing isn't possible when the ranges have already been assigned. – Larena
In the second example I would not use range partitioning, because then the work would be split as follows (with 2 worker threads): the first thread would sum the numbers from 0 to 500,000 and the second thread would sum the rest. The first thread has the easier job because its numbers are smaller, so it would finish much sooner and then sit idle while the second thread is still busy calculating the bigger numbers. – Kirimia
@StavAlfi That is not a statement I invented; it is a fact from Joseph Albahari's book (here). – Unmoving
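
Following up on the work-stealing point above, here is a sketch of my own (using the standard System.Collections.Concurrent Partitioner API, not code from the book or the comments): Partitioner.Create with loadBalance: true hands out elements on demand, so the prime query gets chunk-style load balancing even over an indexable source.

using System;
using System.Collections.Concurrent;
using System.Linq;

class LoadBalancedPrimes
{
    static void Main()
    {
        // An array source would normally get static range partitioning.
        int[] numbers = Enumerable.Range(3, 100000 - 3).ToArray();

        // loadBalance: true makes worker threads request elements on demand,
        // so no thread sits idle while work remains.
        int primeCount = Partitioner.Create(numbers, loadBalance: true)
            .AsParallel()
            .Count(n => Enumerable.Range(2, (int)Math.Sqrt(n)).All(i => n % i > 0));

        Console.WriteLine(primeCount);
    }
}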

A member of the PFX Team at Microsoft gives a good explanation on the PFX Dev Blog:

Partitioning in PLINQ

Based on many factors, we have 4 primary algorithms that we use for partitioning alone. They’re worth getting to know, because we’ll talk more about them and tweaks that we make to them in future technology discussions.

  1. Range Partitioning – This is a pretty common partitioning scheme, similar to the one that I described in the example above. This is amenable to many query shapes, though it only works with indexable data sources such as lists and arrays (i.e. IList<T> and T[]). If you give PLINQ something typed as IEnumerable or IEnumerable<T>, PLINQ will query for the IList<T> interface, and if it’s found, will use that interface implementation with range partitioning. The benefit of these data sources is that we know the exact length and can access any of the elements within the arrays directly. For the majority of cases, there are large performance benefits to using this type of partitioning.

[Diagram: Range Partitioning]

  2. Chunk Partitioning – This is a general purpose partitioning scheme that works for any data source, and is the main partitioning type for non-indexable data sources. In this scheme, worker threads request data, and it is served up to the thread in chunks. IEnumerables and IEnumerable<T>s do not have fixed Count properties (there is a LINQ extension method for this, but that is not the same), so there’s no way to know when or if the data source will enumerate completely. It could be 3 elements, it could be 3 million elements, it could be infinite. A single system needs to take all of these possibilities into account and factor in different delegate sizes, uneven delegate sizes, selectivity, etc. The chunk partitioning algorithm is quite general and PLINQ’s algorithm had to be tuned for good performance on a wide range of queries. We’ve experimented with many different growth patterns and currently use a plan that doubles after a certain number of requests. This is subject to change as we tune for performance, so don’t depend on this. Another important optimization is that chunk partitioning balances the load among cores, as the tasks per core dynamically request more work as needed. This ensures that all cores are utilized throughout the query and can all cross the finish line at the same time vs. a ragged, sequential entry to the end.

[Diagram: Chunk Partitioning]

  3. Striped Partitioning – This scheme is used for SkipWhile and TakeWhile and is optimized for processing items at the head of a data source (which obviously suits the needs of SkipWhile and TakeWhile). In striped partitioning, each of the n worker threads is allocated a small number of items (sometimes 1) from each block of n items. The set of items belonging to a single thread is called a ‘stripe’, hence the name. A useful feature of this scheme is that there is no inter-thread synchronization required as each worker thread can determine its data via simple arithmetic. This is really a special case of range partitioning and only works on arrays and types that implement IList<T>.

[Diagram: Striped Partitioning]

  4. Hash Partitioning – Hash partitioning is a special type of partitioning that is used by the query operators that must compare data elements (these operators are: Join, GroupJoin, GroupBy, Distinct, Except, Union, Intersect). When hash-partitioning occurs (which is just prior to any of the operators mentioned), all of the data is processed and channeled to threads such that items with identical hash-codes will be handled by the same thread. This hash-partitioning work is costly, but it means that all the comparison work can then be performed without further synchronization. Hash partitioning assigns every element to an output partition based on a hash computed from each element’s key. This can be an efficient way of building a hash-table on the fly concurrently, and can be used to accomplish partitioning and hashing for the hash join algorithm. The benefit is that PLINQ can now use the same hash partitioning scheme for the data source used for probing; this way all possible matches end up in the same partition, meaning less shared data and smaller hash table sizes (each partition has its own hash table). There’s a lot going on with hash-partitioning, so it’s not as speedy as the other types, especially when ordering is involved in the query. As a result the query operators that rely upon it have additional overheads compared to simpler operators.

[Diagram: Hash Partitioning]
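
To connect the four schemes back to code, here is a rough sketch of my own (not part of the quoted post) with queries that, per the descriptions above, would exercise each partitioner:

using System;
using System.Collections.Generic;
using System.Linq;

class PartitionerTriggers
{
    static void Main()
    {
        int[] data = Enumerable.Range(1, 1_000_000).ToArray();

        // Indexable source with a simple operator -> range partitioning.
        var squares = data.AsParallel().Select(n => n * n).ToArray();

        // Hiding the IList<T> behind an iterator -> chunk partitioning.
        var fromStream = Stream(data).AsParallel().Select(n => n * n).ToArray();

        // TakeWhile on an indexable source -> striped partitioning.
        var head = data.AsParallel().AsOrdered().TakeWhile(n => n < 1000).ToArray();

        // GroupBy must compare elements across workers -> hash partitioning.
        var byLastDigit = data.AsParallel().GroupBy(n => n % 10).ToArray();

        Console.WriteLine($"{squares.Length} {fromStream.Length} {head.Length} {byLastDigit.Length}");
    }

    // An iterator block implements only IEnumerable<int>, not IList<int>.
    static IEnumerable<int> Stream(int[] source)
    {
        foreach (var n in source) yield return n;
    }
}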

Landreth answered 20/10, 2021 at 18:56
