How to define a good partition plan to ensure CPU balance in JSR 352?
JSR 352 - Batch Applications for the Java Platform provides a parallelism feature using partitions: the batch runtime can execute a step in different partitions in order to accelerate its progress. JSR 352 also introduces a threads definition: we can define the number of threads to use, for example

<step id="Step1">
    <chunk .../>
        <partition>
            <plan partitions="3" threads="2"/>
        </partition>
    </chunk>
</step>

Then I feel confused: how do I define an appropriate partition plan so that each thread is occupied and the CPU load stays balanced?

For example, there are tables A, B, and C to process, with respectively 1 billion, 1 million, and 1 thousand rows. The step converts these entities into documents, one entity per document. The order of document production is not important. The CPU time per entity for these tables is respectively 1s, 2s, and 5s. The number of threads is 4.

If there are 3 partitions, one per table type, then the step will take 1 * 10^9 seconds to finish, because:

  • Partition A will take 1 * 10^9 * 1s = 1 * 10^9s, run on thread 2
  • Partition B will take 1 * 10^6 * 2s = 2 * 10^6s, run on thread 3
  • Partition C will take 1 * 10^3 * 5s = 5 * 10^3s, run on thread 4

However, while thread 2 stays occupied, thread 3 is free after 2 * 10^6s and thread 4 is free after 5 * 10^3s. So obviously, this is not a good partition plan.
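For comparison, the total CPU work is 1 * 10^9 * 1s + 1 * 10^6 * 2s + 1 * 10^3 * 5s ≈ 1.002 * 10^9 s; spread perfectly over 4 threads, the theoretical lower bound is about 2.5 * 10^8 s, so this 3-partition plan is roughly 4 times slower than the ideal.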

My questions are:

  • Is there a better partition plan for the above example?
  • Can I think of the partitions as a queue that the threads consume?
  • In general, how many threads can/should I use? Is it the same as the number of CPU cores?
  • In general, how do I define an appropriate partition plan so that each thread is occupied and the CPU load stays balanced?
Betaine answered 30/7, 2016 at 15:51 Comment(2)
Typically in a partitioned step you are running the same exact logic across each partition. Let me first ask, for your "table types" A, B, and C, are you envisioning these as so similar that they would fit well into a single step using the same logic to read/process/write for all three? (If not, it might be a better fit to break them into three sequential steps, some of which could be partitioned, and/or possibly a split to run different steps concurrently).Idette
Hi @ScottKurz, yes, they are well fit for running in the same step. Actually, my item processor converts a JPA model into a Lucene document for fulltext search feature. This is the same exact logic for table A, B and C.Betaine

Answers...

Is there a better partition plan to complete in the above example?

Yes, there is. See answer 4...

Can I think of the partitions as a queue that the threads consume?

That is exactly what happens!

In general, how many threads can/should I use? Is it the same as the number of CPU cores?

It depends. This question can be looked at from several perspectives... From the JSR-352 Specification View, the "threads" attribute:

Specifies the maximum number of threads on which to execute the partitions of this step. Note the batch runtime cannot guarantee the requested number of threads are available; it will use as many as it can up to the requested maximum. This is an optional attribute. The default is the number of partitions.

So, based on this perspective alone, you could set this value as high as you want (the batch runtime will enforce the real limit, according to its resources!).

From the Batch Runtime Perspective (JSR-352 implementation): any decent implementation will use a thread pool to execute the partitioned steps. So, if that pool has a fixed size of N, no matter how high you set your threads value, you will never execute more than N partitions concurrently.
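As a plain-Java analogy (not the JSR-352 API itself), a fixed pool caps concurrency regardless of how many tasks are submitted; a minimal sketch, with an illustrative class name:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolCapDemo {
    public static void main(String[] args) {
        // A pool of 10 threads: even if we submit 100 "partitions",
        // at most 10 run at the same time; the rest wait in the queue.
        ExecutorService pool = Executors.newFixedThreadPool(10);
        for (int i = 0; i < 100; i++) {
            final int partition = i;
            pool.submit(() -> System.out.println("running partition " + partition
                    + " on " + Thread.currentThread().getName()));
        }
        pool.shutdown();
    }
}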

JBeret is an implementation of the JSR-352 specification used by the WildFly server (it is the implementation I've used). WildFly's default batch thread pool has a maximum of 10 threads. This pool is not only shared between partitioned steps, it is also shared between batch jobs. So, if you're running 2 jobs at the same time, you will have 2 fewer threads available. In addition, when you partition, one thread takes the role of coordinator, assigning partitions to the other threads and waiting for results... so if your partition plan says it uses 2 threads, it will in fact use 3 (two as workers, one as coordinator)... and all these threads are taken from the same pool!

Anyway, the important thing here is: find out which JSR-352 implementation you are using and configure it accordingly.

From the Hardware View, your CPU can only run a limited number of hardware threads simultaneously. Under this perspective (and as a rule of thumb), set the "threads" value equal to that limit.
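If you want that starting value programmatically, the JVM already exposes the number of logical processors it sees; a minimal sketch (the class name is just for illustration):

public class ThreadsStartingPoint {
    public static void main(String[] args) {
        // Number of logical processors (hardware threads) visible to the JVM;
        // a reasonable starting point for the "threads" attribute.
        int logicalProcessors = Runtime.getRuntime().availableProcessors();
        System.out.println("Suggested initial threads value: " + logicalProcessors);
    }
}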

From the Performance View, analyze the work you are doing. If many threads access a shared resource (like a DB), you can produce a bottleneck that causes thread blocking. If you face that kind of problem, consider lowering the "threads" value.

In summary, start with the "threads" value equal to the CPU's hardware thread limit. Then check whether that value causes blocking issues; if it does, reduce it. Also, verify that the batch runtime is configured accordingly and allows you to execute as many threads as you request.

In general, how do I define an appropriate partition plan so that each thread is occupied and the CPU load stays balanced?

Avoid static partition plans (at least for your case). Instead, use a partition mapper. A partition mapper is a class that implements the javax.batch.api.partition.PartitionMapper interface and lets you define the partition plan (how many partitions, how many threads, the properties of each partition) programmatically. So for your case, take your tables (A, B, C) and split them into blocks of N rows (e.g. N = 1000)... each block becomes a partition. Start with the partitions of type C and round-robin between the tables: C0, B0, A0, B1, A1, ..., B999, A999, A1000, ..., A999999 ... With this scheme table C will finish first, leaving one thread free to work on more A and B partitions; later B will finish, leaving more resources to attack the remaining A partitions.
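A minimal sketch of such a mapper follows. The partition property names (TABLE, FIRST_ROW, LAST_ROW), the class name, the block size and the hard-coded row counts are illustrative assumptions, not part of the JSR-352 API; in a real job you would look the counts up and would likely use a much larger block size for the billion-row table.

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import javax.batch.api.partition.PartitionMapper;
import javax.batch.api.partition.PartitionPlan;
import javax.batch.api.partition.PartitionPlanImpl;
import javax.inject.Named;

@Named
public class TablePartitionMapper implements PartitionMapper {

    private static final long BLOCK_SIZE = 1_000L;                    // rows per partition (illustrative)
    private static final String[] TABLES   = {"C", "B", "A"};         // smallest table first
    private static final long[] ROW_COUNTS = {1_000L, 1_000_000L, 1_000_000_000L};

    @Override
    public PartitionPlan mapPartitions() throws Exception {
        List<Properties> partitions = new ArrayList<>();

        // One partition per block of BLOCK_SIZE rows, interleaving the tables
        // (round robin) so the short tables finish early and free their threads
        // for the remaining blocks of the big table.
        long maxBlocks = (ROW_COUNTS[ROW_COUNTS.length - 1] + BLOCK_SIZE - 1) / BLOCK_SIZE;
        for (long block = 0; block < maxBlocks; block++) {
            for (int t = 0; t < TABLES.length; t++) {
                long firstRow = block * BLOCK_SIZE;
                if (firstRow >= ROW_COUNTS[t]) {
                    continue;                                          // this table is already fully covered
                }
                long lastRow = Math.min(firstRow + BLOCK_SIZE, ROW_COUNTS[t]) - 1;
                Properties p = new Properties();
                p.setProperty("TABLE", TABLES[t]);
                p.setProperty("FIRST_ROW", String.valueOf(firstRow));
                p.setProperty("LAST_ROW", String.valueOf(lastRow));
                partitions.add(p);
            }
        }

        PartitionPlanImpl plan = new PartitionPlanImpl();
        plan.setPartitions(partitions.size());
        plan.setThreads(4);                                            // or Runtime.getRuntime().availableProcessors()
        plan.setPartitionProperties(partitions.toArray(new Properties[0]));
        return plan;
    }
}

In the JSL you would then replace the static <plan> with <partition><mapper ref="tablePartitionMapper"/></partition>, and the reader/processor can pick up each partition's properties through #{partitionPlan['TABLE']} and friends.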

Hope this helps...

Fascination answered 1/8, 2016 at 21:32 Comment(5)
My tests use JBeret as the implementation, but my framework lets users choose their own implementation (the batch runtime embedded inside the target Java EE container), so I'll take a look at the thread pool settings of all the implementations. What you've mentioned about the concurrency of multiple jobs and about performance from the different views is precious. The solution you've proposed in the summary is great, I'll accept it. Thanks for this very detailed and helpful answer @CarlitosWay.Betaine
I disagree with your advice regarding the number of threads. First, there is no "CPU thread limit"; second, during processing most tasks will have to wait more or less often, e.g. for the DB to return results. Especially in this case it makes sense to have (many) more threads than CPUs/cores, so that while one thread waits for e.g. DB data another thread can use the CPU. In the end, some fiddling (tuning) will be needed to find the sweet spot where increasing the number of threads no longer reduces (but maybe begins to increase) overall run time due to contention/overhead.Kannry
@JimmyB, any processor (Intel, AMD) has a maximum number of threads that can run simultaneously (let's call it X). That number is what I called the "CPU thread limit". Even if you configure your application to use 1000 threads, only X will run simultaneously. Am I wrong about this? Anyway, using X as the value is just a heuristic, a starting point. Only profiling your application (with real data) will reveal the best value...Fascination
@CarlitosWay I already figured you probably meant CPUs/cores. The number of cores is a starting point. But it's by no means the maximum number of threads you should use/attempt.Kannry
As I understood it, since my 4-core processor runs two threads per core (meaning I have 8 logical processors), I set the number of threads for the batch job to 8, which in fact means 9 (8 workers, one coordinator). Icing on the cake: 9 < WildFly's max-threads of 10!Cervantez