The ultimate goal of thread pools and Fork/Join are alike: Both want to utilize the available CPU power the best they can for maximum throughput. Maximum throughput means that as many tasks as possible should be completed in a long period of time. What is needed to do that? (For the following we will assume that there is no shortage of calculation tasks: There is always enough to do for 100% CPU utilisation. Additionally I use "CPU" equivalently for cores or virtual cores in case of hyper-threading).
- At least there need to be as many threads running as there are CPUs available, because running less threads will leave a core unused.
- At maximum there must be as many threads running as there are CPUs available, because running more threads will create additional load for the Scheduler who assigns CPUs to the different threads which causes some CPU time to go to the scheduler rather than our computational task.
Thus we figured out that for maximum throughput we need to have the exact same number of threads than CPUs. In Oracle's blurring example you can both take a fixed size thread pool with the number of threads equal to the number of available CPUs or use a thread pool. It won't make a difference, you are right!
So when will you get into trouble with a thread pools? That is if a thread blocks, because your thread is waiting for another task to complete. Assume the following example:
class AbcAlgorithm implements Runnable {
public void run() {
Future<StepAResult> aFuture = threadPool.submit(new ATask());
StepBResult bResult = stepB();
StepAResult aResult = aFuture.get();
stepC(aResult, bResult);
}
}
What we see here is an algorithm that consists of three steps A, B and C. A and B can be performed independently of each other, but step C needs the result of step A AND B. What this algorithm does is submit task A to the threadpool and perform task b directly. After that the thread will wait for task A to be done as well and continue with step C. If A and B are completed at the same time, then everything is fine. But what if A takes longer than B? That may be because the nature of task A dictates it, but it may also be the case because there is not
thread for task A available in the beginning and task A needs to wait. (If there is only a single CPU available and thus your threadpool has only a single thread this will even cause a deadlock, but for now that is besides the point). The point is that the thread that just executed task B blocks the whole thread. Since we have the same number of threads as CPUs and one thread is blocked that means that one CPU is idle.
Fork/Join solves this problem: In the fork/join framework you'd write the same algorithm as follows:
class AbcAlgorithm implements Runnable {
public void run() {
ATask aTask = new ATask());
aTask.fork();
StepBResult bResult = stepB();
StepAResult aResult = aTask.join();
stepC(aResult, bResult);
}
}
Looks the same, does it not? However the clue is that aTask.join
will not block. Instead here is where work-stealing comes into play: The thread will look around for other tasks that have been forked in the past and will continue with those. First it checks whether the tasks it has forked itself have started processing. So if A has not been started by another thread yet, it will do A next, otherwise it will check the queue of other threads and steal their work. Once this other task of another thread has completed it will check whether A is completed now. If it is the above algorithm can call stepC
. Otherwise it will look for yet another task to steal. Thus fork/join pools can achieve 100% CPU utilisation, even in the face of blocking actions.
However there is a trap: Work-stealing is only possible for the join
call of ForkJoinTask
s. It cannot be done for external blocking actions like waiting for another thread or waiting for an I/O action. So what about that, waiting for I/O to complete is a common task? In this case if we could add an additional thread to Fork/Join pool that will be stopped again as soon as the blocking action has completed will be the second best thing to do. And the ForkJoinPool
can actually do just that if we are using ManagedBlocker
s.
Fibonacci
In the JavaDoc for RecursiveTask is an example for calculating Fibonacci numbers using Fork/Join. For a classic recursive solution see:
public static int fib(int n) {
if (n <= 1) {
return n;
}
return fib(n - 1) + fib(n - 2);
}
As is explained int the JavaDocs this is a pretty dump way to calculate fibonacci numbers, as this algorithm has O(2^n) complexity while simpler ways are possible. However this algorithm is very simple and easy to understand, so we stick with it. Let's assume we want to speed this up with Fork/Join. A naive implementation would look like this:
class Fibonacci extends RecursiveTask<Long> {
private final long n;
Fibonacci(long n) {
this.n = n;
}
public Long compute() {
if (n <= 1) {
return n;
}
Fibonacci f1 = new Fibonacci(n - 1);
f1.fork();
Fibonacci f2 = new Fibonacci(n - 2);
return f2.compute() + f1.join();
}
}
The steps that this Task is split into are way too short and thus this will perform horribly, but you can see how the framework generally works very well: The two summands can be calculated independently, but then we need both of them to build the final result. So one half is done in an other thread. Have fun doing the same with thread pools without getting a deadlock (possible, but not nearly as simple).
Just for completeness: If you'd actually want to calculate Fibonacci numbers using this recursive approach here is an optimized version:
class FibonacciBigSubtasks extends RecursiveTask<Long> {
private final long n;
FibonacciBigSubtasks(long n) {
this.n = n;
}
public Long compute() {
return fib(n);
}
private long fib(long n) {
if (n <= 1) {
return 1;
}
if (n > 10 && getSurplusQueuedTaskCount() < 2) {
final FibonacciBigSubtasks f1 = new FibonacciBigSubtasks(n - 1);
final FibonacciBigSubtasks f2 = new FibonacciBigSubtasks(n - 2);
f1.fork();
return f2.compute() + f1.join();
} else {
return fib(n - 1) + fib(n - 2);
}
}
}
This keeps the subtasks much smaller because they are only split when n > 10 && getSurplusQueuedTaskCount() < 2
is true, which means that there are significantly more than 100 method calls to do (n > 10
) and there are not very man tasks already waiting (getSurplusQueuedTaskCount() < 2
).
On my computer (4 core (8 when counting Hyper-threading), Intel(R) Core(TM) i7-2720QM CPU @ 2.20GHz) the fib(50)
takes 64 seconds with the classic approach and just 18 seconds with the Fork/Join approach which is quite a noticeable gain, although not as much as theoretically possible.
Summary
- Yes, in your example Fork/Join has no advantage over classic thread pools.
- Fork/Join can drastically improve performance when blocking is involved
- Fork/Join circumvents some deadlock problems