ForkJoinPool stalls during invokeAll/join

I am trying to use a ForkJoinPool to parallelize my CPU-intensive calculations. My understanding of a ForkJoinPool is that it keeps working as long as any task is available to be executed. Unfortunately, I frequently observed worker threads idling or waiting, so not all CPUs were kept busy. Sometimes I even observed additional worker threads.

I did not expect this, as I strictly tried to use non-blocking tasks. My observation is very similar to that of ForkJoinPool seems to waste a thread. After a lot of debugging into ForkJoinPool, I have a guess:

I used invokeAll() to distribute work over a list of subtasks. After invokeAll() has executed the first task itself, it starts joining the others. This works fine as long as the task to be joined next is at the top of the worker's execution queue. Unfortunately, I had also submitted additional tasks asynchronously without joining them, so those can sit on top instead. I expected the ForkJoin framework to execute those tasks first and then turn back to joining the remaining ones.
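Roughly, my setup looks like the following sketch (a hypothetical reconstruction with made-up class names and sizes, not my actual code; see the pastebin linked in the comments for the real thing):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveAction;

    // Hypothetical reconstruction of the pattern described above:
    // invokeAll() over a list of subtasks, plus extra tasks that are
    // forked asynchronously and never joined here.
    public class StallDemo extends RecursiveAction {
        private final int depth;

        StallDemo(int depth) { this.depth = depth; }

        @Override
        protected void compute() {
            if (depth == 0) {
                busyWork(); // CPU-bound leaf, no blocking
                return;
            }
            // Fire-and-forget side task: forked onto this worker's deque but
            // never joined; through the recursion, such tasks end up
            // interleaved with the subtasks that invokeAll() joins.
            new StallDemo(0).fork();

            List<StallDemo> subtasks = new ArrayList<>();
            for (int i = 0; i < 4; i++) {
                subtasks.add(new StallDemo(depth - 1));
            }
            invokeAll(subtasks); // runs one subtask itself, joins the rest
        }

        private void busyWork() {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
            if (sum == 42) System.out.print(""); // keep the loop from being optimized away
        }

        public static void main(String[] args) {
            new ForkJoinPool().invoke(new StallDemo(4));
        }
    }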

But it does not seem to work this way. Instead, the worker thread stalls in wait() until the task it is joining completes (presumably executed by another worker thread). I did not verify this, but it seems to be a general flaw of calling join().

ForkJoinPool provides an asyncMode, but this is a global parameter and cannot be set for individual submissions. Yet I would like to see my asynchronously forked tasks executed soon.
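For reference, asyncMode can only be chosen when the pool is constructed, via the standard four-argument constructor (a minimal sketch; the parallelism value is just an example):

    import java.util.concurrent.ForkJoinPool;

    public class AsyncModePool {
        public static void main(String[] args) {
            // asyncMode is fixed per pool: true switches every worker's
            // local deque from LIFO to FIFO for forked tasks that are
            // never joined. There is no per-task override.
            ForkJoinPool pool = new ForkJoinPool(
                    Runtime.getRuntime().availableProcessors(),
                    ForkJoinPool.defaultForkJoinWorkerThreadFactory,
                    null,   // no custom UncaughtExceptionHandler
                    true);  // asyncMode
            pool.shutdown();
        }
    }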

So, why does ForkJoinTask.doJoin() not simply execute any available task on top of its queue until the joined task is ready (whether executed by itself or stolen by others)?

Gladysglagolitic answered 3/6, 2013 at 10:44 Comment(2)
Can you share some code with us? Sometimes 3 lines of code say more than 30 lines of prose. – Cirro
I put something here: pastebin.com/kgHuJZMM – Gladysglagolitic

Since nobody else seems to understand my question, I will try to explain what I found after some nights of debugging:

The current implementation of ForkJoinTask works well if all fork/join calls are strictly paired. Illustrating a fork by an opening bracket and a join by the matching closing one, a perfect binary fork/join pattern may look like this:

{([][]) ([][])} {([][]) ([][])}

If you use invokeAll(), you may also submit a list of subtasks like this:

{([][][][]) ([][][][]) ([][][][])}

What I did, however, looks like this pattern:

{([) ([)} ... ]]

You may argue this looks ill-formed or is a misuse of the fork/join framework. But the only constraint is that the tasks' completion dependencies are acyclic, otherwise you may run into a deadlock. As long as my [] tasks are not dependent on the () tasks, I don't see any problem with it. The offending ]]s just express that I do not wait for them explicitly; they may finish some day, it does not matter to me (at that point).

Indeed, the current implementation is able to execute my interlocked tasks, but only by spawning additional helper threads, which is quite inefficient.

The flaw seems to be in the current implementation of join(): joining a ) expects to see its corresponding ( on top of the execution queue, but it finds a [ and is perplexed. Instead of simply executing [] to get rid of it, the current thread suspends (calling wait()) until someone else comes around to execute the unexpected task. This causes a drastic performance breakdown.

My primary intent was to put additional work onto the queue to prevent the worker thread from suspending when the queue runs empty. Unfortunately, the opposite happens :-(
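To illustrate what I expected join() to do internally, here is a rough sketch of a hand-rolled helping join, built on the protected static helpers that ForkJoinTask exposes (class and method names are mine; I have not tuned or verified this under contention):

    import java.util.concurrent.ForkJoinTask;
    import java.util.concurrent.RecursiveAction;

    // Sketch: instead of calling task.join() directly, keep executing
    // whatever work is available (own queue first, then stolen) until the
    // join target is done. Must run inside a ForkJoinPool worker, since
    // pollTask() is a protected static helper of ForkJoinTask.
    public abstract class HelpingTask extends RecursiveAction {

        protected void helpJoin(ForkJoinTask<?> target) {
            while (!target.isDone()) {
                ForkJoinTask<?> other = pollTask(); // a local task, else a stolen one
                if (other != null) {
                    other.quietlyInvoke(); // run it; exceptions stay recorded in that task
                } else {
                    target.quietlyJoin(); // nothing else to do: fall back to a real join
                }
            }
            target.join(); // propagate the target's result or exception
        }
    }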

Gladysglagolitic answered 6/6, 2013 at 17:28 Comment(0)

You are dead right about join(). I wrote an article two years ago that points out the problem with join().

As I said there, the framework cannot execute newly submitted requests until it finishes the earlier ones. And each WorkThread cannot steal until its current request finishes, which results in the wait().

The additional threads you see are "continuation threads." Since join() eventually issues a wait(), these threads are needed so the entire framework doesn't stall.

Waxen answered 3/6, 2013 at 13:51 Comment(7)
Yes, I read your article, and indeed it led me to my conclusion. I agree with you that the F/J framework is unable to handle tasks that are blocked by external resources. But I still do not understand why the worker thread cannot execute any other available work, either stolen from some tail or taken from its own top, until the task it is waiting for gets ready. – Gladysglagolitic
Accountability. If there is an exception in another request's task, then there may be no way to find the owner. The pool only uses one UncaughtExceptionHandler. You can look at the code yourself. It's a tangled mess, but you may be able to trace it through. – Waxen
Not convinced: the same problem arises when executing a task stolen from another queue. The current FJTask is responsible for catching those and calling this.setExceptionalCompletion() instead of passing them to the pool's UncaughtExceptionHandler. The question remains: why not execute pending tasks from the head just as it executes tasks stolen from the tail? – Gladysglagolitic
Perhaps I'm not understanding your problem. Post the code so we all know for sure what you are talking about. – Waxen
This framework is for walking down the leaves of a balanced tree. You're not doing what is acceptable for this limited framework. Any result you get has no basis in reality. There are many examples of how to properly use this framework on the net. – Waxen
That is indeed what I found: the framework gets confused by my usage and stalls. See my own answer... – Gladysglagolitic
@Waxen Your link to your article is dead. – Bessel

You’re not using this framework for the very narrow purpose for which it was intended.

The framework started life as an experiment in a 2000 research paper. It has been modified since then, but the basic design, fork-and-join on large arrays, remains the same. The basic purpose is to teach undergraduates how to walk down the leaves of a balanced tree. When people use it for anything other than simple array processing, weird things happen. What it is doing in Java 7 is beyond me; that is what the article is about.

The problems only get worse in Java 8, where it is the engine that drives all parallel stream work. Have a read of part two of that article. The lambda interest mailing lists are filled with reports of thread stalls, stack overflows, and out-of-memory errors.

You use it at your own risk when you don’t use it for pure recursive decomposition of large data structures. Even then, the excessive threads it creates can cause havoc. I’m not going to pursue this discussion any further.

Waxen answered 7/6, 2013 at 13:48 Comment(0)
