Surefire forkCount not resulting in this number of processes

I can set the forkCount parameter to any number, say 12, and I'd expect to see 12 new Java processes of type surefirebooter while the tests are running. But ps shows the expected 12 Java processes only very rarely. Typically I get fewer, sometimes only three or four. Execution of my hundreds of unit tests also appears to be slow in those cases.

The running processes also often disappear from the ps output (they terminate, I assume) before the unit tests are done. In some cases all of them disappear, and then the execution hangs indefinitely.

The documentation isn't very clear about this, but I'd expect the given number of processes to exist the whole time until all unit tests are done.

Maybe the surefirebooter processes run into some problem and terminate prematurely. I see no error messages, though. Where should I see them? Can I switch on some debug logging? Switching on Surefire's debug mode changed the test results, so I didn't follow that path very far.

I'm running ~1600 unit tests in ~400 classes which takes ~7 minutes in the best case. Execution time varies greatly, sometimes the whole thing terminates after more than an hour.

In some cases, on the other hand, the surefirebooter processes continue to run after execution has finished (successfully) and put massive load on the system (so they seem to be busy-waiting for something).

Questions:

Can anybody explain these observed phenomena?
Can anybody advise what to change in order to get a more proper execution (i.e. with the desired number of surefirebooter processes at all times)?
Can anybody advise how to debug the situation, e.g. how to see messages about what happens to the surefirebooter processes? (I tried using strace, but that also changed the behavior so dramatically that the call didn't terminate anymore.)
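Since strace and debug mode both disturb the run, a lighter way to watch the fork count over time might be to sample the process list from a separate JVM. The following is only a sketch: the class name SurefireForkCounter and its methods are made up for illustration, it assumes Java 9+ (for ProcessHandle), and it assumes the fork JVMs can be recognised by the substring "surefirebooter" in their command line, which may not be visible for processes owned by other users.

```java
import java.time.LocalTime;
import java.util.stream.Stream;

public class SurefireForkCounter {

    // Counts the command lines that contain the given marker substring.
    static long countMatching(Stream<String> commandLines, String marker) {
        return commandLines.filter(cmd -> cmd.contains(marker)).count();
    }

    // Best-effort command lines of all processes visible to this JVM,
    // excluding this JVM itself. Unknown command lines become "".
    static Stream<String> visibleCommandLines() {
        long self = ProcessHandle.current().pid();
        return ProcessHandle.allProcesses()
                .filter(ph -> ph.pid() != self)
                .map(ph -> ph.info().commandLine().orElse(""));
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumption: fork JVMs carry this substring in their command line.
        String marker = args.length > 0 ? args[0] : "surefirebooter";
        // Number of one-second samples; pass a larger value to cover a whole build.
        int samples = args.length > 1 ? Integer.parseInt(args[1]) : 5;
        for (int i = 0; i < samples; i++) {
            long forks = countMatching(visibleCommandLines(), marker);
            System.out.println(LocalTime.now() + " forks=" + forks);
            Thread.sleep(1000);
        }
    }
}
```

Started in a second terminal alongside the Maven build, with a sample count large enough to cover the run, this would give a timeline showing when the number of forks drops, which can then be lined up with the build log.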

My hypothesis #1 is that the OOM killer (oom_killer) is the culprit. Hypothesis #2 is that the forked processes get swapped out and/or spend an excessive amount of time on garbage collection.

To debug:

Which platform are you running this on?
If it is some kind of *nix, could you please check dmesg or /var/log/messages after the run for messages about killed processes?
In cases where you have processes busy-waiting, could you please a) collect stack traces with jstack (for both the forked processes and the main one) and b) quantify the massive load on the system in terms of CPU usage, memory usage, and the amount of data paged in and out?
If none of that proves useful, I'd patch a copy of Surefire's ForkStarter to add more logging events, then compare the logs of successful and failed runs for more clues (pass the --debug or -X argument to Maven to output debug messages).
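For the dmesg check above, the kernel log can be filtered for lines that look like OOM-killer activity. This is a minimal sketch, not a definitive detector: the class name OomLogScan is hypothetical, and the marker substrings are an assumption about typical Linux kernel messages, whose exact wording varies between kernel versions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class OomLogScan {

    // Assumed markers of OOM-killer activity; adjust to your kernel's wording.
    static final List<String> MARKERS =
            List.of("oom-killer", "out of memory", "killed process");

    // True if the line contains any marker, ignoring case.
    static boolean looksLikeOomKill(String line) {
        String lower = line.toLowerCase(Locale.ROOT);
        return MARKERS.stream().anyMatch(lower::contains);
    }

    // Keeps only the lines that look like OOM-killer messages.
    static List<String> filter(List<String> lines) {
        return lines.stream()
                .filter(OomLogScan::looksLikeOomKill)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Reads log lines from standard input until end of input.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            List<String> hits = filter(in.lines().collect(Collectors.toList()));
            hits.forEach(System.out::println);
            System.out.println(hits.size() + " suspicious line(s) found");
        }
    }
}
```

With Java 11 or later it could be run right after a failed build as `dmesg | java OomLogScan.java`. Any hit that names a java process would support hypothesis #1.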
