Surefire forkCount not resulting in this number of processes

I can set the forkCount parameter to any number, say 12, and I'd expect to see 12 new Java processes of type surefirebooter while the tests run. But ps shows that I get the expected 12 Java processes only very rarely. Typically I get fewer, sometimes only three or four. Execution of my hundreds of unit tests also appears to be slow in that case.

The running processes also often disappear from the ps output (I assume they terminate) before the unit tests are done. In some cases all of them disappear, and then the execution hangs indefinitely.

The documentation isn't very clear about this, but I'd expect the given number of processes to exist the whole time until all unit tests are done.

Maybe the surefirebooter processes run into some problem and terminate prematurely. I see no error messages, though. Where would they show up? Can I switch on some debug logging? Switching on Surefire's debug mode changed the test results, so I didn't follow that path very far.

I'm running ~1600 unit tests in ~400 classes, which takes ~7 minutes in the best case. Execution time varies greatly; sometimes the whole run finishes only after more than an hour.

In some cases, on the other hand, the surefirebooter processes continue to run after the execution has finished (successfully) and put massive load on the system (so they seem to be busy-waiting for something).

Questions:

  • Can anybody explain these observed phenomena?
  • Can anybody advise on what to change to get a more proper execution (i.e. with the desired number of surefirebooter processes at all times)?
  • Can anybody advise on how to debug the situation, i.e. how to see messages about what happens to the surefirebooter processes? (I tried using strace, but that also changed the behavior so dramatically that the call no longer terminated.)
Acrilan answered 6/10, 2016 at 9:53 Comment(3)
What version of the Surefire Plugin are you using? forkCount is a maximum value, so it may be normal to have fewer processes at times (not sure), but the execution definitely shouldn't hang. Try setting <reuseForks>true</reuseForks>; that makes sure forks are created only once and reused throughout the test run. What value are you using for parallel? - Innovate
Surefire version: 2.18.1. reuseForks is documented as being true by default. I will check whether explicitly setting it to true gives different results and report back here. parallel is not used; our classes under test are not designed to work correctly when their unit tests run in parallel on their instances. Tests with this feature enabled also showed the expected (failing) behavior. - Acrilan
In my trials, explicitly setting reuseForks to its documented default true doesn't seem to have any influence. - Acrilan

My hypothesis #1 would be that the Linux OOM killer is the culprit. Hypothesis #2 would be that the forked processes get swapped out and/or spend an excessive amount of time garbage collecting.
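One way to probe the garbage-collection hypothesis from inside a JVM is to ask the standard java.lang.management beans how much time the collectors have accumulated so far. The following is only a sketch under assumptions: the class name GcShare is made up for illustration, it would have to be run inside the forked JVM (for example from a test teardown hook) to say anything about the forks, and for concurrent collectors the reported time is not purely pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not part of Surefire): reports accumulated GC time of the current JVM.
public class GcShare {

    // Sum of the accumulated collection time over all collectors, in milliseconds.
    // Collectors that report -1 (undefined) are skipped.
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Fraction of the JVM uptime that was spent in garbage collection.
    public static double gcFraction(long gcMillis, long uptimeMillis) {
        return uptimeMillis <= 0 ? 0.0 : (double) gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gc = totalGcMillis();
        System.out.printf("uptime=%d ms, gc=%d ms (%.1f%%)%n",
                uptime, gc, 100 * gcFraction(gc, uptime));
    }
}
```

A fork that reports a large GC share shortly before it stalls would point towards hypothesis #2; a small share would make it less likely.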

To debug:

  1. Which platform do you run this on?
  2. If it's some kind of *nix, could you check dmesg or /var/log/messages after the run for messages about killed processes?
  3. In the cases where processes are busy-waiting, could you a) collect stack traces with jstack (for both the forked processes and the main one), and b) quantify the massive load on the system in terms of CPU usage, memory usage, and the amount of data paged in and out?
  4. If none of that proves useful, I'd make a private copy (fork) of Surefire's ForkStarter, add more logging to it, and compare the logs of successful and failed runs for more clues. (Pass --debug or -X to Maven to output debug messages.)
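To put numbers on how many forks are alive over time without attaching strace, the process list can be sampled from a small separate Java program. This is a minimal sketch, assuming Java 9+ (for ProcessHandle) and assuming the forks can be recognized by the string "surefirebooter" in their command line, as in the ps output described in the question. The class name ForkCounter and the default of five samples are made up for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Surefire): samples the number of fork JVMs once per second.
public class ForkCounter {

    // Assumption: a Surefire fork has "surefirebooter" somewhere in its command line.
    public static boolean isSurefireFork(String commandLine) {
        return commandLine.contains("surefirebooter");
    }

    // Counts the command lines that look like Surefire forks.
    public static long countForks(List<String> commandLines) {
        return commandLines.stream().filter(ForkCounter::isSurefireFork).count();
    }

    // Command lines of all processes visible to this JVM; unreadable ones become "".
    public static List<String> visibleCommandLines() {
        return ProcessHandle.allProcesses()
                .map(p -> p.info().commandLine().orElse(""))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws InterruptedException {
        // Number of one-second samples; pass a larger value to cover a whole build.
        int samples = args.length > 0 ? Integer.parseInt(args[0]) : 5;
        for (int i = 0; i < samples; i++) {
            System.out.printf("%tT forks=%d%n",
                    System.currentTimeMillis(), countForks(visibleCommandLines()));
            Thread.sleep(1000);
        }
    }
}
```

Started in a second terminal alongside the Maven build (for example with an argument of 600 for ten minutes), the output gives a timeline that can be lined up with the Maven debug log to see when forks vanish.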
Aiello answered 10/10, 2016 at 16:26 Comment(3)
① I run this in a Docker container on a Linux box. ② dmesg shows nothing, /var/log/messages does not exist, and /var/log/syslog shows nothing related either. ③ ⓐ I'll try when it happens again. ⓑ On this 64-CPU machine (!) a system load of ~25 appears and piles up to a multiple of that when several processes are left over (a load of >400 could be observed once). I'm sure neither about the memory consumption nor about the actual CPU load. ④ I had hoped to avoid this kind of deep inspection :-/ - Acrilan
I'd also check whether runs without Docker work better. If they do, you're left with a much smaller parameter space to search. - Aiello
Outside of the Docker container I also observed most of the effects. Left-over processes happen too rarely to be caught in manual executions, but a smaller number of surefirebooter processes than expected occurs often. - Acrilan
