How to prevent a consistent Java pause pattern on Linux Mint

I have a Java app running on Linux Mint. Every minute, the program shows a very noticeable slowdown: a pause of a consistent 3 to 4 seconds. When we run further instances of the same program, they also pause for 3 to 4 seconds each minute, each instance on a different second of the minute.

Latest update:

After the last update (below), increasing the thread pool's thread count made the GUI problem go away. After running for about 40 hours we observed a thread leak in the Jetty HttpClient blocking GET (Request.send()) call. To explain the mechanics: a main thread runs every few minutes. It uses an Executor to run an independent thread that calls the host with an HTTP GET via Jetty's HttpClient Request.send().

After about 40 hours of operation, there was a spike in the number of threads running in the HttpClient pool. For those 40 hours, the same threads ran fine. The working hypothesis is that around that time, one or more send() calls neither completed nor timed out, and so never returned to the calling thread. Essentially, these threads are hung inside the Jetty client.

When watching each regular cycle in jvisualvm we see the normal behaviour: some HttpClient threads fire up for the host GET, execute, and go away in just a few seconds. Also on the monitor are about 10 threads belonging to the Jetty HttpClient thread pool that have been 'present' for (now) 10 hours.
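One way to see where those long-lived threads are parked is to dump their stack traces from inside the application. The sketch below uses only the JDK; the class name and the "HttpClient" name prefix are assumptions for illustration, and the real thread names should be read off the jvisualvm thread view.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class StuckThreadReport {
    /** Returns one formatted entry (name, state, stack) per live thread whose name starts with the prefix. */
    static List<String> describeThreads(String namePrefix) {
        List<String> report = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            if (!thread.getName().startsWith(namePrefix)) {
                continue;
            }
            StringBuilder sb = new StringBuilder();
            sb.append(thread.getName()).append(" state=").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            report.add(sb.toString());
        }
        return report;
    }

    public static void main(String[] args) {
        // "HttpClient" is an assumed prefix; pass the real one as the first argument.
        String prefix = args.length > 0 ? args[0] : "HttpClient";
        describeThreads(prefix).forEach(System.out::print);
    }
}
```

This only sees threads in the JVM that calls it, so describeThreads would need to be invoked from within the app (for example from the scheduled manager thread). Running `jstack <pid>` against the process gives the same information from outside.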

The expectation is that there was some error in the underlying client or network processing. I am surprised there was no time-out exception or programming exception. There are some clear questions I can ask now.

  1. What can happen inside HttpClient that could just hang a Request.send()?
  2. What is the time-out on the call return? I would think there are still absolute time-outs or checks for locking, etc. (no?)
  3. Can the I/O system hang and leave the caller thread hanging while Java obediently ...
    • fires the manager thread at the scheduled time, then
    • the next Request.send() happens,
    • new thread(s) from the pool start up for the next send (as appears to have happened),
    • while the earlier send() is stuck in limbo?
  4. Can I limit, or otherwise put a clean-up on, these stuck threads?
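On questions 2 and 4, a minimal sketch of putting a hard deadline around any blocking call, using only the JDK's Future.get(timeout). The class and method names here are hypothetical, and the sleeping lambda is a stand-in for the blocking GET, not Jetty code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedCall {
    /** Runs the call on the pool and waits at most the given time; returns the fallback on timeout or failure. */
    static <T> T callWithDeadline(ExecutorService pool, Callable<T> call,
                                  long timeout, TimeUnit unit, T fallback) {
        Future<T> future = pool.submit(call);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Interrupts the worker. A thread blocked in socket I/O may ignore
            // the interrupt, so this frees the caller but not necessarily the worker.
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the call itself threw
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return fallback;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newCachedThreadPool();
        // Stand-in for a blocking HTTP GET that never returns in time.
        String result = callWithDeadline(pool, () -> {
            Thread.sleep(5_000);
            return "response";
        }, 200, TimeUnit.MILLISECONDS, "timed out");
        System.out.println(result); // prints "timed out"
        pool.shutdownNow();
    }
}
```

This guards the caller, but it is not a substitute for the client's own time-outs. Jetty's Request has a per-request timeout(long, TimeUnit) and HttpClient has connect and idle time-out setters; whether they would have fired in this case is not something this sketch can show, and it is worth checking them against the Jetty version in use.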

This was happening before we increased the thread pool size. What has changed is that the 'blame' has become more focused on the problem area. We are also suspicious of the underlying system, because we had lock-ups with Apache HttpClient too, again around the same (non-specific) time of day.

(prior update) ...

The pause behaviour observed is that the JavaFX GUI does not update/refresh. The setText() call for the display's clock (textView) was logged during the freeze at two updates per second (that's new information), yet the clock doesn't update on the screen on Mint Linux; it continues to update when running on Windows. To forestall repeating myself on questions about GC, logs, probes, etc., the answer will be the same: we have run extensive diagnostics over weeks now. The issue is unmistakably a mix of Linux JVM / Linux Mint / threads (per JavaFX). Another piece of new data is that increasing the thread-pool count by +2 appears to remove the freeze. Further testing is needed to confirm that and tune the numbers. The question, though, is "What are the parameters that make the difference between the two platforms?"
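One mechanism that would be consistent with the "+2 threads" observation is plain pool starvation: a blocking task occupying the only free worker so that a periodic task cannot run. The JDK-only sketch below reproduces that effect in isolation. It is an illustration under assumptions (hypothetical class name, a sleep standing in for the blocking network call), not a diagnosis of this app, and it does not explain the Windows/Linux difference.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class PoolStarvationDemo {
    /** Runs a 50 ms "clock tick" beside one blocking task; returns the longest gap between ticks in ms. */
    static long longestTickGapMillis(int poolSize, long blockMillis) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(poolSize);
        AtomicLong lastTick = new AtomicLong(System.nanoTime());
        AtomicLong longestGap = new AtomicLong();
        // Periodic task, standing in for the clock update.
        pool.scheduleAtFixedRate(() -> {
            long now = System.nanoTime();
            long gap = now - lastTick.getAndSet(now);
            longestGap.accumulateAndGet(gap, Math::max);
        }, 0, 50, TimeUnit.MILLISECONDS);
        // One-shot blocking task, standing in for the network call.
        pool.schedule(() -> sleepQuietly(blockMillis), 100, TimeUnit.MILLISECONDS);
        sleepQuietly(blockMillis + 400); // let the scenario play out
        pool.shutdownNow();
        return TimeUnit.NANOSECONDS.toMillis(longestGap.get());
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("1 thread:  longest gap ms = " + longestTickGapMillis(1, 500));
        System.out.println("2 threads: longest gap ms = " + longestTickGapMillis(2, 500));
    }
}
```

With a single worker the longest gap is roughly the blocking time; with a spare worker the ticks keep close to their 50 ms period. If the real pause scales with the duration of the HTTP call, that would point the same way.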

We have run several instances of the program on Windows for days with no pauses. When we run on a Mint Linux platform we see the freeze, and it is very consistent.

The program has several threads running on a schedule. One thread opens an HTTP socket to the internet. When we comment out that area, the pause vanishes. However, we don't see that behaviour on Windows. Experiments point to something specific to the Mint networking I/O subsystem, Linux scheduling, the Linux Java 8 JVM, or some interaction among them.

As you may guess, we are tearing our hair out on this one. For example, we turned off logging and the pause remained. We resumed logging and made just one call to the HTTP server: still a pause every 60 seconds, on the same second count. This happens even when we do no other processing. We tried different HTTP libraries, etc. It seems very clear that it is in the JVM or Linux.

Does anyone know of a way to resolve this?

Related answered 16/8, 2014 at 9:30 Comment(9)
Have you monitored garbage collection? It could be the culprit. - Ashien
Everything, including garbage collection, has been monitored. We are very certain the problem is between the JVM, I/O and/or Linux. The distinction appears to be the way thread pools are implemented for the Linux JVM. As I said, it runs fine on Windows and pauses on Linux. Experiments with the size of the thread pool are demonstrating some improvement. We are not sure why they are different, whether that's actually the issue, or whether the pool size hides the root cause. - Related
What happens during a "pause"? Can other threads make progress? Or are they "frozen" (as you say) or merely slowed down? What behavior of the system are you observing that tells you that a pause has occurred? What is the system's CPU and I/O activity during the pause? - Polymath
Have you run VisualVM against it and seen what's runnable when it pauses? - Tempo
VisualVM just showed threads running as expected. GC is not the issue. The threads keep working, but the JavaFX GUI does not update for 3 to 5 seconds, as described. There's a clock updating every 0.5 seconds. We did System.out.println() on the clock update setText(). SysOut shows other threads working as intended while the screen is frozen (paused). - Related
@Related The main differences I can think of w.r.t. JavaFX are: (i) threads are handled differently on Windows and Linux and you may have a thread safety issue that only shows up on one (maybe you are doing FX stuff outside the FX thread?) (ii) the actual display is done by a native library which differs between Windows and Linux. - Ashien
@Ashien ... On the first point, JavaFX throws an exception when you do something JavaFX-related on a different thread, so we have been able to eliminate those bugs. There are also try/catch blocks on all the threads to detect anything like that. - Related
Similar issue: #12741241 - Related
I saw a similar problem where the HTTP communication hung. It turned out to be log rotation (i.e. zip and rotate the log file) that "paused" the entire process. - Mcilwain
