Multiple threads stuck in native calls (Java)

I have a problem with an application running on Fedora Core 6 with JDK 1.5.0_08.

After some amount of uptime (usually a few days), threads begin getting stuck in native methods.

The threads are locked in something like this:

"pool-2-thread-2571" prio=1 tid=0x08dd0b28 nid=0x319e waiting for monitor entry [0xb91fe000..0xb91ff7d4]
at java.lang.Class.getDeclaredConstructors0(Native Method)

or

"pool-2-thread-2547" prio=1 tid=0x75641620 nid=0x1745 waiting for monitor entry [0xbc7fe000..0xbc7ff554]
at sun.misc.Unsafe.defineClass(Native Method)

Especially puzzling to me is this one:

"HealthMonitor-10" daemon prio=1 tid=0x0868d1c0 nid=0x2b72 waiting for monitor entry [0xbe5ff000..0xbe5ff4d4]
at java.lang.Thread.dumpThreads(Native Method)
at java.lang.Thread.getStackTrace(Thread.java:1383)

The threads remain stuck until the VM is restarted.

Can anyone give me an idea as to what is happening here, that is, what might be causing the native methods to block? The monitor entry address range at the top of each stuck thread is different. How can I figure out what is holding this monitor?

Cultivator asked 1/9, 2008 at 7:1 Comment(0)

My initial suspicion would be that you are experiencing some sort of class-loader related deadlock. I imagine that class loading needs to be synchronized at some level, because class information becomes available to the entire VM, not just the thread where it was initially loaded.

The fact that the methods on top of the stack are native methods seems to be pure coincidence, since part of the class-loading mechanism happens to be implemented that way.

I would investigate further what is going on class-loading-wise. Maybe some thread uses the class loader to load a class from a network location that is slow or unavailable, and thus blocks for a really long time without yielding the monitor to other threads that want to load a class. Inspecting the output when starting the JVM with -verbose:class might be one thing to try.
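To make the suspicion more concrete, here is a minimal, hypothetical sketch (classes A and B are made up, not taken from the original application) of one way class loading/initialization can deadlock: two threads trigger the initialization of classes whose static initializers depend on each other.

public class ClassInitDeadlockDemo {
    // Each static initializer forces the other class to be initialized.
    static class A { static final Object PEER = new B(); }
    static class B { static final Object PEER = new A(); }

    public static void main(String[] args) {
        // With unlucky timing, each thread ends up holding one class's
        // initialization lock while waiting for the other's, and both
        // remain blocked inside the class-loading machinery forever.
        new Thread(() -> new A(), "init-A").start();
        new Thread(() -> new B(), "init-B").start();
    }
}

Running such a program with -verbose:class at least shows which classes were being loaded or initialized when the threads stopped making progress.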

Bignonia answered 30/9, 2008 at 10:50 Comment(0)

I was having similar problems a few months ago and found the jthread(?) utility to be invaluable. You give it the process ID for your Java application and it will dump the entire stack for each thread in your process.

From the output of jthread, I could see that one thread was trying to obtain a lock after having entered a monitor, while another thread was trying to enter that monitor after having obtained the lock. A recipe for deadlock.
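As a hedged illustration of that pattern (not the asker's actual code), a tiny program like the one below reproduces the "two threads, two monitors, opposite locking order" situation; a jstack-style dump taken against it shows both threads waiting for monitor entry.

public class LockOrderDeadlockDemo {
    private static final Object LOCK_A = new Object();
    private static final Object LOCK_B = new Object();

    public static void main(String[] args) {
        // thread-1 takes LOCK_A and then wants LOCK_B;
        // thread-2 takes LOCK_B and then wants LOCK_A.
        new Thread(() -> { synchronized (LOCK_A) { pause(); synchronized (LOCK_B) { } } }, "thread-1").start();
        new Thread(() -> { synchronized (LOCK_B) { pause(); synchronized (LOCK_A) { } } }, "thread-2").start();
    }

    private static void pause() {
        // Give the other thread time to grab its first lock.
        try { Thread.sleep(100); } catch (InterruptedException ignored) { }
    }
}

Running jstack <pid> (or sending kill -3 <pid>) against such a process reports both threads as BLOCKED and, on HotSpot, prints an explicit "Found one Java-level deadlock" section.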

I was also wondering if your application was running into a garbage collection issue. You say it runs for a couple days before it stops like this. How long have you let it sit in the stuck state to see if maybe the GC ever finishes?

Skean answered 1/10, 2008 at 5:50 Comment(1)
I think the name of the tool is jstack; I used it for thread info above. Also, sending a SIGQUIT (ctrl-\) to the JVM dumps the threads to stdout. My problem is that it seems that the deadlock is happening in code beyond my control (native code). Though I've no doubt I'm holding the lock somehow...Cultivator

Can you find out which thread is actually synchronizing on the monitor on which the native method is waiting? At least the thread dump you get from the VM when you send it a SIGQUIT (kill -3) should show this information, as in:

"Thread-0" prio=5 tid=0x0100b060 nid=0x84c000 waiting for monitor entry [0xb0c8a000..0xb0c8ad90]
    at Deadlock$1.run(Deadlock.java:8)
    - waiting to lock <0x255e5b38> (a java.lang.Object)
...
"main" prio=5 tid=0x01001350 nid=0xb0801000 waiting on condition [0xb07ff000..0xb0800148]
    at java.lang.Thread.sleep(Native Method)
    at Deadlock.main(Deadlock.java:21)
    - locked <0x255e5b38> (a java.lang.Object)

In the dumps you've posted so far, I can't see any thread that is actually waiting to lock a specific monitor...
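If capturing SIGQUIT dumps is awkward, the same question can also be asked from inside the process. This is only a sketch using the standard java.lang.management API (present since Java 5, so it should be usable on the 1.5.0_08 VM from the question):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class MonitorDeadlockCheck {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // IDs of threads deadlocked on object monitors, or null if none.
        long[] ids = mx.findMonitorDeadlockedThreads();
        if (ids == null) {
            System.out.println("No monitor deadlock detected");
            return;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids, Integer.MAX_VALUE)) {
            System.out.println(info.getThreadName()
                    + " is blocked on " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
    }
}

Note that this only reports true monitor deadlocks; if nothing is found but threads still sit in "waiting for monitor entry", the owning thread is probably holding the lock while doing something slow rather than being deadlocked.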

Bignonia answered 7/10, 2008 at 10:19 Comment(2)
I think you are right about it being classloader related. The threads always block in native methods, though, so I don't think that's a coincidence. There are no classes that might take a long time to load (or be unavailable); the classpath includes only local jars. Actually, there is more than one classloader, each of which might be contending for some resource and becoming blocked as a result. Since the use of Groovy was removed from the application (it caused a memory leak in PermGen), we haven't seen any additional cases of the block, so this might be what was causing the problem.Cultivator
Unfortunately, there is no obviously deadlocking monitor. None of the values identifying monitors are repeated in a thread dump. If there were, I would have noticed: I think the JVM reports suspected deadlocked threads, and there is also the Thread Dump Analyzer VisualVM plugin that helps find problems in thread dumps.Cultivator

Maybe you should use another JDK version.
For your "puzzling one", there is a bug entry for 1.5.0_08. A memory leak is reported (I do not know if this is related to your problem):
https://bugs.java.com/bugdatabase/view_bug?bug_id=6469701

Also, you could get the source code and look at what happens at line 1383. On the other hand, it could just be the stack dump after the original error occurred.

Forseti answered 1/9, 2008 at 19:22 Comment(1)
Unless I can prove that the problem is caused by a bug in the JDK version which was fixed later, upgrading the JDK is not an option. I'll try looking up the source code tomorrow -- hopefully that might give me some insight into the problem. Though I do recall having a "mixed native/Java" thread dump which showed the JDK native stack traces and not being able to see anything there... I'm running on Linux, so it doesn't look like the bug you linked to is relevant.Cultivator

I found this thread after hitting the same problem - JDK 1.6.0_23 running on Linux with Tomcat 6.0.29. Not sure those bits are relevant, though - what I did notice was that, aside from many threads getting "stuck" in the getDeclaredConstructors() native method, the CPU was at 100% for the java process. With all request threads stuck here, the CPU at 100%, and thread dumps not showing any deadlocks (and no other threads doing any significant activity), it smelled like a thrashing garbage collector to me. Sure enough, I checked the server logs and there were numerous OutOfMemory errors - heap space was exhausted.
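For what it's worth, a hedged way to confirm that kind of theory from inside the process is to periodically log heap usage (a minimal sketch; the 10-second interval is arbitrary):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapUsageLogger implements Runnable {
    public void run() {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        while (true) {
            MemoryUsage heap = mem.getHeapMemoryUsage();
            // If "used" stays pinned near "max", the collector is likely
            // running back-to-back full GCs without reclaiming much.
            System.out.println("heap used=" + (heap.getUsed() >> 20)
                    + "MB max=" + (heap.getMax() >> 20) + "MB");
            try {
                Thread.sleep(10000);
            } catch (InterruptedException e) {
                return;
            }
        }
    }

    public static void main(String[] args) {
        new Thread(new HeapUsageLogger(), "heap-usage-logger").start();
    }
}

Enabling verbose GC logging (-verbose:gc) from the start gives the same picture with less effort.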

Can't say that this is going to be the root cause of threads getting stuck here every time, but hopefully the info here will help others at least rule this out as a possible cause...

Igloo answered 15/2, 2011 at 15:41 Comment(0)
