On my busiest production installation, I occasionally get a single thread that seems to be stuck in an infinite loop. Despite much research and debugging, I haven't managed to figure out which thread is the culprit, but it seems like it should be possible. Here are the gory details:
Current debugging notes:
1) Running ps -eL and filtering on the process ID 18975 shows me the Linux thread ID (the LWP column) of the problem child thread, 19269:
$ ps -eL | grep 18975
...
PID LWP TTY TIME CMD
18975 18994 ? 00:00:05 java
18975 19268 ? 00:00:00 java
18975 19269 ? 05:16:49 java
18975 19271 ? 00:01:22 java
18975 19273 ? 00:00:00 java
...
2) jstack -l 18975 reports no deadlocks; jstack -m 18975 does not work.
3) jstack -l 18975 does give me stack traces for all of my threads (~400). Here is an example thread stack (not the problem thread):
"http-342.877.573.944-8080-360" daemon prio=10 tid=0x0000002adaba9c00 nid=0x754c in Object.wait() [0x00000000595bc000..0x00000000595bccb0]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on (a org.apache.tomcat.util.net.JIoEndpoint$Worker)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.await(JIoEndpoint.java:416)
    - locked (a org.apache.tomcat.util.net.JIoEndpoint$Worker)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:442)
    at java.lang.Thread.run(Thread.java:619)
4) The thread ID in the ps -eL output does not match anything in the jstack output, or at least I cannot see a match. (The jstack documentation is a bit sparse.)
5) There is no heavy I/O, memory usage, or other corresponding activity to give me a clue.
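Regarding item 4: on HotSpot running on Linux, the nid= value in each jstack thread header is, as far as I know, the native (kernel) thread ID printed in hexadecimal, while ps -eL prints that same ID in decimal in the LWP column. A minimal sketch of the conversion in both directions, assuming that mapping holds on this JVM (the class name NidConverter is hypothetical):

```java
// Sketch: convert between the decimal LWP column printed by `ps -eL`
// and the hexadecimal nid=0x... field printed in jstack thread headers.
// Assumes HotSpot on Linux, where nid is the kernel thread ID (LWP).
public final class NidConverter {

    private NidConverter() {}

    /** Decimal LWP from ps -eL, e.g. 19269, to jstack's form, e.g. "0x4b45". */
    public static String lwpToNid(long lwp) {
        // HotSpot prints nid in lowercase hex, which toHexString also produces.
        return "0x" + Long.toHexString(lwp);
    }

    /** jstack nid such as "0x754c" (the 0x prefix is optional) to a decimal LWP. */
    public static long nidToLwp(String nid) {
        String hex = nid.startsWith("0x") || nid.startsWith("0X") ? nid.substring(2) : nid;
        return Long.parseLong(hex, 16);
    }

    public static void main(String[] args) {
        System.out.println(lwpToNid(19269));    // prints 0x4b45
        System.out.println(nidToLwp("0x754c")); // prints 30028
    }
}
```

If the mapping holds, the thread to look for in the jstack -l 18975 output is the one whose header contains nid=0x4b45; the example Tomcat worker above (nid=0x754c) would appear as LWP 30028 in ps.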
Platform:
- Java 6
- Tomcat 6
- RHEL 4 (64-bit)
Does anybody know how I can get from the Linux ps output to my problem child Java thread? So close, yet so far...
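To pull that one thread out of a ~400-thread dump, a small filter over a saved dump (for example jstack -l 18975 > dump.txt) might look like the sketch below. It assumes the usual HotSpot dump layout: each thread entry starts with the quoted thread name, the header contains nid=0x<hex> followed by a space, and entries are separated by blank lines. The class name FindThreadByLwp and the file name dump.txt are hypothetical, and the code sticks to Java 6 APIs to match the platform above.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch: print the jstack entry (header plus stack frames) for the thread
// whose native ID matches a decimal LWP taken from `ps -eL`.
// Assumes HotSpot's dump format and that nid is the LWP in hex (Linux).
public final class FindThreadByLwp {

    private FindThreadByLwp() {}

    public static List<String> stackForLwp(List<String> dumpLines, long lwp) {
        // Trailing space avoids matching e.g. nid=0x4b45 inside nid=0x4b451.
        String marker = " nid=0x" + Long.toHexString(lwp) + " ";
        List<String> stack = new ArrayList<String>();
        boolean capturing = false;
        for (String line : dumpLines) {
            if (!capturing) {
                // Thread headers start with the quoted thread name.
                capturing = line.startsWith("\"") && line.contains(marker);
            } else if (line.trim().length() == 0) {
                // First blank line ends the stack frames. With -l, the
                // "Locked ownable synchronizers" section after it is not included.
                break;
            }
            if (capturing) {
                stack.add(line);
            }
        }
        return stack;
    }

    // Usage (hypothetical file name): java FindThreadByLwp dump.txt 19269
    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            in.close();
        }
        for (String s : stackForLwp(lines, Long.parseLong(args[1]))) {
            System.out.println(s);
        }
    }
}
```

Taking several dumps a few seconds apart and comparing this thread's frames is a common way to see where a spinning thread keeps looping.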
hexNid2dec(pid)
math error. Or maybe I was doing something else incredibly stupid. – Maggot