Java periodically hangs at futex and very low IO output

My application periodically blocks on I/O, and its output throughput becomes very low. I used several commands to trace the process.

Using jstack, I found that the app is hanging in FileOutputStream.writeBytes.
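
The same check can be done from inside the JVM, for example from a watchdog thread that logs when the stall starts. The following is a minimal sketch, not part of the original setup: the class name StuckWriteFinder and its methods are hypothetical, and it only sees threads of the JVM it runs in (unlike jstack, which attaches from outside).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: lists threads whose stack contains FileOutputStream.writeBytes.
class StuckWriteFinder {

    // True if any frame of the given stack is java.io.FileOutputStream.writeBytes.
    static boolean isInFileWrite(StackTraceElement[] stack) {
        for (StackTraceElement frame : stack) {
            if ("java.io.FileOutputStream".equals(frame.getClassName())
                    && "writeBytes".equals(frame.getMethodName())) {
                return true;
            }
        }
        return false;
    }

    // Names of all live threads in this JVM that are currently inside writeBytes.
    static List<String> threadsInFileWrite() {
        List<String> names = new ArrayList<String>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            if (isInFileWrite(e.getValue())) {
                names.add(e.getKey().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Threads in FileOutputStream.writeBytes: " + threadsInFileWrite());
    }
}
```

Calling threadsInFileWrite() every few seconds and logging a timestamp whenever the same thread shows up repeatedly would give the start and end of each stall, which can then be compared with the strace and iostat output.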

Using strace -f -c -p pid to collect syscall info, I found the following. In the normal situation, the process makes both futex and write syscalls. When it becomes abnormal, there are only futex syscalls. The app keeps calling futex, but the waits all fail with ETIMEDOUT, like this:

<futex resumed> = -1 ETIMEDOUT (Connection timed out)
futex(0x7f823, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f824, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME) = -1 <unfinished>
<futex resumed> = -1 ETIMEDOUT (Connection timed out)
futex(0x7f823, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f824, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME) = -1 <unfinished>

This issue happens periodically, lasts for minutes or hours, and then the app goes back to normal.

In particular, when the app is blocked on I/O, echo 3 > /proc/sys/vm/drop_caches always makes it return to normal temporarily. I googled it and found some similar problems, listed below.

  1. Leap second. This doesn't work for us; our system's ntpd is stopped.
  2. Transparent hugepage bug. https://bugzilla.redhat.com/show_bug.cgi?id=879801 This is very similar to my problem, but my khugepaged process is normal, and the load is always nearly zero. Notably, drop_caches works for my application too, and my system is also multi-core with large memory. Still, the fix doesn't work for me.

Has anyone met the same problem, or is anyone familiar with this issue?

Some info about my system. OS: Red Hat 6.1, kernel version 2.6.31

JDK: 1.7.0_05

CPU: X5650, 24 cores

Memory: 24GB and 48GB

Alternation answered 28/8, 2015 at 3:35 Comment(5)
I am afraid JDK 1.7.0_05 is too old. You should try the latest Java 7 release. It is the easiest first step. - Foolery
@Alternation This seems like a kernel problem. Have you tried resetting the system date and trying again, using something like date -s "`date`"? - Roping
I tried JDK 1.8 before, and it didn't seem to help; I'll run a more detailed test. I also found that when the app was blocked, the GC threads kept calling futex(), but the calls failed. However, according to jstat -gcutil, YGCT and FGCT were normal, taking only several seconds. - Alternation
Can you check the paging I/O traffic (swapping) and the utilization of the block device? Use iostat -x -m -d 1, and perhaps also vmstat and top. It might be that the OS simply runs out of RAM and starts swapping to the same physical drive. - Dig
Probably related to the Linux futex_wait() bug... due to commit b0c29f79ecea. The Red Hat platform and the kernel version look about right. - Unsearchable

Maybe the kernel bug in futex_wait()?

You can read about it here: https://groups.google.com/forum/#!topic/mechanical-sympathy/QbmpZxp6C64

Inhibition answered 12/8, 2016 at 20:51 Comment(0)

In addition to clock jumps and the aforementioned (rather old) THP kernel bug, another common reason for Java to unexpectedly block on I/O is reading from the very slow, blocking /dev/random, which some libraries prefer over the more commonly used and much better performing /dev/urandom.

An easy way to tell whether that is the culprit:

sudo mv /dev/random /dev/random.real
sudo ln -s /dev/urandom /dev/random

...then restart the app and see if it stops blocking on I/O. Once done with the test, you probably want to restore /dev/random:

sudo mv /dev/random.real /dev/random

...and open a bug with the application vendor asking them to use /dev/urandom where appropriate.
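
Before swapping device nodes, it may also help to see which entropy source the JVM itself is configured to use. The sketch below only reports the java.security.egd system property and the securerandom.source security property, which are the standard JDK settings for this. The class name EntropySourceCheck and the helper namesBlockingDevice are hypothetical, and the helper is a plain string comparison: whether a given value actually blocks depends on the JDK version and the SecureRandom provider in use.

```java
import java.security.Security;

// Hypothetical helper: prints the entropy source settings visible to this JVM.
class EntropySourceCheck {

    // True only if the configured value literally names the blocking device.
    // This is a string check, not proof that SecureRandom will block.
    static boolean namesBlockingDevice(String source) {
        if (source == null) {
            return false;
        }
        return "file:/dev/random".equals(source.trim());
    }

    public static void main(String[] args) {
        // Command-line override, e.g. -Djava.security.egd=...
        String override = System.getProperty("java.security.egd");
        // Default from the JDK's java.security configuration file.
        String configured = Security.getProperty("securerandom.source");
        String effective = (override != null && !override.isEmpty()) ? override : configured;

        System.out.println("java.security.egd   = " + override);
        System.out.println("securerandom.source = " + configured);
        System.out.println("names /dev/random   = " + namesBlockingDevice(effective));
    }
}
```

If the output names /dev/random, the symlink test above is a reasonable next step; if it already names /dev/urandom, the stall more likely comes from one of the other causes discussed in this thread.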

Wapentake answered 7/5, 2019 at 12:18 Comment(2)
Thanks a million! This has helped me A LOT!!! - Monticule
Right down the hall, in the Unix department, there is an answer documenting the differences between /dev/random and /dev/urandom. On some occasions it is better to wait than to use numbers that can be predicted by a sophisticated adversary. To avoid blocking on Linux, set up the rngd service. - Fencesitter
