Java VM: reproducible SIGSEGV on both 1.6.0_17 and 1.6.0_18, how to report?

EDIT: This reproducible SIGSEGV happens on a Linux machine with more than one proc and more than 2GB of mem, so Java is defaulting to the -server mode. Interestingly enough if I force "-client" there's no crash anymore... (I'm still not too sure what to do with my reproducible SIGSEGV but it's interesting nonetheless).
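A quick way to confirm which VM variant a given launch actually picked is to print the standard `java.vm.name` system property from inside the process. The sketch below is only an illustration: the class name `VmInfo` and the helper `isServerVm` are made up for this example, and the check simply looks for the word "Server" in the name, as it appears in the `java -version` output further down.

```java
// Hypothetical helper: reports which HotSpot VM variant is running.
class VmInfo {
    /** Returns true if the reported VM name mentions "Server". */
    static boolean isServerVm(String vmName) {
        return vmName != null && vmName.contains("Server");
    }

    public static void main(String[] args) {
        // Standard system properties set by the JVM itself.
        String vmName = System.getProperty("java.vm.name");
        System.out.println("java.vm.name    = " + vmName);
        System.out.println("java.vm.version = " + System.getProperty("java.vm.version"));
        System.out.println("server VM?      = " + isServerVm(vmName));
    }
}
```

Running it once with `java -server VmInfo` and once with `java -client VmInfo` should show whether the flag really changes the VM in use on that machine.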

First note that this is a bit related but not identical to the following because in our case it's only a SIGSEGV that happens, and we can reliably trigger it:

JVM OutOfMemory error "death spiral" (not memory leak)

It's related because it happens when we feed our app with a "deluge of data": data are coming from text files and then number-crunched (yes, financial number crunching in Java).

I can reliably trigger a JVM to SIGSEGV using only valid Java code.

NOTE: I can invariably crash both JVM 1.6.0_17 and JVM 1.6.0_18, and this question is not about how to work around the issue (for example, playing with VM parameters may fix it, but I'm not after that; I want to know what to do with this always-reproducible SIGSEGV).

I've got a workaround which simply consists in using Java 1.5 when launching our app (while still using Java 1.6 to run IntelliJ IDEA, etc. on the same machine, simultaneously), but my question is if this should be reported or not and, if it should, how to report it knowing that the log itself contains proprietary information (the full hs_err_..._log).

A hardware error can be ruled out because:

  • this is happening on a workstation that regularly reaches months of uptime (I only reboot it when critical security patches affecting my trimmed down and hardened Debian Linux are issued, which really doesn't happen often) and on which applications never crash (making it very unlikely that it's a hardware issue on that machine [more below])

  • same application works perfectly on that same machine under a JVM 1.5 under the same load (this is how I'm testing the app: I simply launch it under a 1.5 VM)

  • same application works perfectly fine on more than one hundred client machines under the same (gigantic) load (never crashed once on Windows + JVM 1.5 or 1.6 and never crashed once on OS X + JVM 1.5 or 1.6 [a crash would mean an instant phone call from the client])

  • other applications on that same machine and same 1.6.0_17 or 1.6.0_18 JVM never crash (for example I've got two instances of IntelliJ IDEA running as two different users on that same machine and they don't crash)

  • machine is tested with memtest "regularly" (before installing a new OS, which last happened when I installed Debian Lenny, not that long ago)

Here's the reproducible-on-demand SIGSEGV:

... $ uname -a
Linux saturn 2.6.26-2-686 #1 SMP Wed Nov 4 20:45:37 UTC 2009 i686 GNU/Linux
... $ export PATH=/home/wizard/jdk1.6.0_17/bin:$PATH
... $ java -version
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)

Launch the app, feed it a "deluge of data", wait a few seconds...

Then, invariably, for 1.6.0_17:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xb76d0080, pid=30793, tid=2514328464
#
# JRE version: 6.0_17-b04
# Java VM: Java HotSpot(TM) Server VM (14.3-b01 mixed mode linux-x86 )
# Problematic frame:
# V  [libjvm.so+0x4bc080]
#
# An error report file with more information is saved as:
# /home/wizard/hs_err_pid30793.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp

(note that the line '[libjvm.so+0x4bc080]' is consistent for 1.6.0_17 at every SIGSEGV)

or for 1.6.0_18:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xb77468f0, pid=722, tid=2514516880
#
# JRE version: 6.0_18-b07
# Java VM: Java HotSpot(TM) Server VM (16.0-b13 mixed mode linux-x86 )
# Problematic frame:
# V  [libjvm.so+0x4d88f0]
#
# An error report file with more information is saved as:
# /home/wizard/hs_err_pid722.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#
Aborted

(note that the line "[libjvm.so+0x4d88f0]" is consistent for 1.6.0_18 at every SIGSEGV)

The problem is that the log file contains proprietary information that cannot be shared.

Producing a "tiny test case" that reproduces the issue ain't realistic either: as with the issue linked above, this only happens when a "deluge of data" is fed to the app.

Note that the exact same application, on exactly the same hardware, with exactly the same JVM but another version of Linux (I had Debian Etch previously) did NOT trigger that SIGSEGV once.

But this doesn't mean the JVM isn't at fault: it could still be a JVM issue.

Should I report this and how? (keeping in mind that writing a "reproducible tiny test case" is delusional and that the log contains proprietary information that shouldn't be leaked). Should I just edit the log and send it?

What's the procedure to report such reproducible SIGSEGV when your log contains proprietary information and when a test case reproducing the issue ain't realistically doable?

Did any of you have success opening such a bug and then see it solved in a subsequent Java release?

Do you think it's good "for the Java community" to report such an issue or I just shouldn't bother because it's not important?

Surname answered 19/2, 2010 at 20:12 Comment(9)
Does this still apply with the latest version of Java? Also consider using IBM Java or JRockit.Footwall
@Thorbjørn Ravn Andersen: I'll check later tonight and report hereSurname
@Thorbjørn Ravn Andersen: Just downloaded JRE version: 6.0_25-b06. Exact same crash :-/Surname
Also, does this happen on one of the officially supported Linux platforms?Footwall
@Thorbjørn Ravn Andersen: if I find the time I'll try with IBM's JRE and on other distros...Surname
@SyntaxT3rr0r, I strongly suggest that you ensure that you are only using Oracle JVM's on supported Linux distributions. If not, your eventual bug report will not be considered.Footwall
@Thorbjørn Ravn Andersen: I decided not to report the bug seen how complicated the procedure was. Also I'm using a Debian GNU/Linux for both technical and political reasons. If Java can't cope with it, it's Java that's going away for us, not Debian GNU/Linux. Sadly I realize people aren't really interested in helping fixing this. It definitely is an issue that Java ain't working as it should on otherwise rock-stable solid systems. Too bad for Java. Linux (officially supported or not) will continue to be fine I'm sure ; )Surname
@SyntaxT3rr0r, in that case why not just use the client JVM?Footwall
Have you tried later patch versions of java?Benevolence
C
6

I had a similar problem after upgrading to JDK 1.6.0_18, and it seems to be solved by using the following options:

-server
-Xms256m
-Xmx748m
-XX:MaxPermSize=128m

-verbose:gc
-XX:+PrintGCTimeStamps
-Xloggc:/tmp/gc.log
-XX:+PrintHeapAtGC
-XX:+PrintGCDetails
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath="/tmp"

-XX:+UseParallelGC
-XX:-UseGCOverheadLimit

# The following options only enable remote monitoring with jconsole, useful for observing JVM behaviour at runtime
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=12345
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Djava.rmi.server.hostname=MyHost

I haven't double-checked yet (it is a production environment), but I think the error had two causes:

1) Wrong heap and/or permanent generation settings (I think JDK 1.6 needs more heap and permanent space than previous JVM versions) caused an OutOfMemoryError, and

2) in the original, incorrect settings somebody wrote

-XX:+HeapDumpOnOutOfMemoryError="/tmp"

and not

-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath="/tmp"

so the JVM was probably unable to write the heap dump and we only got a SIGSEGV (previous versions wrote the heap dump to the working directory).

Check the -server, -XX:+UseParallelGC and -XX:-UseGCOverheadLimit options too. I think tuning VM parameters is not a workaround but the right approach, also because the garbage collector (among other things) changed between 1.5 and 1.6.
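When experimenting with options like these, it can help to verify from inside the application which arguments the JVM actually received. Here is a minimal sketch using the standard `RuntimeMXBean.getInputArguments()` call; the class name `ShowJvmArgs` and the helper `hasFlag` are made up for illustration, and the flag being checked is just one taken from the list above.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: lists the arguments this JVM was started with.
class ShowJvmArgs {
    /** Returns true if the given flag appears verbatim among the arguments. */
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself (not the program arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String arg : jvmArgs) {
            System.out.println(arg);
        }
        // Effective maximum heap, to cross-check -Xmx.
        System.out.println("max heap (bytes): " + Runtime.getRuntime().maxMemory());
        System.out.println("UseParallelGC requested: "
                + hasFlag(jvmArgs, "-XX:+UseParallelGC"));
    }
}
```

This only reports what was passed on the command line; it does not prove that a given option is what makes the crash go away.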

Curb answered 25/2, 2010 at 8:41 Comment(2)
@glenti: +1, cool, your first answer on SO was to one of my questions :) Tried everything you suggested but it's still crashing. There's no sign of an OutOfMemoryError, I tried with a custom JLabel displaying the memory usage. Apparently no PermGen issue either.Surname
@glenti: your post got me thinking... I'm using a Linux machine with more than one proc and more than 2GB of mem, so Java is defaulting to the -server mode. Interestingly enough if I force "-client" there's no crash anymore... (I'm still not too sure what to do with my reproducible SIGSEGV but it's interesting nonetheless)Surname
N
5

The problem is that the log file contains proprietary information that cannot be shared. Reproducing a "tiny test case" that reproduce the issue ain't realistic either

If you can't provide Sun with a reproducible test case, they won't even look at it. Chances are good that they will ignore it even if you do provide a usable test case. The bug submission process at Sun leaves a lot to be desired.

Should I report this and how?

If you can't come up with a reproducible test case, don't bother. If they can't reproduce the issue, what do you expect them to do?

Note that the exact same application, on exactly the same hardware, with exactly the same JVM but another version of Linux (I had Debian Etch previously) did NOT trigger that SIGSEGV once.

Does it work on a different box with the same hardware and same version of Linux?

Nickles answered 19/2, 2010 at 20:20 Comment(3)
I'm sure that buying support gets you a LOT more attention. How much, depends on the level you buy.Footwall
@Kevin: ah damn... I could dd my hd to another one and hence try with the exact same Linux kernel/configuration and JVMs to see if the SIGSEGV is also reproducible but what you're writing there is quite depressing. A test case would mean hundreds of Megabytes of data to send. Oh well, if it's reproducible on any hardware maybe I should just ship the harddisk or make a Bootable-CD that can reproduce the problem :) (I'm half-serious) What about the OpenJDK? Would things be different if I could reliably reproduce this under the OpenJDK 7 ?Surname
@WizardOfOdds : you say there's proprietary information in the log file. Could you write a parser or something to "banalize" this data, and then send your logfile to Sun ?Siliculose
T
1

If it helps, the bug submission link in your crash report has this disclaimer:

In addition, Sun Microsystems respects your desire for privacy. Personal data collected from this program will not be sold, given or shared with organizations external to Sun. We will use this data for communications with you to clarify issues regarding the report you submitted and/or status of that report. The issues that you report may be made available to other JDC Members or Sun customers, however your personal data will be kept confidential. If you are not comfortable with the above conditions, please do not press the Submit button. If you have any questions, please refer to our Privacy Policy.

Personally, I would report it if it was feasible to hand over the code segment in question with logs, if the data is not too sensitive (perhaps data can be masked or obfuscated in logs?).
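As a rough idea of what masking could look like, the sketch below copies a log file line by line and replaces a fixed list of sensitive strings with placeholders. Everything in it is an assumption for illustration: the class name `HsErrRedactor`, the package prefix `com.example.trading` and the replacement values are made up (only `/home/wizard` comes from the crash output in the question), and you would still need to read the result by hand before sending it, since a plain string replacement cannot know what else is confidential.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: masks known sensitive strings in a crash log.
class HsErrRedactor {
    /** Replaces every occurrence of each key with its placeholder value. */
    static String redactLine(String line, Map<String, String> replacements) {
        String out = line;
        for (Map.Entry<String, String> e : replacements.entrySet()) {
            out = out.replace(e.getKey(), e.getValue()); // literal, not regex
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java HsErrRedactor <in.log> <out.log>");
            return;
        }
        // Assumed examples of strings to hide; adapt to your own log.
        Map<String, String> replacements = new LinkedHashMap<>();
        replacements.put("com.example.trading", "com.redacted");
        replacements.put("/home/wizard", "/home/user");

        try (BufferedReader in = new BufferedReader(new FileReader(args[0]));
             PrintWriter out = new PrintWriter(new FileWriter(args[1]))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println(redactLine(line, replacements));
            }
        }
    }
}
```

Keeping the replacements consistent (the same placeholder for the same original string) means stack frames and paths in the log still line up with each other after masking.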

It's impossible for you to really judge if the bug is "important" or not for others unless you can know what actually causes it. Reporting it might be the first step in Sun's engineers finding out the cause of something serious.

Tamarin answered 20/2, 2010 at 3:21 Comment(1)
@matt b: yup, was thinking about clearing the filenames in the hs_err_...log. I'll see if a Proguarded version also triggers the crash and then I may even send the obfuscated .jar + data allowing to reproduce the issue. Still scratching my head on this.Surname
E
0

The very first question you should ask yourself is:

  • Am I using an officially supported Linux distribution?

If not, switch to one that is.

If you are, then report it to Sun!

Ectoplasm answered 19/2, 2010 at 23:39 Comment(3)
@Throbjorn: officially supported by who? By Sun you mean? I'm using the most stable Linux distribution ever made, that a lot of people hate because it's always slow to include the latest flulls & whistles & bells and that other people like me love because it's rock stable solid: Debian :)Surname
Supported by the entity that has produced the JVM you are using. Sun does not say that their Java will run on any Linux distribution in existence, but they say that they "support" the distributions listed on java.sun.com/javase/6/webnotes/install/… (where "support" means even consider listening to bugreports). Debian is not there, but Ubuntu is. Use that instead.Footwall
@Throbjorn: Oh ok I see what you mean (thanks for the link too)... That said Ubuntu is actually Debian based :) Debian is the most highly respected distribution by sysadmins and powers a lot of the Real-World [TM] servers, I'm not switching to any other Linux distro ;) That said the issue is not the SIGSEGV (for I've got workarounds) but what to do with it... :)Surname