Why does java app crash in gdb but runs normally in real life?

Running my Java app under gdb results in a segfault, yet running the app on its own does not. The app is a .jar which uses JOGL and a bit of memory mapping to talk to the GPU.

The stack trace below hints at some sort of memory access problem, but I don't understand why it manifests under GDB and not in real life. Could there be some environmental factor GDB needs to know about to allow proper execution?

The issue persists across JVMs: OpenJDK 6 and 7, as well as Oracle JRE 7. The Oracle JRE runs a little farther into startup before the segfault. Otherwise, the segfaults are consistent in occurrence and location between trials.

The segfault also persists across GPUs and drivers(!!): nvidia, radeon, fglrx current and fglrx beta (14.xx). GDB will successfully attach to an already-running instance of my program; however, it doesn't seem possible for gDEBugger to do this, which is ultimately what needs to work.

There is no intent to actually debug with gdb. Rather, I am trying to use gDEBugger to perform OpenGL debugging. gDEBugger apparently relies on GDB as part of its backend, so if GDB fails, so does gDEBugger. That is why I tried running gdb alone to isolate the issue.

gDEBugger output:
GDB String:  [Thread debugging using libthread_db enabled]  
GDB String:  Using host libthread_db library  /lib/x86_64-linux-gnu/libthread_db.so.1 .  
Thread Created: 140737353893632 (LWP: 3265)
Thread Created: 140737294624512 (LWP: 3266)
Thread Created: 140737293571840 (LWP: 3267)
Thread Created: 140737292519168 (LWP: 3268)
Thread Created: 140737155180288 (LWP: 3269)
Thread Created: 140737154127616 (LWP: 3270)
Thread Created: 140736913602304 (LWP: 3271)
Thread Created: 140736909629184 (LWP: 3272)
Thread Created: 140736908576512 (LWP: 3273)
Thread Created: 140736907523840 (LWP: 3274)
Thread Created: 140736906471168 (LWP: 3275)
Thread Created: 140736905418496 (LWP: 3276)
Thread Created: 140736278275840 (LWP: 3277)
Thread Created: 140736272963328 (LWP: 3278)
Thread Created: 140736271910656 (LWP: 3279)
Thread Created: 140736270857984 (LWP: 3280)
Thread Created: 140736269805312 (LWP: 3281)
Thread Created: 140737287657216 (LWP: 3285)
Thread Created: 140736261945088 (LWP: 3289)
GDB String:  [Thread 0x7fffb6e67700 (LWP 3289) exited]  
Thread Created: 140736261945088 (LWP: 3290)
API Connection Established: gDEBugger Servers Manager
Thread Created: 140736234641152 (LWP: 3291)
GDB String:  [Thread 0x7fffb6e67700 (LWP 3290) exited]  
API Connection Established: gDEBugger OpenGL Server
GDB String:  [Thread 0x7fffb77e8700 (LWP 3279) exited]  
GDB String:  [Thread 0x7fffb76e7700 (LWP 3280) exited]  
Debug String: gDEBugger OpenGL Server was initialized
Thread Created: 140736270857984 (LWP: 3292)
Thread Created: 140735692441344 (LWP: 3294)
Thread Created: 140735582430976 (LWP: 3295)
Thread Created: 140735574038272 (LWP: 3296)
OpenGL Render Context 1 Created
Signal: SIGSEGV
Process Exit


$ java -version
java version "1.6.0_33"
OpenJDK Runtime Environment (IcedTea6 1.13.5) (6b33-1.13.5-1ubuntu0.14.04)
OpenJDK 64-Bit Server VM (build 23.25-b01, mixed mode)

$ gdb -version
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.1 LTS"

$ fglrxinfo
display: :0.0  screen: 0
OpenGL vendor string: Advanced Micro Devices, Inc.
OpenGL renderer string: AMD Radeon HD 5570     
OpenGL version string: 4.4.12967 Compatibility Profile Context 14.20


$ gdb --args java -jar RunMe.jar
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from java...Reading symbols from /usr/lib/debug//usr/lib/jvm/java-6-openjdk-amd64/jre/bin/java...done.
done.
(gdb) show configuration
This GDB was configured as follows:
   configure --host=x86_64-linux-gnu --target=x86_64-linux-gnu
             --with-auto-load-dir=$debugdir:$datadir/auto-load
             --with-auto-load-safe-path=$debugdir:$datadir/auto-load
             --with-expat
             --with-gdb-datadir=/usr/share/gdb (relocatable)
             --with-jit-reader-dir=/usr/lib/gdb (relocatable)
             --without-libunwind-ia64
             --with-lzma
             --with-python=/usr (relocatable)
             --with-separate-debug-dir=/usr/lib/debug (relocatable)
             --with-system-gdbinit=/etc/gdb/gdbinit
             --with-zlib
             --without-babeltrace
(gdb) run
Starting program: /usr/bin/java -jar RunMe.jar
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
process 6866 is executing new program: /usr/lib/jvm/java-6-openjdk-amd64/jre/bin/java
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7fc4700 (LWP 6870)]
[New Thread 0x7ffff486c700 (LWP 6871)]
[New Thread 0x7ffff476b700 (LWP 6872)]
[New Thread 0x7ffff466a700 (LWP 6873)]
[New Thread 0x7fffea2d6700 (LWP 6874)]
[New Thread 0x7fffea1d5700 (LWP 6875)]
[New Thread 0x7fffea0d4700 (LWP 6876)]
[New Thread 0x7fffe9d0a700 (LWP 6877)]
[New Thread 0x7fffe9c09700 (LWP 6878)]
[New Thread 0x7fffe9b08700 (LWP 6879)]
[New Thread 0x7fffe9a07700 (LWP 6880)]
[New Thread 0x7fffe9906700 (LWP 6881)]
...
[New Thread 0x7fffe8110700 (LWP 6882)]
[New Thread 0x7fffe3169700 (LWP 6883)]
[New Thread 0x7fffe3068700 (LWP 6884)]
[New Thread 0x7fffe2f67700 (LWP 6885)]
[New Thread 0x7fffe2e66700 (LWP 6886)]
[New Thread 0x7fffe2d65700 (LWP 6887)]
[Thread 0x7fffe2d65700 (LWP 6887) exited]
[New Thread 0x7fffe2d65700 (LWP 6891)]
[Thread 0x7fffe2d65700 (LWP 6891) exited]
[New Thread 0x7fffe2d65700 (LWP 6895)]
[Thread 0x7fffe2d65700 (LWP 6895) exited]
[New Thread 0x7fffe2d65700 (LWP 6896)]
[New Thread 0x7fffe0efd700 (LWP 6897)]
libEGL warning: DRI2: failed to authenticate
[New Thread 0x7fff9799f700 (LWP 6898)]
[New Thread 0x7fff9719e700 (LWP 6899)]
[New Thread 0x7fff9699d700 (LWP 6900)]
[Thread 0x7fffe2d65700 (LWP 6896) exited]
[New Thread 0x7fffe2d65700 (LWP 6901)]
[New Thread 0x7fffe01ab700 (LWP 6902)]
[New Thread 0x7fff92f00700 (LWP 6903)]
[New Thread 0x7fff92dff700 (LWP 6904)]
[New Thread 0x7fff92cfe700 (LWP 6905)]
Setting up sound system...[New Thread 0x7fff92bfd700 (LWP 6906)]

[New Thread 0x7fff92afc700 (LWP 6907)]
[New Thread 0x7fff929fb700 (LWP 6908)]
[New Thread 0x7fff928fa700 (LWP 6909)]
[New Thread 0x7fff927f9700 (LWP 6910)]
[New Thread 0x7fff926f8700 (LWP 6911)]
[New Thread 0x7fff925f7700 (LWP 6912)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe2f67700 (LWP 6885)]
0x00007ffff6b3a770 in acl_CopyRight ()
   from /usr/lib/jvm/java-6-openjdk-amd64/jre/lib/amd64/server/libjvm.so
(gdb) where
#0  0x00007ffff6b3a770 in acl_CopyRight ()
   from /usr/lib/jvm/java-6-openjdk-amd64/jre/lib/amd64/server/libjvm.so
#1  0x00007ffff6d51309 in Unsafe_CopyMemory2 (env=<optimized out>, 
    unsafe=<optimized out>, srcObj=0x0, srcOffset=140737008618496, dstObj=0x0, 
    dstOffset=140737006779392, size=1024)
    at /build/buildd/openjdk-6-6b33-1.13.5/build/openjdk/hotspot/src/share/vm/prims/unsafe.cpp:689
#2  0x00007fffed011790 in ?? ()
#3  0x0000000000000400 in ?? ()
#4  0x0000000000000000 in ?? ()
Warning: the current language does not match this frame.
(gdb) quit
A debugging session is active.

    Inferior 1 [process 6866] will be killed.

Quit anyway? (y or n) y

UPDATE: Switched to AMD CodeXL (basically the most recent form of gDEBugger) and situation hasn't changed much.

Resonate answered 2/12, 2014 at 3:53 Comment(0)

Why does java app crash in gdb but runs normally in real life?

Because it doesn't actually crash.

The JVM uses speculative loads. If a pointer points to addressable memory, the load succeeds. Occasionally the pointer does not point to addressable memory, and the attempted load raises SIGSEGV. The Java runtime intercepts that signal, makes the memory addressable again, and restarts the load instruction. GDB, by default, stops the program as soon as it sees the SIGSEGV, before the JVM's handler gets a chance to run, which is why it looks like a crash.

When debugging Java programs under GDB, one generally has to do this:

(gdb) handle SIGSEGV nostop noprint pass

Unfortunately, if there is some JNI code involved, and that code raises a genuine SIGSEGV, GDB will happily ignore that signal as well, resulting in the death of the inferior (the process being debugged). I have not found an acceptable solution for that latter problem.
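
As a rough sketch, the `handle` line above can also be passed on the gdb command line with `-ex`, so that it is already in effect before `run`. The wrapper name `gdb_java` and the `GDB` override variable are made up for illustration; `RunMe.jar` is the jar from the question.

```shell
#!/bin/sh
# Hypothetical wrapper (the name gdb_java is an assumption, not a gdb feature).
# Starts java under gdb with SIGSEGV passed silently to the JVM's own handler.
# GDB may name another debugger binary; it defaults to "gdb".
gdb_java() {
    "${GDB:-gdb}" \
        -ex 'handle SIGSEGV nostop noprint pass' \
        --args java "$@"
}

# Usage (not run here): gdb_java -jar RunMe.jar
```

The JNI caveat still applies: with this setting GDB will not stop on a genuine SIGSEGV from native code either.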

Tidy answered 2/12, 2014 at 4:48 Comment(5)
After trying your handle command I confirm that it fixes the issue in GDB. - Resonate
I faced the same problem when trying to set a breakpoint at some hotspot function, and it works perfectly. But setting a catchpoint, e.g. catch syscall futex, and printing its backtrace still crashes with frame.c:534: internal-error: frame_id get_frame_id(frame_info*): Assertion 'fi->level == 0' failed. Do you have any idea how to work around this? - Pendent
I think this is a separate question... #54365579 - Pendent
Especially annoying with Ada native code, which installs a specific SIGSEGV handler. Each subsequent SIGSEGV caused by the JVM crashes the app... - Jenicejeniece
Thanks for pointing out the cause of the problem. However, in my case I know my program will have a real segmentation fault later in a native .so file ... How to debug this? - Berkowitz

This was too long for a single comment on the accepted answer. It's basically links and quotes kept for future reference (in case the pages vanish).

Some of you might find interest in part 2.

Table of contents

  0. Small trick
  1. Reasons / documentation
  2. Signal Chaining between native code and JVM

0. Small trick

A way around the issue could be to have the JVM start a GDB console on a fatal error, using the following JVM launch option (see this blog page from Alexey Pirogov; the option is also described in the Oracle Java doc, along with several usage examples):

-XX:OnError="gdb - %p"

%p will be replaced with the PID of the JVM process.
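
For reference, a complete launch line might look like the sketch below. The function name `java_gdb_on_error` and the `JAVA` override variable are assumptions for illustration, and `RunMe.jar` is borrowed from the question. The double quotes matter: they keep `gdb - %p` together as a single argument, and `%p` is left for the JVM to expand, not the shell.

```shell
#!/bin/sh
# Hypothetical wrapper (name and JAVA variable are illustrative assumptions).
# Launches the JVM so that gdb is attached to it when a fatal error occurs.
java_gdb_on_error() {
    # The quoted value reaches the JVM as one argument: -XX:OnError=gdb - %p
    "${JAVA:-java}" -XX:OnError="gdb - %p" "$@"
}

# Usage (not run here): java_gdb_on_error -jar RunMe.jar
```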

Example output from the blog post below. From what I read, it looks like the JVM is able to tell if a given SIGSEGV is Java-induced (and use it silently) or if it comes from a (C++) lib. As far as I understand, this means the GDB session would start on a "legit" SIGSEGV occurrence, with a correct context.

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f7348cba806, pid=10055, tid=10057
#
# JRE version: OpenJDK Runtime Environment (10.0.2+13) (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.4)
# Java VM: OpenJDK 64-Bit Server VM (10.0.2+13-Ubuntu-1ubuntu0.18.04.4, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C  [libJNIDemo.so+0x806]  Java_jnidemo_JNIDemoJava_nativeCrash+0x1c
#
...
(gdb)

I found statements in this SO answer inconsistent with the Oracle Java doc description, but I would rather trust Oracle doc.

1. Reasons / documentation

I found this link https://www.ateam-oracle.com/why-am-i-seeing-sigsegv-when-i-strace-a-java-application-on-linux

It gives some insight into the JVM's behind-the-scenes implementation.

The JVM is a multi-threaded process and so under the covers it's using signals to do OS level threading.

But the JVM is also doing a metric ton of other really clever stuff; for example in a regular C/C++ program [emphasis mine] hitting a NULL (a Zero) when you're expecting a pointer to some structure would cause your application to crash. That crash is actually, as you can probably guess by now, the OS sending your process a signal - specifically SIGSEGV. If your app didn't register a signal handler for that signal (and 99.5% of c/c++ apps out there don't) then the signal comes back up to the OS which then terminates the app and (usually) saves the memory state into a core file.

The JVM does register a signal handler for SIGSEGV and not just because it doesn't want to crash out when something goes wrong. The JVM registers a signal handler for SIGSEGV because it actually uses SIGSEGV and a bunch of other signals for its own purposes. [emphasis mine]

[...] And that's perfectly normal and completely safe.

The above link also points to this https://docs.oracle.com/javase/7/docs/webnotes/tsg/TSG-VM/html/signals.html

Signal

  • SIGSEGV, SIGBUS, SIGFPE, SIGPIPE, SIGILL

    Used in the implementation for implicit null check, and so forth.

  • SIGQUIT

    Thread dump support: To dump Java stack traces at the standard error stream. (Optional.)

  • SIGTERM, SIGINT, SIGHUP

    Used to support the shutdown hook mechanism (java.lang.Runtime.addShutdownHook) when the VM is terminated abnormally. (Optional.)

  • SIGUSR1

    Used in the implementation of the java.lang.Thread.interrupt method. (Configurable.) Not used starting with Solaris 10 OS. Reserved on Linux.

  • SIGUSR2

    Used internally. (Configurable.) Not used starting with Solaris 10 OS.

  • SIGABRT

    The HotSpot VM does not handle this signal. Instead it calls the abort function after fatal error handling. If an application uses this signal then it should terminate the process to preserve the expected semantics.

2. Quotes related to Signal Chaining

The Oracle link indicates that some actions can be taken to better handle signals between the JVM and non-Java code. This is referred to as signal chaining.

NOTE: I do not know if it works, or if it has any positive effect when debugging a library called by a Java app.

I think it won't help with intercepting the "right" signal during a GDB session. But maybe with custom handler code + a breakpoint it could?

From my understanding, it seems suited for a native application that embeds a JVM, not a JVM app that embeds a native library. I keep the quotes here for completeness.

Quoting:

If an application with native code requires its own signal handlers, then it might need to be used with the signal chaining facility.

An application can link and load the libjsig.so shared library before libc/libthread/libpthread. This library ensures that calls such as signal(), sigset(), and sigaction() are intercepted so that they do not actually replace the Java HotSpot VM's signal handlers if the handlers conflict with those already installed by the Java HotSpot VM. Instead, these calls save the new signal handlers, or chain them behind the VM-installed handlers. During execution, when any of these signals are raised and found not to be targeted at the Java HotSpot VM, the pre-installed handlers are invoked.

The proposed procedure:

Perform one of these two procedures to use the libjsig.so shared library.

  1. Link it with the application that creates/embeds a HotSpot VM [remark: so this is not relevant for a library loaded from a Java app ...] , for example:

    cc -L libjvm.so-directory -ljsig -ljvm java_application.c
    
  2. Use the LD_PRELOAD environment variable, for example [see https://mcmap.net/q/17999/-what-is-the-ld_preload-trick]:

    export LD_PRELOAD=libjvm.so-directory/libjsig.so; java_application (ksh)
    
    setenv LD_PRELOAD libjvm.so-directory/libjsig.so; java_application (csh)
    

The interposed signal(), sigset(), and sigaction() return the saved signal handlers, not the signal handlers installed by the Java HotSpot VM and which are seen by the operating system.

Note that SIGUSR1 cannot be chained.
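
In a Bourne-style shell, the `LD_PRELOAD` variant quoted above could be scoped to a single launch instead of being exported for the whole session. This is only a sketch: `run_with_jsig` and the `JAVA` override variable are invented names, and the directory holding `libjsig.so` has to be supplied by the caller because it depends on the JVM installation.

```shell
#!/bin/sh
# Hypothetical helper (name and JAVA variable are illustrative assumptions).
# $1 is the directory containing libjsig.so; remaining arguments go to java.
run_with_jsig() {
    jsig_dir=$1
    shift
    # Run in a subshell so the export does not leak into the calling shell.
    (
        export LD_PRELOAD="$jsig_dir/libjsig.so"
        "${JAVA:-java}" "$@"
    )
}

# Usage (not run here): run_with_jsig libjvm.so-directory -jar RunMe.jar
```

Keeping the variable out of the session avoids preloading `libjsig.so` into every other program started from the same shell.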


Victim answered 1/7, 2020 at 10:9 Comment(0)
