No backtrace from SIGABRT signal on ARM platform?
I'm using the 'backtrace()' and 'backtrace_symbols_fd()' functions in a signal handler to generate a backtrace for debugging (GDB not available).

They work fine on x86 desktop (Ubuntu), but on the target device (ARM based) the backtrace on Abort signal (due to double-free error) shows only three frames: the signal handler and two from within libc, which is not useful for debugging our code! Backtrace on SEGV (e.g. using a bad pointer) DOES produce a good backtrace.

Why can't I get a useful backtrace on ABRT signal on ARM?

[Question edited for clarity]

Here's a simple test program which demonstrates the problem:

#include <execinfo.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

// Signal handler to catch seg fault:
void handler_segv(int sig) {
    // get void*'s for all entries on the stack
    void *array[10];
    int size;  // backtrace() returns int, which also matches the %d below
    size = backtrace(array, 10);
    fprintf(stderr, "Error: Signal %d; %d frames found:\n", sig, size);
    // print out all the frames to stderr
    backtrace_symbols_fd(array, size, STDERR_FILENO);
    exit(1);
}


void crashme()
{
  // Deliberate Error: Abort (double free):
  char *test_ptr = malloc(1);
  free(test_ptr);
  free(test_ptr);
  // Deliberate Error #2: Seg fault:
  //char * p = NULL;
  //*p = 0;
}

void foo()
{
    fprintf(stdout, "---->About to crash...\n");
    crashme();
    fprintf(stdout, "---->Crashed (shouldn't get to here)...\n");
}



// Main entry point:
int main(int argc, char *argv[])
{
    fprintf(stdout, "Application start...\n");

    // Install signal handlers:
    fprintf(stdout, "-->Adding handler for SIGSEGV and SIGABRT\n");
    signal(SIGSEGV, handler_segv);
    signal(SIGABRT, handler_segv);

    fprintf(stdout, "-->OK. Causing Error...\n");
    foo();
    fprintf(stdout, "-->Test finished (shouldn't get to here!)\n");
    return 0;
}

This was compiled for x86 as follows:

gcc -o test test-backtrace-simple.c -g -rdynamic

And for ARM:

arm-none-linux-gnueabi-gcc -o test-arm test-backtrace-simple.c -g -rdynamic -O0 -mapcs-frame -funwind-tables -fasynchronous-unwind-tables

I've used various compiler options for ARM as described in other posts related to generating backtraces on ARM.

When run on the x86 desktop, it generates the expected output with plenty of debug, ending in:

Error: Signal 6; 10 frames found: 
./test(handler_segv+0x19)[0x80487dd]
[0xb7745404] 
[0xb7745428]
/lib/i386-linux-gnu/libc.so.6(gsignal+0x4f)[0xb75b0e0f]
/lib/i386-linux-gnu/libc.so.6(abort+0x175)[0xb75b4455]
/lib/i386-linux-gnu/libc.so.6(+0x6a43a)[0xb75ed43a]
/lib/i386-linux-gnu/libc.so.6(+0x74f82)[0xb75f7f82]
./test(crashme+0x2b)[0x8048855] 
./test(foo+0x33)[0x804888a]
./test(main+0xae)[0x8048962]

(i.e. the back trace generated by my handler, with my function calls at the bottom).

However, when run on the ARM platform, I get:

Application start...
-->Adding handler for SIGSEGV and SIGABRT
-->OK. Causing Error...
---->About to crash...
*** Error in `/opt/bin/test-arm': double free or corruption (fasttop): 0x015b6008 ***
Error: Signal 6; 3 frames found:
/opt/bin/test-arm(handler_segv+0x24)[0x8868]
/lib/libc.so.6(__default_sa_restorer_v2+0x0)[0xb6e6c150]
/lib/libc.so.6(gsignal+0x34)[0xb6e6af48]

The backtrace() finds only 3 frames, and they are only the signal handler and something in libc (not useful)!

I found a mailing list post which said:

If you link with the debugging C library, -lc_g, you'll get debugging info back past abort().

This might be relevant, but -lc_g doesn't work on my compiler (ld: cannot find -lc_g).

The backtrace works fine on ARM if I generate a seg fault instead (e.g. change the crashme() function to use "char *p = NULL; *p = 0;" instead of the double free).

Any ideas or suggestions for other ways to get a back trace?

[--EDIT--]

I tried some MALLOC_CHECK_ options as suggested in the comments, but the only effect was to change whether the abort was generated. Here is the output from three runs on the ARM:

 # MALLOC_CHECK_=0 /opt/bin/test-arm
Application start...
-->Adding handler for SIGSEGV and SIGABRT
-->OK. Causing Error...
---->About to crash...
---->Crashed (shouldn't get to here)...
-->Test finished (shouldn't get to here!)


# MALLOC_CHECK_=1 /opt/bin/test-arm
Application start...
-->Adding handler for SIGSEGV and SIGABRT
-->OK. Causing Error...
---->About to crash...
*** Error in `/opt/bin/test-arm': free(): invalid pointer: 0x015b2008 ***
---->Crashed (shouldn't get to here)...
-->Test finished (shouldn't get to here!)


# MALLOC_CHECK_=2 /opt/bin/test-arm
Application start...
-->Adding handler for SIGSEGV and SIGABRT
-->OK. Causing Error...
---->About to crash...
Error: Signal 6; 3 frames found:
/opt/bin/test-arm(handler_segv+0x24)[0x8868]
/lib/libc.so.6(__default_sa_restorer_v2+0x0)[0xb6e24150]
/lib/libc.so.6(gsignal+0x34)[0xb6e22f48]
#

MALLOC_CHECK_=0: No error message (double free is ignored!)

MALLOC_CHECK_=1: Error message, but program continues

MALLOC_CHECK_=2: ABRT signal, but no error message is printed in this run; useless backtrace generated (the abort matches the default behaviour!)

My cross compiler reports: gcc version 4.6.1 (Sourcery CodeBench Lite 2011.09-70). The target device runs Linux kernel version 3.8.8.

Euripus answered 21/7, 2015 at 1:17 Comment(10)
Have you taken a look at: gnu.org/software/libc/manual/html_node/Backtraces.html It gives an example of how backtracing can be used without the need for gdb. Let me know if this helps, thanks. - Lobate
@Kozmik yes, already using pretty much that (see question and attached example code). However it doesn't work correctly for an ABRT caused by double free. - Euripus
Could you state what you're asking for as briefly as possible? I'm a bit confused about what it is you are really asking for help on. - Lobate
"How (on ARM platform) do I get a useful back trace for an abort signal caused by a double free?". Using the 'backtrace()' function I only get three frames, one from the signal handler and two from libc, which are not useful since I am trying to find out where (in my code) the double free is occurring. Note the back trace DOES work properly when the code is run on my Ubuntu desktop so it seems to be an issue with the ARM compiler. - Euripus
You can try setting MALLOC_CHECK_ to '1' or '2' and see if this helps. You should see early aborts or error messages which will help you debug. This along with backtrace should help you out. gnu.org/software/libc/manual/html_node/… - Protuberance
@Arun: I tried MALLOC_CHECK_ as suggested, but this only changed whether any error message and/or abort signal was generated by the double-free; if the abort occurs, I still get a useless backtrace that shows only libc (not my code) - See edited question above for the output. - Euripus
fprintf(stderr, "Error: Signal ... : you know that printf() and friends are not signal-safe? - Terzas
I can relate that the same thing occurs on my Raspberry Pi. Compiling with the additional flags doesn't change anything. - Hobnob
Running into the exact same problem. Did you ever get it working? Did you have a chance to try @itaych's suggestion? - Cultured
I haven't tried @itaych's solution, but I think it is probably correct, i.e. you also have to build libstdc++ and similar libraries with the appropriate flags set, and they probably weren't set in the toolchain we were using. - Euripus
C
15

It appears you have done sufficient research to know that you need the switches -funwind-tables and -fasynchronous-unwind-tables in your compiler command line. In practice either one of them seems sufficient, but without them backtracing clearly doesn't work at all. Now, the trouble with things like SIGABRT is that the backtrace must traverse stack frames that were generated by libc functions such as abort and gsignal, and it fails because libc itself is not built with either of those switches (in any distribution that I know of).

While it would be nice to petition the maintainers of Sourcery CodeBench to build their distribution with that option, the only immediate solution is to build libc yourself, with either or both of those flags set (in my experience just -funwind-tables is enough). If you also need a stack trace in case of catching an unhandled exception (via std::set_terminate) then you will also need to rebuild libstdc++.

At my workplace we needed backtraces for both cases (SIGABRT and unhandled exceptions), and since libstdc++ is part of the toolchain we rebuilt the toolchain ourselves. The tool crosstool-NG makes this relatively easy to do. In the configuration utility ./ct-ng menuconfig we entered section Target Options and edited Target CFLAGS (which sets the build variable TARGET_CFLAGS) to -funwind-tables. The resulting toolchain (more specifically, using the libc and libstdc++ from the resulting toolchain build) provides us with a full backtrace in nearly all cases.

I've found one case where we still don't get a full backtrace: if the crash occurred within a function that originally is written in assembly, such as memcpy (unfortunately this is not an uncommon occurrence). Perhaps some option needs to be passed to the assembler, but I didn't have the time to investigate this further.

Casi answered 16/1, 2018 at 14:6 Comment(7)
Thanks, that's an interesting angle that I had not considered. Unfortunately I'm not in a position to test your solution at the moment. - Euripus
So I managed to compile glibc 2.28 with -funwind-tables and -fasynchronous-unwind-tables and test the code above (on ubuntu linaro 16.04 armhf). I had to use the following article to be sure I was linking against my custom-built glibc: #10763894. All of this to no avail: no stack trace with any MALLOC_CHECK_ level [0 .. 2]. Any thoughts on what I might have missed, or does stack tracing just not work on ARM? - Kattegat
@DaveMcMordie - running the code from the question, on an ARM, the stack trace has 7 entries and ends with "/lib/libc.so.6(__libc_start_main+0x114)[0xf70fecfc]" and looks complete, and the same as what I get on x86. Make sure you're linking with your custom glibc at runtime - if the libraries are not placed in the standard location on your target system (/lib/ I guess) use LD_LIBRARY_PATH. Also make sure that your own project is compiled with the flags -funwind-tables -fasynchronous-unwind-tables -g -rdynamic. - Casi
@Casi what version of glibc have you linked against and how was it built? I am certain I am linking against the correct glibc. The default one actually does better: I get five frames ending at abort. My configure for glibc: ../configure --prefix=/opt/lib CFLAGS='-mapcs-frame -rdynamic -funwind-tables -fasynchronous-unwind-tables -fno-omit-frame-pointer -g -O3' libc_cv_ctors_header=yes - Kattegat
@DaveMcMordie The Glibc version is 2.26. We built it along with the GCC 5.1 toolchain using ct-ng as detailed in my answer above. - Casi
@Casi I confirm your answer and I have to retract my certainty that I was linking correctly. I am now able to get 13 frames. Turns out linking correctly against a custom glibc is rather tricky and will only work correctly on programs with no other dependencies (not our case). I have actually reverted to rebuilding the debian package (i.e. apt-get source libc6-dev) with the modified cflags in the debian/rules file. Thanks very much for taking the time to report your findings! - Kattegat
Very useful comment. Note that it must be possible in theory without rebuilding the toolchain though. I have the same issue as Jeremy explains (perfectly working backtrace decoding on x86, not on arm). But if I debug my arm application with gdbserver, it does succeed in fully decoding all the task frames. I guess gdb is more clever than the "backtrace_symbols" function that we compile with. - Tamah
I
6

This is because unwinding through signal handlers is broken in glibc on ARM. I dug into this a few years back and managed to create a working standalone fix. The hard part was digging through the undocumented bowels of exception handling in glibc; after that, the fix was simple, bordering on trivial.

I posted this to the glibc mailing list, as a reply to an old thread about this problem, in the hope that a glibc dev would take my standalone fix as a guide to fix it in glibc proper, but this never happened.

Recently I tested it again: it turns out that the problem still hasn't been fixed in glibc, and due to changes in glibc my fix no longer works. Update: I've fixed it!

Inebriant answered 24/1, 2020 at 16:35 Comment(7)
I'm glad it wasn't just me going crazy! Thanks for posting your solution; next time I'm working on the ARM platform, I'll need to try it. Hopefully someone finds it useful! - Euripus
@Inebriant I tried your library but unfortunately it didn't work. In the SIGABRT case it just prints 3 frames consisting of handler_segv, then sa_restorer_v2.S from your library, then some nonsense address from libc (addr2line maps it to strfmon_l.c which makes no sense). If I trigger the abort via gdb, that does print all frames, so there is some way to get that info... - Holloway
Backtrace from SIGSEGV does match that shown by gdb. - Holloway
addr2line wants an address relative to the start of the executable, but the location of the executable in ram is randomized for security (ASLR) so feeding actual runtime addresses directly into addr2line is not going to work. - Inebriant
It's not really clear what you're doing, and this comment thread is probably not the best place for a detailed discussion. Feel free to open an issue on github with a sufficiently detailed explanation of the problem you're having. - Inebriant
@Matthijs, you wrote "Update: I've fixed it!" Does this mean it is now fixed in mainstream? Can you provide a link to the thread? - Lennie
@Lennie no, I meant I fixed my code to support current glibc. - Inebriant
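
The addr2line and ASLR point in the comments above can be illustrated with a small sketch that maps a runtime address (for example one returned by backtrace()) back to the address as it appears in the ELF file, by subtracting the load base of the containing object. It assumes glibc's dl_iterate_phdr() from link.h; the struct addr_lookup type and the match_object and runtime_to_elf_addr functions are hypothetical names made up for this example, and it has not been tried on the ARM toolchains discussed here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper state: the query address and the lookup result. */
struct addr_lookup {
    uintptr_t addr;       /* runtime address, e.g. from backtrace() */
    uintptr_t elf_vaddr;  /* the same location as an address in the ELF file */
    const char *object;   /* path of the containing object ("" for the main program) */
    int found;            /* 1 if some loaded object contains addr */
};

/* Callback for dl_iterate_phdr(): check whether one loaded object
 * contains the query address in any of its PT_LOAD segments. */
static int match_object(struct dl_phdr_info *info, size_t size, void *data)
{
    struct addr_lookup *q = data;
    (void)size;

    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_LOAD)
            continue;
        uintptr_t start = (uintptr_t)(info->dlpi_addr + ph->p_vaddr);
        if (q->addr >= start && q->addr < start + ph->p_memsz) {
            /* dlpi_addr is the load base: runtime address minus file address. */
            q->elf_vaddr = q->addr - (uintptr_t)info->dlpi_addr;
            q->object = info->dlpi_name;
            q->found = 1;
            return 1;  /* a non-zero return stops the iteration */
        }
    }
    return 0;
}

/* Returns 1 and fills *out if addr lies inside a loaded object, else 0. */
int runtime_to_elf_addr(const void *addr, struct addr_lookup *out)
{
    assert(out != NULL);
    out->addr = (uintptr_t)addr;
    out->elf_vaddr = 0;
    out->object = NULL;
    out->found = 0;
    dl_iterate_phdr(match_object, out);
    return out->found;
}
```

The elf_vaddr value, printed in hex, is what would then be passed to addr2line together with -e and the path of the containing object. For a non-PIE executable the load base is zero, so the runtime address comes back unchanged.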
