gdb weird backtrace

Asked 13/3, 2011 at 15:17 Answered 13/3, 2011 at 17:31

My program is statically compiled with dietlibc. It is built on Ubuntu x64 (targeting x86 with the -m32 flag) and runs on CentOS x86.

The compiled size is only about 100KB. I compile it with -ggdb3 and no optimization flags.

My program uses signal.h to install a SIGSEGV handler, and the handler calls abort().

The program runs without problems for days but sometimes segfaults. This is when I get weird backtraces that I do not understand:

username@ubuntu:~/Desktop$ gdb -c core.28569 program-name
GNU gdb (GDB) 7.2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-linux-gnu --target=i386-linux-gnu".
For bug reporting instructions, please see:
...
Reading symbols from program-name...done.
[New Thread 28569]
Core was generated by `program-name'.
Program terminated with signal 6, Aborted.
#0  0x00914410 in __kernel_vsyscall ()
Setting up the environment for debugging gdb.
Function "internal_error" not defined.
Make breakpoint pending on future shared library load? (y or [n]) [answered N; input not from terminal]
Function "info_command" not defined.
Make breakpoint pending on future shared library load? (y or [n]) [answered N; input not from terminal]
.gdbinit:8: Error in sourced command file:
Argument required (one or more breakpoint numbers).
(gdb) bt
#0  0x00914410 in __kernel_vsyscall ()
During symbol reading, incomplete CFI data; unspecified registers (e.g., eax) at 0x914411.
#1  0x0804d7f4 in __unified_syscall ()
#2  0xbf8966c0 in ?? ()
#3  <signal handler called>
#4  0x2054454e in ?? ()
#5  0x20524c43 in ?? ()
#6  0x2e352e33 in ?? ()
#7  0x32373033 in ?? ()
#8  0x2e203b39 in ?? ()
#9  0x2054454e in ?? ()
#10 0x20524c43 in ?? ()
#11 0x2e302e33 in ?? ()
#12 0x32373033 in ?? ()
#13 0x4d203b39 in ?? ()
#14 0x61696465 in ?? ()
#15 0x6e654320 in ?? ()
#16 0x20726574 in ?? ()
#17 0x36204350 in ?? ()
#18 0x203b302e in ?? ()
#19 0x54454e2e in ?? ()
#20 0x43302e34 in ?? ()
#21 0x00000029 in ?? ()
#22 0xbf8989a8 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) bt full
#0  0x00914410 in __kernel_vsyscall ()
No symbol table info available.
#1  0x0804d7f4 in __unified_syscall ()
No symbol table info available.
#2  0xbf8966c0 in ?? ()
No symbol table info available.
#3  <signal handler called>
No symbol table info available.
#4  0x2054454e in ?? ()
No symbol table info available.
#5  0x20524c43 in ?? ()
No symbol table info available.
#6  0x2e352e33 in ?? ()
No symbol table info available.
#7  0x32373033 in ?? ()
No symbol table info available.
#8  0x2e203b39 in ?? ()
No symbol table info available.
#9  0x2054454e in ?? ()
No symbol table info available.
#10 0x20524c43 in ?? ()
No symbol table info available.
#11 0x2e302e33 in ?? ()
No symbol table info available.
#12 0x32373033 in ?? ()
No symbol table info available.
#13 0x4d203b39 in ?? ()
No symbol table info available.
#14 0x61696465 in ?? ()
No symbol table info available.
#15 0x6e654320 in ?? ()
No symbol table info available.
#16 0x20726574 in ?? ()
No symbol table info available.
#17 0x36204350 in ?? ()
No symbol table info available.
#18 0x203b302e in ?? ()
No symbol table info available.
#19 0x54454e2e in ?? ()
No symbol table info available.
#20 0x43302e34 in ?? ()
No symbol table info available.
#21 0x00000029 in ?? ()
No symbol table info available.
#22 0xbf8989a8 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) quit

Nagano asked 13/3, 2011 at 15:17 Comment(0)

It's a stack overrun.

#4  0x2054454e in ?? ()

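That looks like text: the four bytes spell " TEN" when read as printed, or "NET " in memory order (x86 is little-endian, so the lowest byte comes first).

#5  0x20524c43 in ?? ()

" RLC" as printed, or "CLR " in memory order.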

And so on.

Treat the addresses as if they were text - see if you can identify where this text overwrites your stack.
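
The decoding can be sketched in a few lines of Python. This is only an illustration, not part of the original answer: the helper name frame_to_text is made up here, and it assumes the frame addresses are 32-bit values taken from a little-endian x86 core, as in the question.

```python
import struct


def frame_to_text(addr: int) -> str:
    """Render a 32-bit value as the 4 bytes it occupies in little-endian memory.

    Non-printable bytes are shown as '.' so real addresses stand out.
    """
    raw = struct.pack("<I", addr)  # "<I" = little-endian unsigned 32-bit
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# Frames #4 through #20 from the backtrace in the question.
frames = [
    0x2054454e, 0x20524c43, 0x2e352e33, 0x32373033, 0x2e203b39,
    0x2054454e, 0x20524c43, 0x2e302e33, 0x32373033, 0x4d203b39,
    0x61696465, 0x6e654320, 0x20726574, 0x36204350, 0x203b302e,
    0x54454e2e, 0x43302e34,
]

text = "".join(frame_to_text(a) for a in frames)
print(text)
```

For these frames the script prints "NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C", which looks like a fragment of a browser User-Agent string, so the buffer to look for is probably one that holds such a string. A value whose bytes are mostly non-printable (shown as ".") is more likely a genuine address than overwritten text.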

Obstruct answered 13/3, 2011 at 15:26 Comment(3)

Actually, Erik is right. It was a strncat on an uninitialized variable. That is why it sometimes segfaulted and other times it did not. BTW, the text was indeed "NET" and "CLR". Thank you. – Nagano 13/3, 2011 at 18:29

Even in that case, my answer is probably also correct, and you'd better fix dietlibc for the next time you call abort. – Haematosis 13/3, 2011 at 23:13

@EmployedRussian @Obstruct Can you please explain how those addresses are decoded to text? I see that the ASCII hex for NET is 6E6574. I'm facing the same issue, where my addresses are 0x77057a98 and 0x77057a24. Thank you. – Spencerianism 19/3, 2020 at 23:21

Your stack trace is actually very easy to understand:

You got a SIGSEGV somewhere,
your signal handler did whatever it does, then called abort(),
which issued the raise(2) system call by calling __unified_syscall().

The reason you get no usable stack trace in GDB is that

__unified_syscall is implemented in assembly,
does not use a frame pointer, and
does not have proper CFI directives to describe how to unwind from it.

I would consider this a bug in dietlibc, quite easy to fix, actually. See if this (untested) patch fixes it for you:

--- dietlibc-0.31/i386/unified.S.orig   2011-03-13 10:16:23.000000000 -0700
+++ dietlibc-0.31/i386/unified.S    2011-03-13 10:21:32.000000000 -0700
@@ -31,8 +31,14 @@ __unified_syscall:
    movzbl  %al, %eax
 .L1:
    push    %edi
+        cfi_adjust_cfa_offset (4)
+        cfi_rel_offset (edi, 0)
    push    %esi
+        cfi_adjust_cfa_offset (4)
+        cfi_rel_offset (esi, 0)
    push    %ebx
+        cfi_adjust_cfa_offset (4)
+        cfi_rel_offset (ebx, 0)
    movl    %esp,%edi
    /* we use movl instead of pop because otherwise a signal would
       destroy the stack frame and crash the program, although it
@@ -61,8 +67,11 @@ __unified_syscall:
 #endif
 .Lnoerror:
    pop %ebx
+        cfi_adjust_cfa_offset (-4)
    pop %esi
+        cfi_adjust_cfa_offset (-4)
    pop %edi
+        cfi_adjust_cfa_offset (-4)

 /* here we go and "reuse" the return for weak-void functions */
 #include "dietuglyweaks.h"

If you can't rebuild dietlibc, or if the patch is incorrect, you may still be able to analyze the stack trace better. As far as I can tell, __unified_syscall does not touch %ebp. So you might be able to get a reasonable stack trace by doing this:

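define xbt
  # Walk the chain of saved %ebp values, starting at $arg0.
  set $xbp = (void **)$arg0
  # Stop at a null frame pointer; a bad pointer ends the loop with a memory error.
  while $xbp != 0
    # Print the saved %ebp and the return address stored next to it.
    x/2a $xbp
    set $xbp = (void **)$xbp[0]
  end
end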

xbt $ebp

Note: if xbt works, it is likely to go into the weeds around the SIGSEGV signal frame (that frame does not use a frame pointer either). This may produce complete garbage, or skip a frame or two (which would be exactly the frames where the SIGSEGV happened).

So you really are much better off getting proper unwind descriptors into dietlibc.

Haematosis answered 13/3, 2011 at 17:31 Comment(0)
