How to get a call stack backtrace? (deeply embedded, no library support)
C

6

20

I want my exception handlers and debug functions to be able to print call stack backtraces, basically just like the backtrace() library function in glibc. Unfortunately, my C library (Newlib) doesn't provide such a call.

I've got something like this:

#include <stdio.h>
#include <unwind.h> // GCC's internal unwinder, part of libgcc

_Unwind_Reason_Code trace_fcn(struct _Unwind_Context *ctx, void *d)
{
    int *depth = (int*)d;
    // _Unwind_GetIP() returns a pointer-sized integer, so cast it to match the format
    printf("\t#%d: program counter at %08lx\n", *depth, (unsigned long)_Unwind_GetIP(ctx));
    (*depth)++;
    return _URC_NO_REASON;
}

void print_backtrace_here()
{
    int depth = 0;
    _Unwind_Backtrace(&trace_fcn, &depth);
}

which basically works but the resulting traces aren't always complete. For example, if I do

int func3() { print_backtrace_here(); return 0; }
int func2() { return func3(); }
int func1() { return func2(); }
int main()  { return func1(); }

the backtrace only shows func3() and main(). (This is obviously a toy example, but I have checked the disassembly and confirmed that these functions are all present in full and not optimized out or inlined.)

Update: I tried this backtrace code on the old ARM7 system but with the same (or at least, as equivalent as possible) compiler options and linker script and it prints a correct, full backtrace (i.e. func1 and func2 aren't missing) and indeed it even backtraces up past main into the boot initialization code. So presumably the problem isn't with the linker script or compiler options. (Also, confirmed from disassembly that no frame pointer is used in this ARM7 test either).

The code is compiled with -fomit-frame-pointer, but my platform (bare metal ARM Cortex M3) defines an ABI that does not use a frame pointer anyway. (A previous version of this system used the old APCS ABI on ARM7 with forced stack frames and a frame pointer, and a backtrace like the one here, which worked perfectly.)

The whole system is compiled with -fexceptions, which ensures the metadata that _Unwind needs is included in the ELF file. (_Unwind is designed for exception handling, I think.)

So, my question is: Is there a "standard", accepted way of getting reliable backtraces in embedded systems using GCC?

I don't mind having to mess around with the linker scripts and crt0 code if necessary, but I don't want to have to make any changes to the toolchain itself.

Thanks!

Cadency answered 3/8, 2010 at 16:38 Comment(4)
Many dupes, including #77505Bolte
Neil: Have you read the question? (beside the headline and the bold printed line?) He gets a backtrace but it is missing some called functions.Maleficent
This was helpful in getting backtrace printing in Android NDK projects.Mchail
Did any of the answers solve your issue?Responsibility
C
10

For this you need -funwind-tables or -fasynchronous-unwind-tables. On some targets this is required for _Unwind_Backtrace to work properly!

Canale answered 4/8, 2011 at 19:7 Comment(2)
I have no idea what this option does, but you may also need to specify --no-merge-exidx-entries when linking. old.nabble.com/Stack-backtrace-for-ARM-Thumb-td29264138.htmlCrenelate
@JustinL. - link is currently dead, FWIW.Quintile
H
9

Since ARM platforms do not use a frame pointer, you never quite know how big the stack frame is, so you cannot simply unwind the stack beyond the single return address in R14.

When investigating a crash for which we do not have debug symbols, we simply dump the whole stack and look up the closest symbol to each item that falls in the instruction range. It does generate a lot of false positives but can still be very useful for investigating crashes.
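The stack-scanning idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names code_range_t and scan_stack_for_return_addresses are hypothetical, the code-region bounds would normally come from your own linker script, and the check for an odd value (the Thumb bit) is an extra filter that only makes sense on Thumb-only cores such as the Cortex-M3.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the code region; on a real target the bounds
 * would typically come from linker-script symbols. */
typedef struct {
    uintptr_t text_start;   /* first byte of code */
    uintptr_t text_end;     /* one past the last byte of code */
} code_range_t;

/* Scan `count` words of a dumped stack and copy every word that looks like
 * a Thumb return address (odd value inside the code range) into `out`.
 * Returns the number of candidates stored. False positives are expected:
 * any stale or unrelated word that happens to match is reported too. */
size_t scan_stack_for_return_addresses(const uint32_t *stack, size_t count,
                                       code_range_t range,
                                       uint32_t *out, size_t out_max)
{
    size_t found = 0;

    for (size_t i = 0; i < count && found < out_max; i++) {
        uint32_t word = stack[i];
        int in_code = (word >= range.text_start) && (word < range.text_end);
        int thumb_bit_set = (word & 1u) != 0;

        if (in_code && thumb_bit_set) {
            /* Clear the Thumb bit so the value can be matched against symbols. */
            out[found++] = word & ~UINT32_C(1);
        }
    }
    return found;
}
```

Each stored address can then be resolved to the nearest preceding symbol offline, for example with the map file or addr2line.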

If you are running pure ELF executables, you can separate debug symbols out of your release executable. gdb can then help you find out what is going on from your standard unix core dump

Hippogriff answered 3/8, 2010 at 16:57 Comment(7)
You could reduce the false positives by using the disassembled executable to manually reconstruct the stack frames; look at the first few instructions of each function to count the stacked registers, and any further adjustments to the stack pointer.Dickens
Nitpick: some ARM platforms do use a frame pointer (usually r11). But that's not important here, since the questioner states that his platform doesn't.Dickens
Mike: yes I could do that (myself)... but surely there is some code or library I can leverage that already does it?! Surely in the context of exceptions, every possible stack frame has to contain the necessary metadata (at a minimum, the size) to unwind up the stack. Thus, given exception handling works, why can't gcc's own unwinder do this for me?Cadency
@hugov: exception handling needs to know which objects to destroy, where to jump to, and what state to restore the stack to. It doesn't need to know the complete call stack, so I wouldn't expect to be able to reconstruct a complete stack trace unless the compiler specifically chooses to support this. From your experience, I'm guessing it doesn't, but I could be wrong.Dickens
@Mike Seymour - Technically ARM assembler does not even have the concept of a stack built into it. The closest we come is the LDM and STM instructions. So you are free to implement a stack any way you like. The ARM Procedure Call which is used for most standard ARM ABIs does not support a frame pointer but there is nothing other than compatibility that will stop you from using a frame pointer.Hippogriff
@deus: Indeed, although Thumb has push and pop instructions which assume a full-descending stack with r13 as the stack pointer, so the concept of a stack has slipped into assembly there. The current ABI doesn't have a concept of a frame pointer, but older ones had variants that did, to allow unwinding in the days when debugging information couldn't be relied on for that.Dickens
Your suggestion of searching the stack looking for instruction pointers is basically what github.com/armink/CmBacktrace does, at least false positives are better than false negatives.Definiens
M
7

GCC performs tail-call (sibling-call) optimization. func1() and func2() do not call func2()/func3(); they branch to them instead, so func3() returns directly to main().

In your case, func1() and func2() do not need to set up a stack frame, but even if they did (e.g. for local variables), GCC could still apply the optimization when the function call is the last operation: it then cleans up the stack before the jump to func3().

Have a look at the generated assembler code to see it.


Edit/Update:

To verify that this is the reason, do something after the function call that the compiler cannot reorder or remove (e.g. use the return value). Or just try compiling with -O0.
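A minimal sketch of such a check, assuming GCC: the noinline attribute and the volatile locals are additions for illustration and are not part of the question's code, and func3() here is only a stand-in for the function that prints the backtrace. Because each caller still has work to do after the call returns, the call cannot be turned into a plain branch.

```c
/* Stand-in for the function that would print the backtrace. */
__attribute__((noinline)) int func3(void)
{
    return 0;
}

__attribute__((noinline)) int func2(void)
{
    /* The result is stored and then used after the call returns, so this
     * is no longer a tail call. */
    volatile int result = func3();
    return result + 2;
}

__attribute__((noinline)) int func1(void)
{
    volatile int result = func2();
    return result + 1;
}
```

If func1() and func2() show up in the backtrace with this variant but not with the original, the missing frames were caused by tail calls rather than by the unwinder.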

Maleficent answered 3/8, 2010 at 16:40 Comment(6)
He says the functions are there (not inlined), but he did not say if he has checked if the functions are called or jumped to.Maleficent
@DeadMG: The downvote is certainly harsh. Tail calls are usually optimised like this when compiling for ARM, and this optimisation would give exactly the observed results.Dickens
The OP specifically said he checked the disassembler.Scleroderma
@DeadMG: He said that he checked that the functions were called rather than inlined, but he may have missed the functions ending with a branch rather than a return. It's not something you'd notice unless you carefully read every instruction. Of course, your votes are yours to deal out as you see fit.Dickens
@DeadMG: Even with a look at the disassembly, if you don't know about this optimization you can easily overlook whether there is a call or a jump. I still think this is the problem here - the other answer is interesting, but it does not explain why there is only func3() and main() in the backtrace (and not func3() and func2() only).Maleficent
To clarify: the simplified toy code in the original post could have been subject to tail call/jump optimization, but in the actual code there are things on both sides of the call that could not be optimized away (and I have verified that they are not). There is a push/pop at the start and end of each function, and the next function in the chain is called with a blx instruction (Thumb2).Cadency
B
3

Some compilers, like GCC, optimize function calls such as the ones in your example. For the code fragment to work, it is not necessary to store the intermediate return addresses in the call chain. It's perfectly OK to return from func3() to main(), as the intermediate functions don't do anything besides calling another function.

It's not the same as code elimination (actually the intermediate functions could be completely optimized out), and a separate compiler parameter might control this kind of optimisation.

If you use GCC, try -fno-optimize-sibling-calls

Another handy GCC option is -mno-sched-prolog, which prevents instruction reordering in the function prologue. This is vital if you want to parse the code byte by byte, as is done here: http://www.kegel.com/stackcheck/checkstack-pl.txt

Bellbottoms answered 25/2, 2014 at 14:25 Comment(0)
C
1

This is hacky, but I've found it works well enough considering the amount of code/RAM space required:

Assuming you're using ARM THUMB mode, compile with the following options:

-mtpcs-frame -mtpcs-leaf-frame  -fno-omit-frame-pointer

The following function is used to retrieve the callstack. Refer to the comments for more info:

/*
 * This should be compiled with:
 *  -mtpcs-frame -mtpcs-leaf-frame  -fno-omit-frame-pointer
 *
 *  With these options, the Stack pointer is automatically pushed to the stack
 *  at the beginning of each function.
 *
 *  This function basically iterates through the current stack finding the following combination of values:
 *  - <Frame Address>
 *  - <Link Address>
 *
 *  This combination will occur for each function in the call stack
 */
static void backtrace(uint32_t *caller_list, const uint32_t *caller_list_end, const uint32_t *stack_pointer)
{
    uint32_t previous_frame_address = (uint32_t)stack_pointer;
    uint32_t stack_entry_counter = 0;

    // be sure to clear the caller_list buffer (memset takes a size in bytes)
    memset(caller_list, 0, (caller_list_end - caller_list) * sizeof(*caller_list));

    // loop until the buffer is full
    while(caller_list < caller_list_end)
    {
        // Attempt to obtain next stack pointer
        // The link address should come immediately after
        const uint32_t possible_frame_address = *stack_pointer;
        const uint32_t possible_link_address = *(stack_pointer+1);

        // Have we searched past the allowable size of a given stack?
        if(stack_entry_counter > PLATFORM_MAX_STACK_SIZE/4)
        {
            // yes, so just quit
            break;
        }
        // Next check that the frame address (i.e. stack pointer for the function)
        // and Link address are within an acceptable range
        else if((possible_frame_address > previous_frame_address) &&
                ((possible_frame_address < previous_frame_address + PLATFORM_MAX_STACK_SIZE)) &&
               ((possible_link_address  & 0x01) != 0) && // in THUMB mode the address will be odd
                (possible_link_address > PLATFORM_CODE_SPACE_START_ADDRESS &&
                 possible_link_address < PLATFORM_CODE_SPACE_END_ADDRESS))
        {
            // We found two acceptable values

            // Store the link address
            *caller_list++ = possible_link_address;

            // Update the book-keeping registers for the next search
            previous_frame_address = possible_frame_address;
            stack_pointer = (uint32_t*)(possible_frame_address + 4);
            stack_entry_counter = 0;
        }
        else
        {
            // Keep iterating through the stack until we find an acceptable combination
            ++stack_pointer;
            ++stack_entry_counter;
        }
    }

}

You'll need to update #defines for your platform.

Then call the following to populate a buffer with the current call stack:

uint32_t callers[8];
uint32_t sp_reg;
__ASM volatile ("mov %0, sp" : "=r" (sp_reg) );
backtrace(callers, &callers[8], (uint32_t*)sp_reg);

Again, this is rather hacky, but I've found it to work quite well. The buffer will be populated with link addresses of each function call in the call stack.
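Once the buffer is filled, it could be consumed along these lines. This is a sketch only: print_callers is a hypothetical helper, and it assumes that unused entries in the buffer are zero (which is what clearing the buffer before the search is meant to guarantee).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Print the captured link addresses and return how many were printed.
 * Assumes unused slots are zero, so the first zero entry ends the list. */
size_t print_callers(const uint32_t *callers, size_t max)
{
    size_t n = 0;

    while (n < max && callers[n] != 0) {
        /* Bit 0 of a Thumb link address is the state bit, not part of the
         * code address, so clear it before looking up the symbol. */
        printf("\t#%u: return address %08lx\n", (unsigned)n,
               (unsigned long)(callers[n] & ~UINT32_C(1)));
        n++;
    }
    return n;
}
```

The printed addresses can then be matched against the linker map file or fed to addr2line on the host.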

Chane answered 5/4, 2017 at 22:39 Comment(1)
Hackish, somewhat works. I could get 2 stack frames using this method. _Unwind_Backtrace() and libunwind gave me all 4 frames.Raconteur
D
0

Does your executable contain debugging information, from compiling with the -g option? I think this is required to get a full stack trace without a frame pointer.

You might need -gdwarf-2 to make sure it uses a format that includes unwind information.

Dickens answered 4/8, 2010 at 11:54 Comment(1)
Possible, although I'm pretty sure (like 99.9%) that the DWARF info doesn't actually make it into the binary image programmed into flash. How would I check?Cadency
