What is the meaning of question marks '?' in Linux kernel panic call traces?

The call trace contains entries like this:

 [<deadbeef>] FunctionName+0xAB/0xCD [module_name]
 [<f00fface>] ? AnotherFunctionName+0x12/0x40 [module_name]
 [<deaffeed>] ClearFunctionName+0x88/0x88 [module_name]

What is the meaning of the '?' mark before AnotherFunctionName?

Aleurone asked 28/10, 2012 at 21:53

'?' means that the information about this stack entry is probably not reliable.

The stack output mechanism (see the implementation of the dump_trace() function) was unable to prove that the address it found is a valid return address in the call stack.

'?' itself is output by printk_stack_address().
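
To make it concrete that the only visible difference is the prefix, here is a user-space imitation of the formatting step. It is a sketch, not kernel code: in the kernel, the symbol name, offset and size are resolved by printk's symbol-printing format extensions (such as %pS), while here they are passed in explicitly because user space has no kallsyms lookup. The function name format_trace_entry() and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical user-space imitation of how one call trace entry is
 * formatted. A reliable and an unreliable entry differ only in the
 * "? " prefix in front of the symbol. Returns what snprintf() returns.
 */
int format_trace_entry(char *buf, size_t len, unsigned long addr,
                       int reliable, const char *sym,
                       unsigned long offset, unsigned long size,
                       const char *module)
{
    assert(buf != NULL && sym != NULL && module != NULL);
    return snprintf(buf, len, " [<%08lx>] %s%s+0x%lx/0x%lx [%s]",
                    addr, reliable ? "" : "? ", sym, offset, size, module);
}
```

Calling it with addr 0xf00fface, reliable 0, "AnotherFunctionName", 0x12, 0x40 and "module_name" reproduces the second line of the trace in the question.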

The stack entry may or may not be valid. Sometimes one can simply skip it. It may also help to examine the disassembly of the module involved to see which function is called at ClearFunctionName+0x88 (or, on x86, immediately before that position, since the address on the stack is the return address that follows the call instruction).
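
The arithmetic behind "immediately before that position" can be sketched as follows. Assumptions: the reported address is a return address, and the call was a direct near call with a 32-bit relative displacement (opcode E8), which is 5 bytes long on x86. Indirect calls have other lengths, so the fixed 5-byte length and the helper names below are illustrative only.

```c
#include <assert.h>

/* Length of an x86 direct near call with a rel32 displacement (E8 xx xx xx xx). */
#define X86_CALL_REL32_LEN 5UL

/* Hypothetical helper: the symbol starts 'offset' bytes before the address. */
unsigned long function_start(unsigned long ret_addr, unsigned long offset)
{
    assert(offset <= ret_addr);
    return ret_addr - offset;
}

/*
 * Hypothetical helper: offset within the function at which the call
 * instruction begins, assuming it was a 5-byte direct call.
 */
unsigned long direct_call_offset(unsigned long offset)
{
    assert(offset >= X86_CALL_REL32_LEN);
    return offset - X86_CALL_REL32_LEN;
}
```

For ClearFunctionName+0x88, that would mean looking around offset 0x83 of ClearFunctionName in the output of objdump -d on the module, keeping in mind that an indirect call would start at a different offset.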

Concerning reliability

On x86, when dump_stack() is called, the function that actually examines the stack is print_context_stack(), defined in arch/x86/kernel/dumpstack.c. Take a look at its code; I'll try to explain it below.

I assume DWARF2 stack unwinding facilities are not available on your Linux system (most likely they are not, unless it is OpenSUSE or SLES). In this case, print_context_stack() seems to do the following.

It starts from an address (the 'stack' variable in the code) that is guaranteed to point to a stack location. It is actually the address of a local variable in dump_stack().

The function repeatedly increments that address (while (valid_stack_ptr ...) { ... stack++}) and checks whether the value it points to could also be an address in kernel code (if (__kernel_text_address(addr)) ...). This way it attempts to find the return addresses that were pushed onto the stack when the functions were called.
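
The scanning heuristic can be imitated in user space on a simulated stack. The array, the [text_start, text_end) range and the name is_text_address() are hypothetical stand-ins for the real stack and for __kernel_text_address(); the point is only that every word falling into the code range becomes a candidate return address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for __kernel_text_address(): a single code range. */
static int is_text_address(unsigned long addr,
                           unsigned long text_start, unsigned long text_end)
{
    return addr >= text_start && addr < text_end;
}

/*
 * Walk a simulated stack word by word (like 'stack++' in the kernel loop)
 * and record the index of every word that looks like a code address.
 * Returns the number of candidate indices stored in 'out'.
 */
size_t scan_stack(const unsigned long *stack, size_t nwords,
                  unsigned long text_start, unsigned long text_end,
                  size_t *out, size_t max_out)
{
    size_t found = 0;

    assert(stack != NULL || nwords == 0);
    for (size_t i = 0; i < nwords && found < max_out; i++) {
        if (is_text_address(stack[i], text_start, text_end))
            out[found++] = i;
    }
    return found;
}
```

Nothing in this loop can tell a genuine return address from a stale code address left in a local variable, which is why a further check is needed.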

Of course, not every unsigned long value that looks like a return address actually is one, so the function tries to verify it. If frame pointers are used in the kernel code (the %ebp/%rbp registers serve this purpose if CONFIG_FRAME_POINTER is set), they can be used to traverse the stack frames of the functions. The return address of a function lies just above the frame pointer (i.e. at %ebp/%rbp + sizeof(unsigned long)). print_context_stack() checks exactly that.
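
A pure frame-pointer walk, simulated on an array instead of real memory, could look roughly like this. Frame "pointers" are array indices here, and the layout (saved caller frame pointer at bp, return address one word above it) follows the description above; the NO_FRAME sentinel and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NO_FRAME ((size_t)-1)  /* hypothetical "end of frame chain" marker */

/*
 * Simulated frame-pointer walk. Frame "pointers" are indices into 'stack':
 * stack[bp] holds the caller's saved frame pointer and stack[bp + 1] the
 * return address, i.e. one word above the frame pointer.
 * Returns the number of return addresses stored in 'ret_addrs'.
 */
size_t walk_frames(const unsigned long *stack, size_t nwords, size_t bp,
                   unsigned long *ret_addrs, size_t max_out)
{
    size_t n = 0;

    assert(stack != NULL && ret_addrs != NULL);
    while (bp != NO_FRAME && bp + 1 < nwords && n < max_out) {
        size_t next = (size_t)stack[bp];      /* caller's saved frame pointer */

        ret_addrs[n++] = stack[bp + 1];       /* return address above it */
        /* The stack grows down, so the caller's frame must lie higher up;
         * anything else ends the walk (this also rules out cycles). */
        if (next == NO_FRAME || next <= bp)
            break;
        bp = next;
    }
    return n;
}
```

Values lying between frames, such as a leftover code address in a local variable, are never visited by such a walk; the kernel loop described here instead scans every word and uses the frame chain only to decide which candidates are reliable.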

If the value 'stack' points to is the return address of a known stack frame, the value is considered a reliable stack entry: ops->address will be called for it with reliable == 1, which eventually calls printk_stack_address(), and the value is output as a reliable call stack entry. Otherwise the address is considered unreliable. It is output anyway, but with '?' prepended.
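
Putting the scan and the frame-pointer check together, a simplified user-space imitation of the loop might look like this. It is not the kernel's code: frame pointers are array indices, the comparison i == bp + 1 stands in for the kernel's byte-address comparison with bp + sizeof(long), and instead of calling ops->address the results are collected into an array of the hypothetical struct trace_entry.

```c
#include <assert.h>
#include <stddef.h>

#define NO_FRAME ((size_t)-1)  /* hypothetical "no frame pointer" marker */

struct trace_entry {
    unsigned long addr;
    int reliable;      /* 1: printed plainly, 0: printed with "? " */
};

/*
 * Simplified imitation of the print_context_stack() loop on a simulated
 * stack. Every word in the code range is reported; it is reliable only if
 * it sits exactly one word above the current frame pointer, i.e. in that
 * frame's return-address slot. Returns the number of entries in 'out'.
 */
size_t classify_stack(const unsigned long *stack, size_t nwords, size_t bp,
                      unsigned long text_start, unsigned long text_end,
                      struct trace_entry *out, size_t max_out)
{
    size_t n = 0;

    assert(stack != NULL && out != NULL);
    for (size_t i = 0; i < nwords && n < max_out; i++) {
        unsigned long addr = stack[i];

        if (addr < text_start || addr >= text_end)
            continue;                     /* does not look like code */
        out[n].addr = addr;
        if (bp != NO_FRAME && i == bp + 1) {
            out[n].reliable = 1;          /* return slot of a known frame */
            bp = (size_t)stack[bp];       /* continue with the caller's frame */
        } else {
            out[n].reliable = 0;          /* stale value or unknown frame */
        }
        n++;
    }
    return n;
}
```

With bp set to NO_FRAME (no frame pointer information at all), every candidate comes back with reliable == 0, which matches the [NB] remark below.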

[NB] If frame pointer information is not available (as was the case by default in Debian 6, for example), all call stack entries will be marked as unreliable for this reason.

Systems with DWARF2 unwinding support (and with CONFIG_STACK_UNWIND set) are a whole other story.

Susurrant answered 29/10, 2012 at 7:18
Great answer - it misses one thing to make it complete (and I'm a bit baffled by the level of indirection in the arch code): what makes the entry unreliable? - Aleurone
I have edited my answer. Hopefully, my explanation is not too confusing. - Susurrant
Getting there :) Your answer actually confirms some of my suspicions about how it works. To give a bit of background, I'm trying to update a binary-blob + wrapper-like driver, so the kernel is actually my own build. The reason I got confused and wanted some explanation is that apparently some functions within the blob store function pointers in local variables, throwing the whole system off a bit. Please finish your 'whole other story', especially how it works when the main kernel uses DWARF2 but some part of a module does not. - Aleurone
Unfortunately, I am by no means an expert in DWARF2. The only implementation of DWARF2 unwinding I have seen so far is the one from SuSE Linux. As far as I know, the mainline kernel does not support it yet. Or does it? Do you use the patch from SuSE or from some other Linux distro to add DWARF2 support to your custom kernel? There may be different implementations out there. - Susurrant
In addition, the kernels I have seen had DWARF2 unwinding enabled either for all kernel-mode components or for none at all. Currently I cannot really say what happens if the kernel and the modules use different approaches to stack unwinding. - Susurrant
I agree: if the address of AnotherFunctionName+0x12 is in a local variable, it may show up in the stack trace. The same applies if it somehow got into a register that was then saved on the stack. In both cases, however, this would prevent neither a frame-pointer-based nor a DWARF2-based unwinder from doing its job. As this suspicious entry is surrounded by valid ones, perhaps it could simply be ignored? - Susurrant
It can certainly be ignored; I'm just trying to wrap my head around what is going on there. The binary-only code is full of lookup tables and what resemble virtual-function-like object dispatchers. - Aleurone
