Using base pointer register in C++ inline asm

Asked 29/12, 2015 at 22:20 Answered 30/12, 2015 at 4:6

Solved c++ gcc x86 inline-assembly red-zone

I want to be able to use the base pointer register (%rbp) within inline asm. A toy example of this is like so:

void Foo(int &x)
{
    asm volatile ("pushq %%rbp;"         // 'prologue'
                  "movq %%rsp, %%rbp;"   // 'prologue'
                  "subq $12, %%rsp;"     // make room

                  "movl $5, -12(%%rbp);" // some asm instruction

                  "movq %%rbp, %%rsp;"  // 'epilogue'
                  "popq %%rbp;"         // 'epilogue'
                  : : : );
    x = 5;
}

int main() 
{
    int x;
    Foo(x);
    return 0;
}

I hoped that, since I am using the usual prologue/epilogue function-calling method of pushing and popping the old %rbp, this would be ok. However, it seg faults when I try to access x after the inline asm.

The GCC-generated assembly code (slightly stripped-down) is:

_Foo:
    pushq   %rbp
    movq    %rsp, %rbp
    movq    %rdi, -8(%rbp)

    # INLINEASM
    pushq %rbp;          // prologue
    movq %rsp, %rbp;     // prologue
    subq $12, %rsp;      // make room
    movl $5, -12(%rbp);  // some asm instruction
    movq %rbp, %rsp;     // epilogue
    popq %rbp;           // epilogue
    # /INLINEASM

    movq    -8(%rbp), %rax
    movl    $5, (%rax)      // x=5;
    popq    %rbp
    ret

main:
    pushq   %rbp
    movq    %rsp, %rbp
    subq    $16, %rsp
    leaq    -4(%rbp), %rax
    movq    %rax, %rdi
    call    _Foo
    movl    $0, %eax
    leave
    ret

Can anyone tell me why this seg faults? It seems that I somehow corrupt %rbp but I don't see how. Thanks in advance.

I'm running GCC 4.8.4 on 64-bit Ubuntu 14.04.

Formula answered 29/12, 2015 at 22:20 Comment(4)

Do not add tags of unrelated languages. – Execrative 29/12, 2015 at 22:22

For the assembler code: use the assembler arguments to specify C-side variables; do not rely on a specific register layout in the assembler code. And always specify clobbers. – Execrative 29/12, 2015 at 22:24

movq %rdi, -8(%rbp) placed RDI in the red zone. You then do pushq %rbp, which decrements RSP by 8 and stores the value of RBP there. Unfortunately, since RSP=RBP, you just overwrote the value that GCC stored there (which is supposed to be RDI). After your inline assembler finished, GCC's code ran movq -8(%rbp), %rax. You trashed the data at memory location -8(%rbp), so it now contains a bogus value, which is then dereferenced by movl $5, (%rax). This instruction likely segfaults because RAX doesn't hold a valid pointer anymore. – Meditate 30/12, 2015 at 4:26

If you want to use C/C++ variables inside inline assembler you really need to start using input (and output if needed) constraints to allow data to be passed in (and/or out). – Meditate 30/12, 2015 at 4:32

See the bottom of this answer for a collection of links to other inline-asm Q&As.

Your code is broken because you step on the red-zone below RSP (with push) where GCC was keeping a value.

What are you hoping to learn to accomplish with inline asm? If you want to learn inline asm, learn to use it to make efficient code, rather than horrible stuff like this. If you want to write function prologues and push/pop to save/restore registers, you should write whole functions in asm. (Then you can easily use nasm or yasm, rather than the less-preferred-by-most AT&T syntax with GNU assembler directives¹.)

GNU inline asm is hard to use, but allows you to mix custom asm fragments into C and C++ while letting the compiler handle register allocation and any saving/restoring if necessary. Sometimes the compiler will be able to avoid the save and restore by giving you a register that's allowed to be clobbered. Without volatile, it can even hoist asm statements out of loops when the input would be the same. (i.e. unless you use volatile, the outputs are assumed to be a "pure" function of the inputs.)
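As a minimal sketch of that division of labour (times_ten is a made-up name, and the non-x86 branch is only there so the snippet builds elsewhere): the asm template names no registers at all, so the compiler is free to pick them and has nothing to save or restore.

```cpp
// Hypothetical helper, not from the question: multiply by 10 with lea + add.
int times_ten(int x)
{
#if defined(__x86_64__) || defined(__i386__)
    int result;
    asm ("leal (%[in],%[in],4), %[out]\n\t"  // out = in * 5
         "addl %[out], %[out]"               // out = out * 2
         : [out] "=r" (result)               // compiler picks the output register
         : [in] "r" (x)                      // ... and the input register
         : "cc");                            // add writes flags (implicit on x86, harmless to state)
    return result;
#else
    return x * 10;                           // portable fallback for other targets
#endif
}
```

There is no volatile here, so the compiler may treat the statement as a pure function of x and drop or hoist it when that is legal.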

If you're just trying to learn asm in the first place, GNU inline asm is a terrible choice. You have to fully understand almost everything that's going on with the asm, and understand what the compiler needs to know, to write correct input/output constraints and get everything right. Mistakes will lead to clobbering things and hard-to-debug breakage. The function-call ABI is a much simpler boundary between your code and the compiler's code, and one that's easier to keep track of.

Why this breaks

You compiled with -O0, so gcc's code spills the function parameter from %rdi to a location on the stack. (This could happen in a non-trivial function even with -O3).

Since the target ABI is the x86-64 SysV ABI, it uses the "Red Zone" (128 bytes below %rsp that even asynchronous signal handlers aren't allowed to clobber), instead of wasting an instruction decrementing the stack pointer to reserve space.

It stores the 8B pointer function arg at -8(%rbp), which is just below %rsp because gcc's prologue has already set %rbp = %rsp. Then your inline asm pushes %rbp, which decrements %rsp by 8 and then writes 8 bytes there, overwriting the saved &x (the pointer) with the value of %rbp.

When your inline asm is done,

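gcc reloads -8(%rbp) (which has been overwritten with %rbp) and uses it as the address for a 4B store. That address is where Foo's prologue saved main's %rbp, so the store sets the low 32 bits of the saved value to 5.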
Foo returns to main with %rbp = (upper32)|5 (orig value with the low 32 set to 5).
main runs leave: %rsp = (upper32)|5
main runs ret with %rsp = (upper32)|5, reading the return address from virtual address (void*)(upper32|5), which from your comment is 0x7fff0000000d.

I didn't check with a debugger; one of those steps might be slightly off, but the problem is definitely that you clobber the red zone, leading to gcc's code trashing the stack.

Even adding a "memory" clobber doesn't get gcc to avoid using the red zone, so it looks like allocating your own stack memory from inline asm is just a bad idea. (A memory clobber means you might have written some memory you're allowed to write to, e.g. a global variable or something pointed-to by a global, not that you might have overwritten something you're not supposed to.)

If you want to use scratch space from inline asm, you should probably declare an array as a local variable and use it as an output-only operand (which you never read from).
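A rough sketch of that idea, using a single int instead of an array to keep it short (double_via_scratch and the operand names are invented for illustration): the "=m" operand hands the asm a stack slot the compiler chose, so nothing the compiler cares about gets overwritten.

```cpp
// Hypothetical example: doubles v, bouncing it through compiler-provided scratch memory.
int double_via_scratch(int v)
{
#if defined(__x86_64__) || defined(__i386__)
    int scratch;  // never read by the C++ code; the compiler decides where it lives
    int out;
    asm ("movl %[in], %[tmp]\n\t"   // spill the input to the scratch slot
         "movl %[tmp], %[out]\n\t"  // reload it
         "addl %[tmp], %[out]"      // out = 2 * v
         : [tmp] "=m" (scratch),
           [out] "=&r" (out)        // early-clobber: written before the asm is finished
         : [in] "r" (v)
         : "cc");
    return out;
#else
    return v * 2;                   // portable fallback for other targets
#endif
}
```

The early-clobber on out is not strictly required for this instruction order, but it is the conservative choice whenever an output is written before the last instruction.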

AFAIK, there's no syntax for declaring that you modify the red-zone, so your only options are:

use an "=m" output operand (possibly an array) for scratch space; the compiler will probably fill in that operand with an addressing mode relative to RBP or RSP. You can index into it with constants like 4 + %[tmp] or whatever. You might get an assembler warning from 4 + (%rsp) but not an error.
skip over the red-zone with add $-128, %rsp / sub $-128, %rsp around your code. (Necessary if you want to use an unknown amount of extra stack space, e.g. push in a loop, or making a function call. Yet another reason to deref a function pointer in pure C, not inline asm.)
compile with -mno-red-zone (I don't think you can enable that on a per-function basis, only per-file)
Don't use scratch space in the first place. Tell the compiler what registers you clobber and let it save them.
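The second option can be sketched roughly like this (push_pop_roundtrip is a hypothetical name; the example assumes x86-64 and falls back to plain C++ elsewhere). It only uses register operands, because a memory operand addressed relative to %rsp would be off by 128 bytes while the stack pointer is moved. add $-128 is used rather than sub $128 because -128 fits in a sign-extended 8-bit immediate and +128 does not.

```cpp
#include <cstdint>

// Hypothetical example: push/pop inside inline asm without touching the red zone.
std::uint64_t push_pop_roundtrip(std::uint64_t v)
{
#if defined(__x86_64__)
    std::uint64_t out;
    asm ("add   $-128, %%rsp\n\t"  // step over the 128-byte red zone
         "pushq %[in]\n\t"         // now safe: writes below the red zone
         "popq  %[out]\n\t"
         "sub   $-128, %%rsp"      // put %rsp back where the compiler expects it
         : [out] "=r" (out)
         : [in] "r" (v)
         : "cc");                  // add/sub modify flags
    return out;
#else
    return v;                      // portable fallback for other targets
#endif
}
```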

Here's what you should have done:

void Bar(int &x)
{
    int tmp;
    long tmplong;
    asm ("lea  -16 + %[mem1], %%rbp\n\t"
         "imul $10, %%rbp, %q[reg1]\n\t"  // q modifier: 64bit name.
         "add  %k[reg1], %k[reg1]\n\t"    // k modifier: 32bit name
         "movl $5, %[mem1]\n\t" // some asm instruction writing to mem
           : [mem1] "=m" (tmp), [reg1] "=r" (tmplong)  // tmp vars -> tmp regs / mem for use inside asm
           :
           : "%rbp" // tell compiler it needs to save/restore %rbp.
  // gcc refuses to let you clobber %rbp with -fno-omit-frame-pointer (the default at -O0)
  // clang lets you, but memory operands still use an offset from %rbp, which will crash!
  // gcc memory operands still reference %rsp, so don't modify it.  Declaring a clobber on %rsp does nothing
         );
    x = 5;
}

Note the push/pop of %rbp in the code outside the #APP / #NO_APP section, emitted by gcc. Also note that the scratch memory it gives you is in the red zone. If you compile with -O0, you'll see that it's at a different position from where it spills &x.

To get more scratch regs, it's better to just declare more output operands that are never used by the surrounding non-asm code. That leaves register allocation to the compiler, so it can be different when inlined into different places. Choosing ahead of time and declaring a clobber only makes sense if you need to use a specific register (e.g. shift count in %cl). Of course, an input constraint like "c" (count) gets gcc to put the count in rcx/ecx/cx/cl, so you don't emit a potentially redundant mov %[count], %%ecx.
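For the shift-count case, a minimal sketch could look like the following (shl_asm is an invented name, and a real program would just write value << count). The "c" constraint makes the compiler place count in %cl, so the template contains no mov.

```cpp
#include <cstdint>

// Hypothetical example: variable-count left shift with the count constrained to %cl.
std::uint32_t shl_asm(std::uint32_t value, std::uint8_t count)
{
#if defined(__x86_64__) || defined(__i386__)
    asm ("shll %%cl, %[val]"
         : [val] "+r" (value)  // read-write operand in any register
         : "c" (count)         // the compiler loads count into rcx/ecx/cl for us
         : "cc");
    return value;
#else
    return value << (count & 31);  // fallback; x86 also masks the count to 5 bits
#endif
}
```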

If this looks too complicated, don't use inline asm. Either lead the compiler to the asm you want with C that's like the optimal asm, or write a whole function in asm.

When using inline asm, keep it as small as possible: ideally just the one or two instructions that gcc isn't emitting on its own, with input/output constraints to tell it how to get data into / out of the asm statement. This is what it's designed for.

Rule of thumb: if your GNU C inline asm starts or ends with a mov, you're usually doing it wrong and should have used a constraint instead.
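To illustrate the rule, here is a sketch of a 64x64 => 128-bit multiply (Wide and mul_wide are made-up names; the fallback assumes a compiler that provides unsigned __int128). The tempting version starts with a mov into %rax and ends with movs out of %rax and %rdx; the "a" and "d" constraints say the same thing without any mov in the template.

```cpp
#include <cstdint>

struct Wide { std::uint64_t lo, hi; };  // hypothetical result type

// Hypothetical example: widening multiply wrapped as a single instruction.
Wide mul_wide(std::uint64_t a, std::uint64_t b)
{
    Wide r;
#if defined(__x86_64__)
    asm ("mulq %[b]"                     // rdx:rax = rax * b
         : "=a" (r.lo), "=d" (r.hi)      // results arrive in rax and rdx
         : "a" (a), [b] "rm" (b)         // a starts in rax; b is any register or memory
         : "cc");
#else
    // Fallback for other targets; assumes unsigned __int128 is available.
    unsigned __int128 p = static_cast<unsigned __int128>(a) * b;
    r.lo = static_cast<std::uint64_t>(p);
    r.hi = static_cast<std::uint64_t>(p >> 64);
#endif
    return r;
}
```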

Footnotes:

¹ You can use GAS's Intel syntax in inline asm by building with -masm=intel (in which case your code will only work with that option), or using dialect alternatives so it works with the compiler in Intel or AT&T asm output syntax. But that doesn't change the directives, and GAS's Intel syntax is not well documented. (It's like MASM, not NASM, though.) I don't really recommend it unless you really hate AT&T syntax.
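A small sketch of dialect alternatives, assuming GCC's {att | intel} template syntax (add_either_dialect is a hypothetical name): the compiler keeps whichever half matches the output syntax it was asked for, so the same source should build with or without -masm=intel.

```cpp
// Hypothetical example: one asm statement that is valid in both output dialects.
int add_either_dialect(int a, int b)
{
#if defined(__x86_64__) || defined(__i386__)
    // AT&T half: source first, destination last. Intel half: destination first.
    asm ("add {%[src], %[dst] | %[dst], %[src]}"
         : [dst] "+r" (a)
         : [src] "r" (b)
         : "cc");
    return a;
#else
    return a + b;  // portable fallback for other targets
#endif
}
```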

Inline asm links:

x86 wiki. (The tag wiki also links to this question, for this collection of links)
The inline-assembly tag wiki
The manual. Read this. Note that inline asm was designed to wrap single instructions that the compiler doesn't normally emit. That's why it's worded to say things like "the instruction", not "the block of code".
A tutorial
Looping over arrays with inline assembly Using r constraints for pointers/indices and using your choice of addressing mode, vs. using m constraints to let gcc choose between incrementing pointers vs. indexing arrays.
How can I indicate that the memory *pointed* to by an inline ASM argument may be used? (pointer inputs in registers do not imply that the pointed-to memory is read and/or written, so it might not be in sync if you don't tell the compiler).
In GNU C inline asm, what're the modifiers for xmm/ymm/zmm for a single operand?. Using %q0 to get %rax vs. %w0 to get %ax. Using %g[scalar] to get %zmm0 instead of %xmm0.
Efficient 128-bit addition using carry flag Stephen Canon's answer explains a case where an early-clobber declaration is needed on a read+write operand. Also note that x86/x86-64 inline asm doesn't need to declare a "cc" clobber (the condition codes, aka flags); it's implicit. (gcc6 introduces syntax for using flag conditions as input/output operands. Before that you have to setcc a register that gcc will emit code to test, which is obviously worse.)
Questions about the performance of different implementations of strlen: my answer on a question with some badly-used inline asm, with an answer similar to this one.
llvm reports: unsupported inline asm: input with type 'void *' matching output with type 'int': Using offsetable memory operands (in x86, all effective addresses are offsettable: you can always add a displacement).
When not to use inline asm, with an example of 32b/32b => 32b division and remainder that the compiler can already do with a single div. (The code in the question is an example of how not to use inline asm: many instructions for setup and save/restore that should be left to the compiler by writing proper in/out constraints.)
MSVC inline asm vs. GNU C inline asm for wrapping a single instruction, with a correct example of inline asm for 64b/32b=>32bit division. MSVC's design and syntax require a round trip through memory for inputs and outputs, making it terrible for short functions. It's also "never very reliable" according to Ross Ridge's comment on that answer.
Using x87 floating point, and commutative operands. Not a great example, because I didn't find a way to get gcc to emit ideal code.

Some of those re-iterate some of the same stuff I explained here. I didn't re-read them to try to avoid redundancy, sorry.

Parrott answered 30/12, 2015 at 4:6 Comment(10)

Thanks so much for this detailed and informative answer. What do you mean by "write the whole function in asm"—how would I then integrate it with C/C++ code? Or do you mean write the whole program in asm? – Formula 30/12, 2015 at 14:12

Awesome answer Peter! But in your links you have the same link twice. – Zion 30/12, 2015 at 18:43

@jaw: write the prototype in C, and write the function in a separate .S (GNU syntax) or .asm (NASM/YASM syntax). gcc -Wall -O3 main.c myfunc.S -o myprog. See stackoverflow.com/questions/13901261/…. If you use NASM/YASM, run yasm -felf64 myfunc.asm to make a .o that you can link with C. Make sure your function follows the ABI (which regs to preserve, and where to find its args), or else it will break when gcc's code calls it. IIRC, Agner Fog's Optimizing Assembly guide spends some time on how to do this. (links in the x86 tag wiki) – Parrott 30/12, 2015 at 19:14

@Zboson: I think I remembered the other links I meant to have, instead of the duplicate: the early-clobber discussion. And also the operand-size modifiers question. – Parrott 30/12, 2015 at 19:51

Great links. In regards to Stephen Canon's answer it originally did not have a clobber modifier. I followed his answer but for 256-bit add and I could not get the correct answer until I figured out to use a clobber modifier. I never had a problem with his 128-bit add without the clobber modifier. The problem happened after the third add. But that's probably just coincidental. That's why I left a comment in his answer. – Zion 30/12, 2015 at 20:1

"rather than the less-preferred-by-most AT&T syntax with GNU assembler directives.)" It may be worth noting now that you have -masm="intel" for inline assembly with Intel syntax. – Colchicine 5/3, 2018 at 0:28

@EvanCarroll: Yeah, but then your code requires that build option. That's ok for some cases, though. You can write code that works either way, using syntax alternatives like "add {%0, %1 | %1, %0}". See gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html ("Multiple assembler dialects" section) for how to write code you could put in a library .h that doesn't care whether it's built with -masm=intel or not. Anyway, IMO AT&T syntax is not that bad, and I'd just use it if writing inline asm. It's the standard way to do things. – Parrott 5/3, 2018 at 1:9

@PeterCordes The tone of the answer would lead one to believe that you cannot use the Intel syntax. "(Then you can easily use nasm or yasm, rather than the less-preferred-by-most AT&T syntax with GNU assembler directives.)" I believe that may have been true at one time. It seems like there are multiple ways to use the Intel syntax now. – Colchicine 5/3, 2018 at 1:13

@EvanCarroll: thanks, I had a look over the answer and agree it should at least mention the option. I didn't want to clutter the opening paragraphs before I get to the point I wanted to make, so I just put it in a footnote. – Parrott 5/3, 2018 at 1:27

Related: How do I tell gcc that my inline assembly clobbers part of the stack? – Parrott 16/3, 2022 at 15:5

In x86-64, the stack pointer needs to be aligned to 8 bytes.

This:

subq $12, %rsp;      // make room

should be:

subq $16, %rsp;      // make room

Kalk answered 29/12, 2015 at 22:30 Comment(8)

Sorry, my mistake: it doesn't actually seem to have solved the problem. It's still seg faulting. Any other ideas? – Formula 29/12, 2015 at 23:9

In fact, even without the "// some asm instruction" statement, it still fails. Why does pushing and popping mess up $rbp? – Formula 29/12, 2015 at 23:15

Exactly where is it segfaulting? What address is it accessing and what instruction is it at? – Kalk 29/12, 2015 at 23:15

The output from gdb looks like this (it seg faults a couple of lines after the asm): Line 11: Foo (x=@0x7fffffffe034: 32767): " : : : );" , Line 12: Foo (x=@0x7fffffffe020: -8128): "x = 5", Line 13: "}": Cannot access memory at address 0x7fff0000000d. Program received signal SIGSEGV, Segmentation fault." – Formula 29/12, 2015 at 23:27

(Sorry: can't put linebreaks in the above comment) – Formula 29/12, 2015 at 23:31

You probably want to disassemble that to see what instruction that actually is - but looks to me like you have messed up the stack pointer completely. – Kalk 29/12, 2015 at 23:34

Do edit the question with disass foo and info reg after the crash. – Kalk 29/12, 2015 at 23:35

That can't be right, because alignment is just for performance reasons when you aren't using SSE alignment-required loads/stores. You're right that he does end up breaking the stack, though. It's easier to see what's wrong now that the OP posted a comment with some minimal crash output from gdb. – Parrott 30/12, 2015 at 4:14
