Calling printf in extended inline ASM
Asked Answered
M

1

5

I'm trying to output the same string twice in extended inline ASM in GCC, on 64-bit Linux.

int main()
{
    const char* test = "test\n";

    asm(
        "movq %[test], %%rdi\n"    // Debugger shows rdi = *address of string*  
        "movq $0, %%rax\n"

        "push %%rbp\n"
        "push %%rbx\n"
        "call printf\n"         
        "pop %%rbx\n"
        "pop %%rbp\n"

        "movq %[test], %%rdi\n" // Debugger shows rdi = 0
        "movq $0, %%rax\n"

        "push %%rbp\n"
        "push %%rbx\n"
        "call printf\n"     
        "pop %%rbx\n"
        "pop %%rbp\n"
        : 
        :  [test] "g" (test)
        : "rax", "rbx","rcx", "rdx", "rdi", "rsi", "rsp"
        );

    return 0;
}

Now, the string is outputted only once. I have tried many things, but I guess I am missing some caveats about the calling convention. I'm not even sure if the clobber list is correct or if I need to save and restore RBP and RBX at all.

Why is the string not outputted twice?

Looking with a debugger shows me that somehow when the string is loaded into rdi for the second time it has the value 0 instead of the actual address of the string.

I cannot explain why, it seems like after the first call the stack is corrupted? Do I have to restore it in some way?

Muldrow answered 28/5, 2016 at 19:5 Comment(5)
@Michael Petch Thanks, I read about that and it looks like it could be the problem. I also noticed that if I declare the strings outside of main it works as intended. Do I even have to preserve rbp and rbx?Muldrow
If RBP is in the same pointer at the end of your inline assembler as it is when it started no reason to save it. RBX is a non-volatile register so it's value is preserved across the call to _printf per the 64-bit calling convention. Whether registers are considered volatile (not saved across a function call) or non-volatile (preserved across a function call) can be found in the System V ABI in figure 3.4.Tronna
@Michael Thank you, it definitely was that. Either moving the strings declaration outside of main or subtracting 128 and then adding it in the end fixed the problem. I will gladly accept an answer if you feel like writing one.Muldrow
@Michael I agree, I would never use inline ASM in such a way but unfortunately this is part of a university course and I have to stick to it.Muldrow
related: stackoverflow.com/questions/3467180/…Gesellschaft
T
13

Specific problem to your code: RDI is not maintained across a function call (see below). It is correct before the first call to printf but is clobbered by printf. You'll need to temporarily store it elsewhere first. A register that isn't clobbered will be convenient. You can then save a copy before printf, and copy it back to RDI after.


I do not recommend doing what you are suggesting (making function calls in inline assembler). It will be very difficult for the compiler to optimize things. It is very easy to get things wrong. David Wohlferd wrote a very good article on reasons not to use inline assembly unless absolutely necessary.

Among other things the 64-bit System V ABI mandates a 128-byte red zone. That means you can't push anything onto the stack without potential corruption. Remember: doing a CALL pushes a return address on the stack. Quick and dirty way to resolve this problem is to subtract 128 from RSP when your inline assembler starts and then add 128 back when finished.

The 128-byte area beyond the location pointed to by %rsp is considered to be reserved and shall not be modified by signal or interrupt handlers.8 Therefore, functions may use this area for temporary data that is not needed across function calls. In particular, leaf functions may use this area for their entire stack frame, rather than adjusting the stack pointer in the prologue and epilogue. This area is known as the red zone.

Another issue to be concerned about is the requirement for the stack to be 16-byte aligned (or possibly 32-byte aligned depending on the parameters) prior to any function call. This is required by the 64-bit ABI as well:

The end of the input argument area shall be aligned on a 16 (32, if __m256 is passed on stack) byte boundary. In other words, the value (%rsp + 8) is always a multiple of 16 (32) when control is transferred to the function entry point.

Note: This requirement for 16-byte alignment upon a CALL to a function is also required on 32-bit Linux for GCC >= 4.5:

In context of the C programming language, function arguments are pushed on the stack in the reverse order. In Linux, GCC sets the de facto standard for calling conventions. Since GCC version 4.5, the stack must be aligned to a 16-byte boundary when calling a function (previous versions only required a 4-byte alignment.)

Since we call printf in inline assembler we should ensure that we align the stack to a 16-byte boundary before making the call.

You also have to be aware that when calling a function some registers are preserved across a function call and some are not. Specifically those that may be clobbered by a function call are listed in Figure 3.4 of the 64-bit ABI (see previous link). Those registers are RAX, RCX, RDX, RD8-RD11, XMM0-XMM15, MMX0-MMX7, ST0-ST7 . These are all potentially destroyed so should be put in the clobber list if they don't appear in the input and output constraints.

The following code should satisfy most of the conditions to ensure that inline assembler that calls another function will not inadvertently clobber registers, preserves the redzone, and maintains 16-byte alignment before a call:

int main()
{
    const char* test = "test\n";
    long dummyreg; /* dummyreg used to allow GCC to pick available register */

    __asm__ __volatile__ (
        "add $-128, %%rsp\n\t"   /* Skip the current redzone */
        "mov %%rsp, %[temp]\n\t" /* Copy RSP to available register */
        "and $-16, %%rsp\n\t"    /* Align stack to 16-byte boundary */
        "mov %[test], %%rdi\n\t" /* RDI is address of string */
        "xor %%eax, %%eax\n\t"   /* Variadic function set AL. This case 0 */
        "call printf\n\t"
        "mov %[test], %%rdi\n\t" /* RDI is address of string again */
        "xor %%eax, %%eax\n\t"   /* Variadic function set AL. This case 0 */
        "call printf\n\t"
        "mov %[temp], %%rsp\n\t" /* Restore RSP */
        "sub $-128, %%rsp\n\t"   /* Add 128 to RSP to restore to orig */
        :  [temp]"=&r"(dummyreg) /* Allow GCC to pick available output register. Modified
                                    before all inputs consumed so use & for early clobber*/
        :  [test]"r"(test),      /* Choose available register as input operand */
           "m"(test)             /* Dummy constraint to make sure test array
                                    is fully realized in memory before inline
                                    assembly is executed */
        : "rax", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11",
          "xmm0","xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
          "xmm8","xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15",
          "mm0","mm1", "mm2", "mm3", "mm4", "mm5", "mm6", "mm6",
          "st", "st(1)", "st(2)", "st(3)", "st(4)", "st(5)", "st(6)", "st(7)"
        );

    return 0;
}

I used an input constraint to allow the template to choose an available register to be used to pass the str address through. This ensures that we have a register to store the str address between the calls to printf. I also get the assembler template to choose an available location for storing RSP temporarily by using a dummy register. The registers chosen will not include any one already chosen/listed as an input/output/clobber operand.

This looks very messy, but failure to do it correctly could lead to problems later as you program becomes more complex. This is why calling functions that conform to the System V 64-bit ABI within inline assembler is generally not the best way to do things.

Tronna answered 28/5, 2016 at 21:10 Comment(4)
The x86-64 SysV ABI itself documents the 16B stack alignment requirement. It may still only be a de-facto standard for 32bit (I didn't check), but it's in the spec for 64bit. At the start of an asm statement, the stack might be unaligned (e.g. in a function that didn't modify %rsp at all, the stack will be 8B-aligned but not 16B-aligned.) Any time gcc reserves more stack space with add, it keeps it 16B-aligned. But often it just pushes some call-preserved regs and then spills to the red-zone, esp in leaf functions. So yeah, I don't see a reliable way to avoid and $-16, %rspTalanta
@PeterCordes, that is a bad assumption to make. The standard only says it has to be 16-byte aligned at point of a function call, but the standard doesn't say how that is arrived at. It isn't required that RSP have a value subtracted from it that keeps alignment. Nothing prevents a compiler from generating a push statement that happens to also realign the stack. My method makes no assumption about the code that was generated in that regard.Tronna
The point of comment was the first line: The 64bit ABI does standardize it, so the quote about it being a de-facto standard with gcc4.5 is misplaced. I went off-topic from there. Also, I was trying to think through whether there was any reliable way to avoid the and $-16, or even semi-reliable ways, like putting your asm statement in a long function that probably reserved some extra space. But depending on what it is, maybe the optimizer places it ahead of the instruction that reserves stack space (if it can be hoisted out of a loop).Talanta
If compiling with AVX-512 enabled, you also need to declare clobbers on xmm16..31, and k0..7. Any future ISA extensions that introduce new user-space state will potentially also need more clobbers, if it's call-clobbered.Talanta

© 2022 - 2024 — McMap. All rights reserved.