Is this assembly function call safe/complete?

I don't have experience in assembly, but this is what I've been working on. I would like input on whether I'm missing any fundamental aspects of passing parameters and calling a function via pointer in assembly.

For instance, I'm wondering if I'm supposed to restore ecx, edx, esi, edi. I read they are general purpose registers, but I couldn't find out whether they need to be restored. Is there any kind of cleanup I am supposed to do after a call?

This is the code I have now, and it does work:

#include "stdio.h"

void foo(int a, int b, int c, int d)
{
  printf("values = %d and %d and %d and %d\r\n", a, b, c, d);
}

int main()
{

  int a=3,b=6,c=9,d=12;
  __asm__(
          "mov %3, %%ecx;"
          "mov %2, %%edx;"
          "mov %1, %%esi;"
          "mov %0, %%edi;"
          "call %4;"
          :
          : "g"(a), "g"(b), "g"(c), "g"(d), "a"(foo)
          );

}
Couthie answered 5/6, 2016 at 8:42 Comment(9)
You could try: codereview.stackexchange.com - Matthaus
In short: this code is unsafe. You are changing registers in the inline asm without notifying the compiler. There are things you can do to make it better (use x86 machine constraints, add the memory clobber, clobber registers that the ABI allows foo to change (eax?), etc), but perhaps you would get a better answer if you said what you were trying to accomplish. - Spanos
I'm assuming this is 64-bit code? (At least it appears that way from the calling convention being used). Calling a function in 64-bit code is a lot more involved than you may realize. I recently wrote an answer that may be of some value to you (regarding 64-bit code, inline assembler, and calling functions). In that case I call printf but it pretty much applies to calling any function in 64-bit code from inline assembler: https://mcmap.net/q/16848/-calling-printf-in-extended-inline-asm - Lather
Possible duplicate of Calling printf in extended inline ASM - Huffman
Not strictly a duplicate, but I think it will be helpful for future readers to have the questions linked. - Huffman
Calling a function from inline asm is problematic beyond the reasons (red zone, etc) explained in the existing answers. If link-time optimization figures out that all of foo's callers are known, it could change how foo works so it doesn't even follow the standard ABI. Calling that function in a way the compiler can't see could lead to broken code after link-time optimization, if it builds at all. (e.g. maybe foo was inlined into all of its C callers, and no stand-alone definition was emitted. This happens even without LTO for static functions.) - Huffman
If you pass a function pointer to the inline assembler template (as an input operand), I believe that GCC is forced to keep a non-inlined version with whatever convention was specified in the function attributes (or the default convention). - Lather
@MichaelPetch: oh right, that makes sense. I was thinking of the case where you do a normal "call foo" with no indirection, like in that linked question about calling printf. - Huffman
related: stackoverflow.com/questions/3467180/… - Laureen

The original question was "Is this assembly function call safe/complete?" The answer to that is: no. While it may appear to work in this simple example (especially if optimizations are disabled), you are violating rules that will eventually lead to failures (ones that are really hard to track down).

I'd like to address the (obvious) followup question of how to make it safe, but without feedback from the OP on the actual intent, I can't really do that.

So, I'll do the best I can with what we have and try to describe the things that make it unsafe and some of the things you can do about it.

Let's start by simplifying that asm:

 __asm__(
          "mov %0, %%edi;"
          :
          : "g"(a)
          );

Even with this single statement, this code is already unsafe. Why? Because we are changing the value of a register (edi) without letting the compiler know.

How can the compiler not know, you ask? After all, it's right there in the asm! The answer comes from this line in the gcc docs:

GCC does not parse the assembler instructions themselves and does not know what they mean or even whether they are valid assembler input.

In that case, how do you let gcc know what's going on? The answer lies in using the constraints (the stuff after the colons) to describe the impact of the asm.

Perhaps the simplest way to fix this code would be like this:

  __asm__(
          "mov %0, %%edi;"
          :
          : "g"(a)
          : "edi"
          );

This adds edi to the clobber list (note that clobbers are written as quoted strings). In brief, this tells gcc that the value of edi is going to be changed by the code, and that gcc shouldn't assume any particular value will be in it when the asm exits.
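As a slightly fuller sketch of the clobber approach, the snippet below routes a value through edi by hand and reads it back, so the effect is observable from C. It assumes GCC or Clang on x86-64 (other targets take a plain C fallback), and the function name copy_via_edi is made up for illustration.

```c
/* Hypothetical helper: moves `a` through edi by hand and names edi as a clobber. */
static int copy_via_edi(int a)
{
    int out;
#if defined(__x86_64__)
    __asm__(
        "mov %1, %%edi\n\t"   /* we pick edi ourselves ...                 */
        "mov %%edi, %0"       /* ... and copy it into the output operand   */
        : "=r"(out)           /* compiler chooses the output register      */
        : "g"(a)              /* input may be a register, memory or a constant */
        : "edi"               /* tell the compiler that edi gets overwritten */
    );
#else
    out = a;                  /* fallback so the sketch still builds elsewhere */
#endif
    return out;
}
```

Because edi is listed as a clobber, the compiler will not place either operand in it and will not expect it to survive the asm.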

Now, while that's the easiest, it's not necessarily the best way. Consider this code:

  __asm__(
          ""
          :
          : "D"(a)
          );

This uses a machine constraint to tell gcc to put the value of the variable a into the edi register for you. Doing it this way, gcc will load the register for you at a 'convenient' time, perhaps by always keeping a in edi.
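To make the machine constraint visible, here is a minimal sketch in which the template actually reads the register. It assumes GCC or Clang on x86-64, and read_from_edi is a hypothetical name; on other targets a plain C fallback is used.

```c
/* Hypothetical helper: the "D" constraint makes the compiler load `a` into edi. */
static int read_from_edi(int a)
{
    int out;
#if defined(__x86_64__)
    __asm__(
        "mov %1, %0"      /* %1 is printed as %edi because of the "D" constraint */
        : "=r"(out)
        : "D"(a)          /* read-only input: the asm must leave edi unchanged */
    );
#else
    out = a;              /* fallback so the sketch still builds elsewhere */
#endif
    return out;
}
```

No mov into edi appears in the template and no clobber is needed, since the asm only reads the register.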

There is one (significant) caveat to this code: By putting the parameter after the 2nd colon, we are declaring it to be an input. Input parameters are required to be read-only (i.e. they must have the same value on exiting the asm).

In your case, the call statement means that we won't be able to guarantee that edi won't be changed, so this doesn't quite work. There are a few ways to deal with this. The easiest is to move the constraint up after the first colon, making it an output, and specify "+D" to indicate that the value is read+write. But then the contents of a are going to be pretty much undefined after the asm (foo, or the printf it calls, could leave anything in edi). If destroying a is unacceptable, there's always something like this:

int junk;
  __asm__ volatile (
          ""
          : "=D" (junk)
          : "0"(a)
          );

This tells gcc that on starting the asm, it should put the value of the variable a into the same place as output constraint #0 (i.e. edi). It also says that on exit, edi no longer holds a; whatever it contains is stored to the variable junk.

Edit: Since the 'junk' variable isn't actually going to be used, we need to add the volatile qualifier. Volatile was implicit when there weren't any output parameters.
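The two options can be compared side by side in a small sketch. It assumes GCC or Clang on x86-64 (with a plain C fallback elsewhere); the function names are hypothetical, and the template uses an add instruction as a stand-in for "something that changes edi". Unlike the snippet above, junk is read afterwards here, purely so the result can be checked.

```c
/* Tied operands: `a` goes in via edi, the changed register comes out as `junk`. */
static int bump_keep_input(int a)
{
    int junk;
#if defined(__x86_64__)
    __asm__ volatile(
        "add $1, %0"      /* modifies edi in place */
        : "=D"(junk)      /* output #0 lives in edi */
        : "0"(a)          /* input starts in the same place as output #0 */
        : "cc"            /* add changes the flags */
    );
#else
    junk = a + 1;         /* fallback so the sketch still builds elsewhere */
#endif
    return junk - a;      /* `a` itself is untouched, so this is 1 */
}

/* Read+write operand: shorter, but the C variable `a` is overwritten. */
static int bump_in_place(int a)
{
#if defined(__x86_64__)
    __asm__("add $1, %0" : "+D"(a) : : "cc");
#else
    a += 1;
#endif
    return a;
}
```

The tied form costs an extra variable but keeps the input usable afterwards; the "+D" form is the one to reach for when the input is no longer needed.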

One other point on that line: You end it with a semi-colon. This is legal and will work as expected. However, if you ever want to use the -S command line option to see exactly what code got generated (and if you want to get good with inline asm, you will), you will find that produces difficult-to-read code. I'd recommend using \n\t instead of a semi-colon.

All that and we're still on the first line...

Obviously the same would apply to the other three mov statements.

Which brings us to the call statement.

Both Michael and I have listed a number of reasons why doing a call in inline asm is difficult:

  • Handling all the registers that may be clobbered by the function call's ABI.
  • Handling red-zone.
  • Handling alignment.
  • Memory clobber.
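For experimentation only, the four points in the list can be sketched roughly as follows. This assumes GCC or Clang targeting the x86-64 System V ABI (Linux, macOS); anything else takes an ordinary C call. The names seen, record4 and call4 are hypothetical, and the clobber list is not exhaustive: x87/MMX state and the AVX-512 registers are assumed to hold nothing live.

```c
static int seen[4];   /* hypothetical recorder so the call is observable */

static void record4(int a, int b, int c, int d)
{
    seen[0] = a; seen[1] = b; seen[2] = c; seen[3] = d;
}

/* Hypothetical wrapper: calls fn(a, b, c, d) from inline asm. */
static void call4(void (*fn)(int, int, int, int), int a, int b, int c, int d)
{
#if defined(__x86_64__) && !defined(_WIN32)
    __asm__ volatile(
        "mov  %%rsp, %%rbx\n\t"   /* remember the original stack pointer      */
        "sub  $128, %%rsp\n\t"    /* step over the 128-byte red zone          */
        "and  $-16, %%rsp\n\t"    /* stack must be 16-byte aligned at a call  */
        "call *%[fn]\n\t"         /* indirect call through rax                */
        "mov  %%rbx, %%rsp"       /* rbx is callee-saved, so it survived      */
        /* Argument registers and rax are read+write: the callee may trash them. */
        : [fn] "+a"(fn), "+D"(a), "+S"(b), "+d"(c), "+c"(d)
        :
        : "rbx", "r8", "r9", "r10", "r11",
          "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
          "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15",
          "memory", "cc"
    );
#else
    fn(a, b, c, d);               /* fallback: let the compiler do the call */
#endif
}
```

A call such as call4(record4, 3, 6, 9, 12) should leave 3, 6, 9 and 12 in seen. The function pointer is deliberately kept in a register: a memory operand addressed relative to rsp would point at the wrong place once the stack pointer has been moved.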

If the goal here is 'learning,' then feel free to experiment. But I don't know that I would ever feel comfortable doing this in production code. Even when it looks like it works, I'd never feel confident there wasn't some weird case I'd missed. That's aside from my normal concerns about using inline asm at all.

I know, that's a lot of information. Probably more than you were looking for as an introduction to gcc's asm command, but you've picked a challenging place to start.

If you haven't done so already, spend time looking over all the docs in gcc's Assembly Language interface. There's a lot of good information there along with examples to try to explain how it all works.

Spanos answered 5/6, 2016 at 23:6 Comment(4)
re: not using inline asm if you can avoid it: I expanded on that more in a recent answer. Totally agree it's a terrible idea for practical use, and will defeat a lot of optimizations (like inlining and constant-propagation), as well as being really hard to trust that it's safe. Also, thanks for that link to the gcc wiki. - Huffman
@PeterCordes - After spending all that time learning about how gcc's inline asm worked so I could finally give those horrible docs the re-write they so desperately needed, I began to realize that while it's way cool, it's mostly a bad idea. That's why I eventually wrote that wiki page. For an encore, I'm currently trying to get gcc's basic asm (the kind without colons) deprecated when used within a function. That explains my other wiki entry. As bad as using extended asm is, basic is way worse. - Spanos
BTW, "+D" (tmp) is a lot easier than using separate input and output args. It does clobber the C variable, so the separate input and output arg style might be easier than using tmp vars in C if you need the inputs after the inline asm. (The only case I remember where this doesn't work is x87: I think "+t" isn't allowed.) - Huffman
Update: Calling printf in extended inline ASM shows the full set of clobbers, red-zone skipping, and stack alignment that are necessary to reliably(?) do this. Highly recommend against, as you say. - Huffman

I read they are general purpose registers, but I couldn't find if they need to be restored?

I am not an expert in the field, but from my reading of the x86-64 ABI (Figure 3.4) the registers %rdi, %rsi, %rdx, and %rcx are not preserved across function calls, so apparently they don't need to be restored.

As David Wohlferd commented, you should be careful: either way, the compiler will not be aware of the "custom" function call, so you may get in its way, particularly because it may not be aware of the register modifications.

Ivanivana answered 5/6, 2016 at 9:30 Comment(3)
While the ABI says they do not need to be restored, gcc has no idea that a function call is happening. It does not parse the asm at all. So it has no reason to think that the values of rdi, rsi, rdx or rcx have changed. - Spanos
Better. But there are other considerations as well, such as the red-zone for 64bit code. This means that push/pop (the traditional method of 'restoring' registers) is more complicated than usual. And even though rax isn't explicitly mentioned in this code, it is subject to change by printf or any of its children, so it must be 'clobbered' as well. In addition to r8, r10, etc. (Safely) calling functions from inline asm is hard, and is usually a bad idea. - Spanos
I concur with @DavidWohlferd: calling functions from inline assembler requires a fair amount of knowledge. I recently wrote an answer, not a trivial one, that involved 64-bit code/inline assembler/calling a function. On top of what David said, GCC itself requires the stack to be aligned to a 16-byte boundary at the point a CALL is made. So not only do you need to deal with the red zone and clobbers, you need to deal with stack alignment before the call. - Lather
