Can instruction reordering happen across a function call?
Suppose I have pseudo C code like below:

int x = 0;
int y = 0;

int __attribute__ ((noinline)) func1(void)
{
    int prev = x;   /* (1) */

    x |= FLAG;      /* (2) */

    return prev;    /* (3) */
}

int main(void)
{
    int tmp;

    ...
    y = 5;          /* (4) */
    compiler_mem_barrier();
    func1();
    compiler_mem_barrier();
    tmp = y;        /* (5) */
    ...
}

Suppose this is a single-threaded process, so we don't need to worry about locks, and that the code is running on an x86 system. Let's also suppose the compiler doesn't do any reordering.

I understand that the only reordering x86 allows is a read moving ahead of an older write (reads may be reordered with older writes to different locations, but not with older writes to the same location). But it's not clear to me whether call/ret instructions are considered to be WRITE/READ instructions. So here are my questions:

  1. On x86 systems, is "call" treated as a WRITE instruction? I assume so, since call pushes the return address onto the stack. But I didn't find any official documentation saying so, so please help confirm.

  2. For the same reason, is "ret" treated as a READ instruction (since it pops the address from the stack)?

  3. Can the "ret" instruction be reordered within the function? For example, can (3) be executed before (2) in the asm code below? That doesn't make sense to me, but "ret" is not a serializing instruction, and I didn't find anywhere in the Intel manual that says "ret" cannot be reordered.

  4. In the code above, can (1) be executed before (4)? Presumably the read (1) can be reordered ahead of the write (4). The "call" instruction has a "jmp" part, but with speculative execution I feel it can still happen. I hope someone more familiar with this issue can confirm.

  5. In the code above, can (5) be executed before (2)? If "ret" is considered to be a READ instruction, then I assume it cannot happen. But again, I hope someone can confirm this.

In case the assembly code for func1() is needed, it should be something like:

mov    %gs:0x24,%eax          (1) 
orl    $0x8,%gs:0x24          (2) 
retq                          (3)
Macrocosm answered 8/7, 2016 at 7:25 Comment(7)
Ask yourself: can the function be inlined by the compiler? - Manifestation
Let's assume that the function is not inlined. If the function is inlined, I know what can happen. - Macrocosm
Functions are compiler reordering barriers, of course. But for the processor, afaik, neither call nor ret forces serialization. See the Intel 64 and IA-32 Architectures manual, section 8.3, Serializing Instructions. - Manifestation
So long as the reordering does not change the observable behavior of the application (i.e., the as-if rule), the compiler is free to reorder and rearrange as it sees fit to optimize the code. Note that reordering done by the compiler is very different from the out-of-order execution done by the processor itself. What is your actual question? - Mediocrity
See also: #37787047, #26190864, and others, all findable with the search function. - Mediocrity
Thanks for providing the link to the compiler reordering issue. My question is more specific to processor out-of-order execution. Let's assume the compiler didn't do any reordering here. - Macrocosm
I understand that "call" and "ret" do not force serialization. But is "call" a WRITE instruction (it pushes the address to the stack) and is "ret" a READ instruction (it pops the address from the stack)? - Macrocosm

Out-of-order execution can reorder anything, but it preserves the illusion that your code executed in program order. The cardinal rule of OoOE is that you don't break single-threaded programs. The hardware tracks dependencies so that each instruction can execute as soon as its inputs and an execution unit are ready, without ever letting the thread itself observe anything other than program order.
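
Applied to the question's code, that rule can be sketched as a small compilable C fragment. This is only an illustration under stated assumptions: `compiler_mem_barrier()` is defined here as a GCC/Clang empty asm statement with a "memory" clobber, `FLAG` is taken to be 0x8 from the asm listing, and `run_sequence()` is a hypothetical stand-in for the question's `main()`. The assertions state what the single thread is guaranteed to observe, however the core schedules (1) through (5) internally.

```c
#include <assert.h>

/* Assumption: GCC/Clang inline asm. This emits no instruction; it only
   stops the compiler from moving memory accesses across this point. */
#define compiler_mem_barrier() __asm__ __volatile__("" ::: "memory")

#define FLAG 0x8 /* assumption: taken from the $0x8 in the asm listing */

static int x = 0;
static int y = 0;

static int __attribute__((noinline)) func1(void)
{
    int prev = x;   /* (1) load x */
    x |= FLAG;      /* (2) read-modify-write of x */
    return prev;    /* (3) ret */
}

/* Hypothetical stand-in for the question's main(); returns tmp. */
int run_sequence(void)
{
    int before = x;
    int prev, tmp;

    y = 5;                      /* (4) */
    compiler_mem_barrier();
    prev = func1();
    compiler_mem_barrier();
    tmp = y;                    /* (5) */

    /* The thread that ran the code always sees program-order results. */
    assert(prev == before);
    assert(x == (before | FLAG));
    assert(tmp == 5);
    return tmp;
}
```

Calling `run_sequence()` any number of times from one thread should never trip an assertion; only an observer on another core could tell when each store became visible.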


You appear to be confusing OoOE on a single core with the order in which loads and stores become globally visible to other cores. (The store buffer decouples the two.)

If you have one thread observing the stack memory of another thread running on another core, then yes, the store generated by call (pushing a return address) will be ordered with respect to that thread's other stores.

However, out-of-order execution in the thread running this code can actually execute call and ret instructions while a store is delayed on a cache miss, or while a long dependency chain is executing. Multiple cache misses can be in flight at once. The memory-order buffer just has to make sure that later stores don't actually become globally visible until after earlier stores, to preserve x86's memory ordering semantics.
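
The store-ordering guarantee can be illustrated with a message-passing sketch in C11. The names `payload`, `ready`, `publish` and `try_consume` are hypothetical, and in real use the two functions would run on different threads. On x86, compilers typically implement the release store and the acquire load as plain `mov` instructions and rely on the hardware to make stores visible in program order; the C11 orderings are still needed so the compiler itself does not reorder the accesses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;        /* plain data, written before the flag */
static atomic_int ready;   /* zero-initialised: nothing published yet */

/* Producer side: two stores in program order. The release store keeps
   the payload store from being moved after the flag store. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Consumer side: returns 1 and fills *out once the flag is visible,
   otherwise returns 0 and leaves *out untouched. */
int try_consume(int *out)
{
    assert(out != NULL);
    if (atomic_load_explicit(&ready, memory_order_acquire)) {
        *out = payload;   /* guaranteed to see the published value */
        return 1;
    }
    return 0;
}
```

If a consumer thread sees `ready == 1`, it must also see the payload, because the earlier store cannot become globally visible after the later one.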


If you have a specific question about hardware reordering, you should probably post asm code, not C code, because C and C++ compilers can reorder at compile time based on the language's memory model, which doesn't change when compiling for a strongly ordered target like x86.
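
The difference between compile-time and run-time ordering can be sketched with the two C11 fence functions. `atomic_signal_fence` constrains only the compiler and emits no instruction, while `atomic_thread_fence(memory_order_seq_cst)` also makes the compiler emit a hardware barrier (on x86 typically `mfence` or a locked instruction). The variable and function names are hypothetical, and the single-threaded check at the end only shows that both variants behave identically within one thread; it does not demonstrate cross-core reordering.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int a, b;

/* Store to a, then load from b: the StoreLoad case, which is the one
   reordering x86 hardware permits. The signal fence stops only the
   compiler, so the load may still complete before the store is
   globally visible. */
int store_then_load_compiler_barrier(void)
{
    atomic_store_explicit(&a, 1, memory_order_relaxed);
    atomic_signal_fence(memory_order_seq_cst);
    return atomic_load_explicit(&b, memory_order_relaxed);
}

/* Same accesses, but the thread fence also orders them in hardware. */
int store_then_load_full_fence(void)
{
    atomic_store_explicit(&a, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&b, memory_order_relaxed);
}

/* Within one thread the two variants are indistinguishable. */
void check_single_threaded(void)
{
    atomic_store_explicit(&b, 7, memory_order_relaxed);
    assert(store_then_load_compiler_barrier() == 7);
    assert(store_then_load_full_fence() == 7);
}
```

Comparing the generated asm for the two functions (for example with `gcc -O2 -S`) is one way to see which barrier exists only at compile time.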

See also How does memory reordering help processors and compilers? (a Java question, but my answer isn't Java-specific).


re: your edit

This answer was already assuming your function was noinline, and that you were talking about ASM that looked like your C, not what a compiler would actually generate from your code.

mov    %gs:0x24,%eax          (1)
orl    $0x8,%gs:0x24          (2)
retq                          (3)

So x is actually in thread-local storage, not a plain global int x. This doesn't actually matter for out-of-order execution, though; a load with a %gs segment override is still a load.
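
For illustration, a thread-local `x` could be declared as below. This is a guess at the kind of source that yields segment-relative accesses, not the asker's actual code: `FLAG` as 0x8 comes from the asm listing, and the function name `set_flag_tls` is hypothetical. Which segment register the compiler uses depends on the platform ABI (x86-64 Linux user space uses %fs for thread-local storage, for example).

```c
#define FLAG 0x8 /* assumption: taken from the $0x8 in the asm listing */

/* C11 thread-local storage: each thread gets its own copy of x. */
static _Thread_local int x;

/* Same shape as func1: a load, a read-modify-write, and a return. */
int __attribute__((noinline)) set_flag_tls(void)
{
    int prev = x;   /* load, addressed relative to the thread's TLS base */
    x |= FLAG;      /* read-modify-write of the same location */
    return prev;
}
```

From the core's point of view these are still an ordinary load and an ordinary read-modify-write; only the address calculation differs.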

Kathlyn answered 8/7, 2016 at 16:9 Comment(0)
