Avoid optimizing away variable with inline asm

I was reading Preventing compiler optimizations while benchmarking, which describes how clobber() and escape() from Chandler Carruth's talk at CppCon 2015, "Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!", affect the compiler.
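
For reference, the two helpers from the talk are commonly reproduced along the following lines (GCC/Clang extended-asm syntax). escape() makes the pointed-to object visible to an opaque asm statement, and clobber() tells the compiler that an asm statement may read or write such memory. The storeAndObserve() and checkStoreAndObserve() functions are hypothetical usage examples, not part of the talk.

```cpp
#include <cassert>

// escape(): the empty asm takes p as an input and clobbers "memory", so the
// compiler must assume the asm can read or write whatever p points to.
// As a result, *p has to live in memory with its current value stored.
static void escape(void* p) {
  asm volatile("" : : "g"(p) : "memory");
}

// clobber(): an empty asm that claims to read and write memory, so stores
// to escaped objects must happen before it and be reloaded after it.
static void clobber() {
  asm volatile("" : : : "memory");
}

// Hypothetical usage: force a store to a local to really happen.
int storeAndObserve() {
  int x = 0;
  escape(&x);  // x's address is now visible to "the asm"
  x = 42;      // must actually be written to memory before clobber()...
  clobber();   // ...because the asm may read x here
  return x;    // and x must be reloaded, since the asm may have written it
}

// Usage check: the barriers themselves do not change x.
void checkStoreAndObserve() { assert(storeAndObserve() == 42); }
```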

From reading that, I assumed that if I have an input constraint like "g"(val), then the compiler wouldn't be able to optimize away val. But in g() below, no code is generated. Why?

How can doNotOptimize() be rewritten to ensure code is generated for g()?

template <typename T>
void doNotOptimize(T const& val) {
  asm volatile("" : : "g"(val) : "memory");
}

void f() {
  char x = 1;
  doNotOptimize(&x);    // x is NOT optimized away
}

void g() {
  char x = 1;
  doNotOptimize(x);     // x is optimized away
}

https://godbolt.org/g/Ndd56K

Teno asked 15/6, 2017 at 8:45
I would say that because g passes the value as a const reference to doNotOptimize, the compiler is allowed to assume that x (or rather val) cannot change, so it can optimize the call away. f, on the other hand, passes a const pointer to doNotOptimize, so x could indeed change. (Glass)
Related Q&As about the same functions: Preventing compiler optimizations while benchmarking / I don't understand the definition of DoNotOptimizeAway / "Escape" and "Clobber" equivalent in MSVC (Cymose)

What, exactly, would it mean to have code generated for g()? If you were writing it yourself, what code would you write? Seriously, this is a real question. You have to decide what output you're expecting before you can start cajoling it from the compiler.

Anyway, let's look at what you have now. In f(),

void f() {
  char x = 1;
  doNotOptimize(&x);    // x is NOT optimized away
}

you are taking the address of x, which prevents the optimizer from allocating it in a register. It has to be allocated in memory in order for it to have an address.

However, in g(),

void g() {
  char x = 1;
  doNotOptimize(x);     // x is optimized away
}

x is just a local variable, and any sane optimizer will keep it in a register or, as in this case, treat it as a constant. This is allowed because you never take its address; you only use its value. So, for example, the compiler might generate code like this:

g():
    mov  al, 1      // store 1 in BYTE-sized register AL
    ...

Or, as in this case, it might not generate any code at all and simply substitute the constant value for every use of the variable.

Your doNotOptimize code,

template <typename T>
void doNotOptimize(T const& val) {
  asm volatile("" : : "g"(val) : "memory");
}

uses the g constraint for the val operand, which lets the compiler supply it as a general-purpose register, a memory location, or an immediate constant, whichever the optimizer finds most convenient. Since val is a constant, once this call is inlined the optimizer can simply pass the immediate value 1, and because the asm template is empty, no instructions result. Your "memory" clobber doesn't change that: it tells the compiler that the asm may read or write memory, but x's address is never taken, so x is not memory the asm could access and nothing has to be stored.

So what can we do? Well, we can force the variable x to be allocated in memory, even though it doesn't need to be, by using the m constraint:

template <typename T>
void doNotOptimize(T const& val) {
  asm volatile("" : : "m"(val) : "memory");
}

void g() {
  char x = 1;
  doNotOptimize(x);
}

Now the compiler can't optimize the store of x away and is forced to emit the following code:

g():
    mov  BYTE PTR [rsp-1], 1
    ret

Note that this has basically the same effect as declaring x volatile.

Remember the question I asked at the beginning? Is that the output you wanted?

Or maybe you want the compiler to emit that immediate-to-register move. If so, the r constraint will work, as will any of the x86-specific constraints that let you dictate a particular register. This forces the optimizer to allocate the value in a register, even though it doesn't need to be:

g():
    mov     eax, 1
    ret
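
Only the resulting assembly is shown for the r variant, so here is a sketch of what the source might look like, assuming GCC or Clang extended asm. The second helper, doNotOptimizeRegOrMem(), is an added illustration (not from this answer) of a multi-alternative constraint, "r,m": the compiler may pick a register or memory but, unlike "g", never an immediate, so the constant still has to be materialized somewhere. Google Benchmark's DoNotOptimize has used an "r,m" constraint of this kind, but check the version you actually use. All function names here are hypothetical.

```cpp
#include <cassert>

// "r": the operand must be in a general-purpose register, so a char
// constant has to be materialized (e.g. `mov eax, 1` on x86-64).
// Only works for types that fit in a register.
template <typename T>
void doNotOptimizeReg(T const& val) {
  asm volatile("" : : "r"(val) : "memory");
}

// "r,m": two alternatives, register or memory. Unlike "g", no immediate
// is allowed, so a constant must be placed in a register or in memory
// before the (empty) asm statement.
template <typename T>
void doNotOptimizeRegOrMem(T const& val) {
  asm volatile("" : : "r,m"(val) : "memory");
}

void gReg() {
  char x = 1;
  doNotOptimizeReg(x);
}

void gRegOrMem() {
  char x = 1;
  doNotOptimizeRegOrMem(x);
}

// Usage check: the empty asm statements leave the values unchanged.
void checkBarriers() {
  int a = 2, b = 3;
  doNotOptimizeReg(a);
  doNotOptimizeRegOrMem(b);
  assert(a + b == 5);
}
```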

I cannot, however, see what the point of either of these would be.

If you wanted to craft a microbenchmark that tested the overhead of calling a function with a single const-reference parameter, then a better option would be to ensure that the definition of the function being called is not visible to the optimizer. Then, it can't inline that function and has to arrange for the call to be made, including all necessary setup. This also works well if you're just studying how a compiler might emit that code. (Naturally, you can't use a template function, though. Well, unless you wanted to abuse C++11's extern templates.)
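
The separate-file approach doesn't fit in a single snippet, so the sketch below swaps in a different, GCC/Clang-specific technique as a single-file approximation: __attribute__((noinline)). This is not what the answer proposes. noinline only forbids inlining; GCC's documentation notes that calls to side-effect-free functions can still be optimized away, recommends an asm("") in the callee to prevent that, and (from GCC 8) offers noipa to disable interprocedural optimizations more broadly. The names takeConstRef, callIt, and checkCallIt are hypothetical.

```cpp
#include <cassert>

// In a real benchmark this definition would live in another .cpp file
// so the caller cannot see its body. Here noinline (GCC/Clang) forbids
// inlining, and the empty basic asm acts as a side effect so the call
// itself is not removed.
__attribute__((noinline)) int takeConstRef(char const& c) {
  asm("");
  return c + 1;
}

int callIt() {
  char x = 1;
  // Not inlined, so a real call (or tail jump) is emitted here. The body
  // is still visible, though, so exactly how x is passed may still be
  // changed by interprocedural optimizations.
  return takeConstRef(x);
}

// Usage check.
void checkCallIt() { assert(callIt() == 2); }
```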

Schutzstaffel answered 15/6, 2017 at 11:36
I had the fuzzy notion that the doNotOptimize() function should work regardless of whether the input parameter is a pointer or not. As for the meaning of the asm constraints, I hadn't understood them. BTW, I noticed that Facebook's folly library does have template specializations for different types. I rewrote my example using their doNotOptimizeAway: godbolt.org/g/C8OvGm (Chaste)
You sometimes want to use "+r"(var) alongside this, to trick the compiler into re-computing something inside a loop. (Cymose)
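
A minimal sketch of that "+r" trick, assuming GCC or Clang extended asm; sumSquares and checkSumSquares are hypothetical names. Making x a read-write register operand of an empty asm means that, as far as the compiler knows, x may change on every iteration, so a loop-invariant computation on it can no longer be hoisted out of the loop or folded away.

```cpp
#include <cassert>

// Without the asm statement, x * x is loop-invariant and the compiler
// may hoist it or fold the whole loop into n * x * x. The "+r" operand
// says the empty asm reads x from a register and may leave a different
// value there, so x * x has to be recomputed on every iteration.
int sumSquares(int x, int n) {
  int total = 0;
  for (int i = 0; i < n; ++i) {
    asm volatile("" : "+r"(x));
    total += x * x;
  }
  return total;
}

// Usage check: the asm does not actually modify x at run time.
void checkSumSquares() { assert(sumSquares(3, 4) == 36); }
```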

I would recommend declaring

volatile char x = 1;

But notice that the compiler is "right" to optimize like you observe.

Aviator answered 15/6, 2017 at 11:38

No code is generated for g() because the "g" constraint allows the input to be optimised to a constant.

Bagehot answered 16/6, 2017 at 18:2
