gcc flags to disable arithmetic optimisations

Asked 8/12, 2019 at 20:32 Answered 8/12, 2019 at 22:16

c gcc x86 compiler-optimization microbenchmark

Does gcc/g++ have flags to enable or disable arithmetic optimisations, e.g. where a+a+...+a is replaced by n*a when a is an integer? In particular, can this be disabled when using -O2 or -O3?

In the example below, even with -O0, the additions are replaced by a single multiplication:

$ cat add1.cpp
unsigned int multiply_by_22(unsigned int a)
{
    return a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a;
}

$ g++ -S -masm=intel -O0 add1.cpp

$ cat add1.s
...
        imul    eax, edi, 22

Even disabling all the flags enabled at -O0 (see g++ -c -Q -O0 --help=optimizers | grep enabled) still produces the imul instruction.

With a loop, it takes -O1 for the repeated addition to be simplified into a single multiplication:

$ cat add2.cpp
unsigned int multiply(unsigned int a, unsigned int b)
{
    unsigned int sum=0;
    for(unsigned int i=0; i<b; i++)
        sum += a;
    return sum;
}

$ g++ -S -masm=intel -O1 add2.cpp

$ cat add2.s
...
        mov     eax, 0
.L3:
        add     eax, 1
        cmp     esi, eax
        jne     .L3
        imul    eax, edi
        ret

I.e. -O1 has hoisted sum += a; out of the loop and replaced it with a single multiplication. At -O2 it also removes the now-dead loop.

I'm asking purely out of interest: I was trying to time some basic integer operations, noticed that the compiler optimised my loops away, and couldn't find any flags to disable this.

Amalgam asked 8/12, 2019 at 20:32 Comment(8)

I looked for pragmas that might force the behaviour you describe, but the only promising one I found was #2220329, and all of its answers explain how to set -O0 locally, which we know is not enough for you. – Sloop 8/12, 2019 at 20:41

For the most part there is no such flag. You can modify the code instead (add volatile, put the operations in separate statements). For signed types, the undefined behavior sanitizer might also prevent some optimizations. – Bots 8/12, 2019 at 20:41

Maybe asm volatile("" : "+r"(a) : : "memory");? – Aweather 8/12, 2019 at 20:50
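A minimal sketch of how an empty asm statement can act as an optimisation barrier, assuming GCC or Clang (GNU inline asm syntax). The function name multiply_with_barrier is hypothetical. It uses a "+r" (read/write) constraint rather than "=r": "=r" tells the compiler the asm overwrites the variable without reading it, so the previous value would be lost. With "+r", the compiler must assume sum may have changed after every addition, so it should no longer be able to merge the loop into one imul (the loop counter overhead is still included in any timing).

```cpp
// Hypothetical helper: multiply by repeated addition, with an empty
// GNU asm statement acting as an optimisation barrier (GCC/Clang only).
unsigned int multiply_with_barrier(unsigned int a, unsigned int b)
{
    unsigned int sum = 0;
    for (unsigned int i = 0; i < b; i++) {
        sum += a;
        // Emits no instructions, but "+r" says sum is read and possibly
        // modified here, so the additions cannot be folded into a multiply.
        asm volatile("" : "+r"(sum));
    }
    return sum;
}
```

Because the barrier only constrains the compiler, the value stays in a register and no extra memory accesses are introduced.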

Alternatively, you could use the asm keyword supported by gcc (see its documentation) and write an explicit assembly block for your sum. I haven't tried it, which is why I'm not posting it as an answer, but I'm confident it would work. Since I'm not an Intel asm expert, I would start by writing a simple a+b program, look at the generated assembly, and extend it to perform the a*22 sum. Then I would put that into the asm block. – Sloop 8/12, 2019 at 20:53

Interestingly enough, GCC 4.1.2 seems to produce exactly what you expect: godbolt.org/z/8QmFFv. Changing the compiler version may not be a relevant solution for you, though. – Thralldom 8/12, 2019 at 20:53

Why would you want to time something that never happens in real code? – Simmonds 9/12, 2019 at 10:06

@n.'pronouns'm. Purely to time add, imul, idiv, etc. I know there are good references online, such as agner.org/optimize/instruction_tables.pdf, but it's always nice to reproduce the numbers locally :) – Amalgam 9/12, 2019 at 11:44

If you want to time certain assembly instructions, write those exact assembly instructions in assembly. – Simmonds 9/12, 2019 at 12:19

I do not know of any such compiler flag.

As a workaround, you can try using volatile:

unsigned int multiply_by_22(volatile unsigned int a)
{
    return a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a;
}

With -O0 you get:

push    rbp
mov     rbp, rsp
mov     DWORD PTR [rbp-4], edi
mov     edx, DWORD PTR [rbp-4]
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax

etc...

For the code generated at -O2 or -O3, see: https://godbolt.org/z/Bk2b6Z

Karynkaryo answered 8/12, 2019 at 20:46 Comment(6)

volatile certainly works, but it has the disadvantage that even with -O2 the value won't be kept in a CPU register, so every read requires a memory access. – Amalgam 8/12, 2019 at 22:28

@Amalgam Yes, I do agree with that. – Karynkaryo 8/12, 2019 at 22:30

Note that clang does not need volatile to avoid the multiplication at -O0, but it still stores edi to memory ([rbp-4]) to perform the additions. At -O1 it optimises the additions without an imul instruction, using two lea instructions and an add. The code gcc generates at -O1 and -O2 with volatile is horrible. – Darby 9/12, 2019 at 0:47
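To see how two lea instructions and one add can produce a*22 without imul, here is one possible decomposition written in C++. It is an illustration, not a copy of clang's actual output. x86 lea computes base + index*scale with a scale of 1, 2, 4 or 8, so 5a = a + 4a, 21a = a + 4*(5a), and 22a = 21a + a. The function name and the pseudo-register names in the comments are hypothetical.

```cpp
// Illustrative only: mirrors what two lea instructions and one add could
// compute. All arithmetic is modulo 2^32, just like 32-bit registers.
unsigned int times22_lea_style(unsigned int a)
{
    unsigned int t = a + a * 4;  // lea t, [a + a*4]  -> 5a
    unsigned int u = a + t * 4;  // lea u, [a + t*4]  -> 21a
    return u + a;                // add u, a          -> 22a
}
```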

@Amalgam why are you worrying about the value not being kept in a register when you're trying to pessimize the output binary? – Skiascope 9/12, 2019 at 10:38

@Skiascope because I'm interested in the timing of e.g. add, and slow memory accesses like mov eax, DWORD PTR [rbp-4] will distort it. – Amalgam 9/12, 2019 at 11:46

@Amalgam in that case this is about micro-benchmarking, not about compiler optimization. Assembly or inline assembly is the solution for that. – Skiascope 9/12, 2019 at 13:18

In the absence of a suitable compiler flag, I see only two options to force add instructions:

Write a more complex series of additions that the compiler can't optimise away, e.g. the Fibonacci sequence (although the values overflow quickly; for unsigned int they simply wrap around modulo 2^32, which is harmless for timing):

$ cat fibonacci.cpp
unsigned int fibonacci(unsigned int ops)
{
    unsigned int a=1;
    unsigned int b=1;
    for(unsigned int i=0; i<ops/2; i++) {
        a+=b;
        b+=a;
    }
    return b;
}

$ g++ -Wall -S -masm=intel -O3 -funroll-loops fibonacci.cpp

$ cat fibonacci.s
...
.L3:
        add     edx, eax
        add     ecx, 8
        add     eax, edx
        add     edx, eax
        add     eax, edx
        add     edx, eax
        add     eax, edx
        add     edx, eax
        add     eax, edx
        add     edx, eax
        add     eax, edx
        add     edx, eax
        add     eax, edx
        add     edx, eax
        add     eax, edx
        add     edx, eax
        add     eax, edx
        cmp     ecx, edi
        jne     .L3
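A rough sketch of timing this chain with std::chrono, assuming C++11. The helper ns_per_add is hypothetical. The count is read from a volatile after the first clock reading so the call cannot be evaluated at compile time or hoisted before the timer starts, and the result is stored so it is not dead code. Since each add depends on the previous one, this measures add latency rather than throughput. Timing like this is only approximate, so it is worth checking the generated assembly.

```cpp
#include <chrono>

// Same dependent chain of additions as above. Unsigned overflow wraps
// modulo 2^32, which is well defined and harmless for timing.
unsigned int fibonacci(unsigned int ops)
{
    unsigned int a = 1;
    unsigned int b = 1;
    for (unsigned int i = 0; i < ops / 2; i++) {
        a += b;
        b += a;
    }
    return b;
}

// Hypothetical helper: returns the average nanoseconds per addition and
// writes the computed value to *result so the caller can print it.
double ns_per_add(unsigned int ops, unsigned int* result)
{
    volatile unsigned int count = ops;
    auto start = std::chrono::steady_clock::now();
    unsigned int n = count;  // volatile read after the timer has started
    *result = fibonacci(n);
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    // Each loop iteration performs two additions.
    unsigned int adds = (ops / 2) * 2;
    return adds ? elapsed.count() / adds : 0.0;
}
```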

Write inline assembly that emits the add instructions explicitly:

unsigned int multiply_by_5(unsigned int a)
{
   unsigned int sum = 0;
   asm ( "# start multiply_by_5\n\t"
         "movl %1, %%ebx\n\t"           // ebx = a
         "movl $0, %%eax\n\t"           // eax = 0 (sum = 0)
         "addl %%ebx, %%eax\n\t"        // eax += ebx (sum+=a)
         "addl %%ebx, %%eax\n\t"        // eax += ebx (sum+=a)
         "addl %%ebx, %%eax\n\t"        // eax += ebx (sum+=a)
         "addl %%ebx, %%eax\n\t"        // eax += ebx (sum+=a)
         "addl %%ebx, %%eax\n\t"        // eax += ebx (sum+=a)
         "movl %%eax, %0\n\t"           // sum = eax
         "# end multiply_by_5\n"
         : "=m" (sum) : "m" (a) : "%eax", "%ebx");
   return sum;
}
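The version above passes sum and a through memory ("=m" and "m"), so each call includes a load and a store around the additions, which is the kind of memory traffic discussed in the volatile comments. A sketch of a register-only variant, assuming x86/x86-64 and GCC's default AT&T assembler syntax (it will not assemble with -masm=intel); the name multiply_by_5_reg is hypothetical. Letting the compiler choose the registers also avoids clobbering ebx and eax by hand.

```cpp
// Hypothetical variant of multiply_by_5: same five add instructions, but
// register constraints let the compiler pick registers, so no memory
// operands are needed. Assumes x86/x86-64 and AT&T syntax.
unsigned int multiply_by_5_reg(unsigned int a)
{
    unsigned int sum = 0;
    asm("addl %1, %0\n\t"  // sum += a
        "addl %1, %0\n\t"  // sum += a
        "addl %1, %0\n\t"  // sum += a
        "addl %1, %0\n\t"  // sum += a
        "addl %1, %0"      // sum += a
        : "+r"(sum)        // %0: sum, read and written in a register
        : "r"(a)           // %1: a, read-only register input
        : "cc");           // add modifies the flags
    return sum;
}
```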

Amalgam answered 8/12, 2019 at 22:16 Comment(0)
