gcc flags to disable arithmetic optimisations

Does gcc/g++ have flags to enable or disable arithmetic optimisations, e.g. where a+a+...+a is replaced by n*a when a is an integer? In particular, can this be disabled when using -O2 or -O3?

In the example below, even with -O0, the additions are replaced by a single multiplication:

$ cat add1.cpp
unsigned int multiply_by_22(unsigned int a)
{
    return a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a;
}

$ g++ -S -masm=intel -O0 add1.cpp

$ cat add1.s
...
        imul    eax, edi, 22

Even disabling all the flags enabled at -O0 (see g++ -c -Q -O0 --help=optimizers | grep enabled) still produces the imul instruction.

When the addition is done in a loop, it takes -O1 to turn the repeated addition into a single multiplication:

$ cat add2.cpp
unsigned int multiply(unsigned int a, unsigned int b)
{
    unsigned int sum=0;
    for(unsigned int i=0; i<b; i++)
        sum += a;
    return sum;
}

$ g++ -S -masm=intel -O1 add2.cpp

$ cat add2.s
...
        mov     eax, 0
.L3:
        add     eax, 1
        cmp     esi, eax
        jne     .L3
        imul    eax, edi
        ret

I.e. -O1 has moved sum += a; out of the loop and replaced it with a single multiplication. With -O2 it also removes the now-dead loop.
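The -O1 listing can be read back as C++. This is a hand translation, not compiler output. The b == 0 early exit is an assumption about what the elided "..." part of the listing contains, and the name multiply_like_O1 is hypothetical.

```cpp
// Hand-written C++ rendering of the -O1 listing for add2.cpp (a sketch,
// not compiler output). multiply_like_O1 is a hypothetical name.
unsigned int multiply_like_O1(unsigned int a, unsigned int b)
{
    // Assumption: the elided "..." part of the listing returns 0 early
    // when b == 0, because the original loop body then never runs.
    if (b == 0)
        return 0;

    unsigned int i = 0;          // mov eax, 0
    do {
        ++i;                     // add eax, 1   (the loop now only counts)
    } while (i != b);            // cmp esi, eax / jne .L3

    return i * a;                // imul eax, edi   (sum += a hoisted: i == b here)
}
```

The loop no longer touches a at all; the compiler has worked out that the final value of sum is the trip count times a, which is why -O2 can then drop the counting loop entirely.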

I'm just asking out of interest as I was trying to time some basic integer operations and noticed that the compiler optimised my loops away and I couldn't find any flags to disable this.

Amalgam answered 8/12, 2019 at 20:32 Comment(8)
I was searching for pragmas that might force the behaviour you describe, but the only thing that sounded promising was #2220329, and all its answers explain how to set -O0 locally, which we know is not enough for you. - Sloop
Mostly there is no such flag. You can modify the code (add volatile, put the operations in separate statements). For signed types, the undefined behavior sanitizer (-fsanitize=undefined) might also prevent some optimizations. - Bots
Maybe asm volatile("" : "=r"(a) : : "memory");? - Aweather
Alternatively, you could try the asm keyword supported by gcc (see its documentation online) and write an explicit assembler section for your sum. I'm not sure about it, which is why I can't write an answer, but I'm confident it could work. Since I'm not an Intel asm expert, I would start by writing a simple a+b program, look at the generated asm and extend it to compute the a*22 sum. Then I would put it into the asm section. - Sloop
Interestingly enough, GCC 4.1.2 seems to produce exactly what you expect: godbolt.org/z/8QmFFv (though changing the compiler version might not be relevant at all). - Thralldom
Why would you want to time something that never happens in real code? - Simmonds
@n.'pronouns'm. Purely to time add, imul, idiv etc. I know there are good manuals online like agner.org/optimize/instruction_tables.pdf, but it's always nice to replicate the results locally :) - Amalgam
If you want to time certain assembly instructions, write those exact assembly instructions in assembly. - Simmonds
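A minimal sketch of the empty-asm idea from the comments, assuming GCC or Clang. It uses a read-write "+r" operand instead of "=r": with "=r" the compiler is told the asm overwrites the variable, so its old value is not guaranteed to survive. The name multiply_with_barrier is hypothetical.

```cpp
// Sketch for GCC/Clang: the add2.cpp loop with an empty inline-asm barrier.
// multiply_with_barrier is a hypothetical name.
unsigned int multiply_with_barrier(unsigned int a, unsigned int b)
{
    unsigned int sum = 0;
    for (unsigned int i = 0; i < b; i++) {
        sum += a;
        // Emits no instruction, but "+r" tells the compiler that sum is read
        // and possibly modified here, in a register. It therefore cannot
        // replace the loop with a multiplication or delete it.
        asm volatile("" : "+r"(sum));
    }
    return sum;
}
```

Unlike volatile, this keeps sum in a register, so no extra memory accesses are introduced; it is still worth checking the generated assembly for the compiler version in use.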

I do not know of such a compiler flag.

As a workaround, you could try declaring the parameter volatile:

unsigned int multiply_by_22(volatile unsigned int a)
{
    return a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a+a;
}

With -O0 you get:

push    rbp
mov     rbp, rsp
mov     DWORD PTR [rbp-4], edi
mov     edx, DWORD PTR [rbp-4]
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax
mov     eax, DWORD PTR [rbp-4]
add     edx, eax

etc...

For the code generated at -O2 or -O3, see https://godbolt.org/z/Bk2b6Z

Karynkaryo answered 8/12, 2019 at 20:46 Comment(6)
volatile certainly works, but it has the disadvantage that even with -O2 the value is not kept in a CPU register and every use requires a memory access. - Amalgam
@Amalgam Yes, I do agree with that. - Karynkaryo
Note that clang does not require volatile to avoid the multiplication at -O0, but it still stores edi to memory [ebp-4] to perform the additions. At -O1 it optimises the additions without an imul instruction, using 2 lea and an add. The code generated by gcc at -O1 and -O2 with the volatile is horrible. - Darby
@Amalgam Why are you worried about the value not being kept in a register when you're trying to pessimize the output binary? - Skiascope
@Skiascope Because I'm interested in the timing of e.g. add, and slow memory accesses like mov eax, DWORD PTR [rbp-4] will distort the measurement. - Amalgam
@Amalgam In that case this is about micro-benchmarking, not compiler optimization. Assembly or inline assembly is the solution for that. - Skiascope
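Following the micro-benchmarking advice in these comments, a rough timing sketch that combines a register-only empty-asm barrier (GCC/Clang) with std::chrono::steady_clock. The names add_chain and ns_per_iteration are hypothetical, the figure includes the loop counter and branch as well as the add, and the generated assembly should be checked before trusting any number.

```cpp
#include <chrono>

// Hypothetical helper: n dependent adds whose running total stays in a
// register; the empty asm stops GCC/Clang from folding the loop.
unsigned int add_chain(unsigned int a, unsigned int n)
{
    unsigned int sum = 0;
    for (unsigned int i = 0; i < n; ++i) {
        sum += a;
        asm volatile("" : "+r"(sum));
    }
    return sum;
}

// Hypothetical harness: average wall-clock nanoseconds per loop iteration
// (add plus loop overhead). The sum is stored in *result so the work is used.
double ns_per_iteration(unsigned int n, unsigned int* result)
{
    auto start = std::chrono::steady_clock::now();
    *result = add_chain(3, n);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count() / n;
}
```

For meaningful numbers, n should be large (e.g. hundreds of millions) and the measurement repeated; a single short run mostly measures timer overhead.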

In the absence of any compiler flag, I see only two ways to force add instructions:

  • Write a more complex series of additions that the compiler can't optimise away, e.g. the Fibonacci series (although this will overflow quickly):
$ cat fibonacci.cpp
unsigned int fibonacci(unsigned int ops)
{
    unsigned int a=1;
    unsigned int b=1;
    for(unsigned int i=0; i<ops/2; i++) {
        a+=b;
        b+=a;
    }
    return b;
}

$ g++ -Wall -S -masm=intel -O3 -funroll-loops fibonacci.cpp

$ cat fibonacci.s
...
.L3:
        add     edx, eax
        add     ecx, 8
        add     eax, edx
        add     edx, eax
        add     eax, edx
        add     edx, eax
        add     eax, edx
        add     edx, eax
        add     eax, edx
        add     edx, eax
        add     eax, edx
        add     edx, eax
        add     eax, edx
        add     edx, eax
        add     eax, edx
        add     edx, eax
        add     eax, edx
        cmp     ecx, edi
        jne     .L3
  • Write the additions in inline assembly, so that exactly those add instructions are emitted:
unsigned int multiply_by_5(unsigned int a)
{
   unsigned int sum = 0;
   asm ( "# start multiply_by_5\n\t"
         "movl %1, %%ebx\n\t"           // ebx = a
         "movl $0, %%eax\n\t"           // eax = 0 (sum = 0)
         "addl %%ebx, %%eax\n\t"        // eax += ebx (sum+=a)
         "addl %%ebx, %%eax\n\t"        // eax += ebx (sum+=a)
         "addl %%ebx, %%eax\n\t"        // eax += ebx (sum+=a)
         "addl %%ebx, %%eax\n\t"        // eax += ebx (sum+=a)
         "addl %%ebx, %%eax\n\t"        // eax += ebx (sum+=a)
         "movl %%eax, %0\n\t"           // sum = eax
         "# end multiply_by_5\n"
         : "=m" (sum) : "m" (a) : "%eax", "%ebx");
   return sum;
}
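The routine above passes a and sum through memory ("m" constraints) and pins eax/ebx, which adds the kind of memory traffic the comments on the other answer worry about. A variant, assuming GCC or Clang on x86/x86-64 with the default AT&T syntax (it will not assemble with -masm=intel), lets the compiler choose registers instead; multiply_by_5_reg is a hypothetical name.

```cpp
// GCC/Clang, x86 or x86-64 only, AT&T syntax (the default; not -masm=intel).
#if !defined(__x86_64__) && !defined(__i386__)
#error "multiply_by_5_reg uses x86 inline assembly"
#endif

// Hypothetical register-operand variant of multiply_by_5.
unsigned int multiply_by_5_reg(unsigned int a)
{
    unsigned int sum = 0;
    asm("addl %1, %0\n\t"   // sum += a
        "addl %1, %0\n\t"   // sum += a
        "addl %1, %0\n\t"   // sum += a
        "addl %1, %0\n\t"   // sum += a
        "addl %1, %0"       // sum += a
        : "+r"(sum)         // %0: read-write, register chosen by the compiler
        : "r"(a)            // %1: read-only, in a register
        : "cc");            // addl changes the flags
    return sum;
}
```

Because no fixed registers are named, there is no clobber list to maintain and no ebx conflict in 32-bit PIC code; the five addl instructions appear in the output exactly as written.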
Amalgam answered 8/12, 2019 at 22:16 Comment(0)
