Does gcc automatically "unroll" if-statements?
Asked Answered
L

1

8

Say I have a loop that looks like this:

for(int i = 0; i < 10000; i++) {
    /* Do something computationally expensive */
    if (i < 200 && !(i%20)) {
        /* Do something else */
    }
}

wherein some trivial task gets stuck behind an if-statement that only runs a handful of times. I've always heard that "if-statements in loops are slow!" So, in the hopes of (marginally) increased performance, I split the loops apart into:

for(int i = 0; i < 200; i++) {
    /* Do something computationally expensive */
    if (!(i%20)) {
        /* Do something else */
    }
}

for(int i = 200; i < 10000; i++) {
    /* Do something computationally expensive */
}

Will gcc (with the appropriate flags, like -O3) automatically break the one loop into two, or does it only unroll to decrease the number of iterations?

Limpkin answered 16/2, 2011 at 14:42 Comment(3)
I certainly remember discussion of this from the GCC mailing list - there is at least some support for it. Whether it would actually do it in your case or not is probably easiest to verify by checking the assembler output or -da; it may only have been unrolling the 0 case or the max loop case, I can't remember. However in your case (for 200-10000 it'll be: test i<200, ++i, test i<10000) I doubt it would make any significant performance difference - in the order of a cycle or two per loop, which will be dwarfed by the computationally expensive code.Intitule
You could probably optimize (and obfuscate) the code quite a lot by checking the lsb of of i at each iterator, since odd numbers won't be divisible by 20. If i<200 and (i&1)==0 and (i%20)==0. The % operator is notoriously ineffective.Rudolf
You always heard "If statements in loops are slow"? Where did you hear that? Just curious.Outbid
T
11

Why not just disassemble the program and see for yourself? But here we go. This is the testprogram:

int main() {
    int sum = 0;
    int i;
    for(i = 0; i < 10000; i++) {
        if (i < 200 && !(i%20)) {
            sum += 0xC0DE;
        }
        sum += 0xCAFE;
    }
    printf("%d\n", sum);
    return 0;
}

and this is the interesting part of the disassembled code compiled with gcc 4.3.3 and -o3:

0x08048404 <main+20>:   xor    ebx,ebx
0x08048406 <main+22>:   push   ecx
0x08048407 <main+23>:   xor    ecx,ecx
0x08048409 <main+25>:   sub    esp,0xc
0x0804840c <main+28>:   lea    esi,[esi+eiz*1+0x0]
0x08048410 <main+32>:   cmp    ecx,0xc7
0x08048416 <main+38>:   jg     0x8048436 <main+70>
0x08048418 <main+40>:   mov    eax,ecx
0x0804841a <main+42>:   imul   esi
0x0804841c <main+44>:   mov    eax,ecx
0x0804841e <main+46>:   sar    eax,0x1f
0x08048421 <main+49>:   sar    edx,0x3
0x08048424 <main+52>:   sub    edx,eax
0x08048426 <main+54>:   lea    edx,[edx+edx*4]
0x08048429 <main+57>:   shl    edx,0x2
0x0804842c <main+60>:   cmp    ecx,edx
0x0804842e <main+62>:   jne    0x8048436 <main+70>
0x08048430 <main+64>:   add    ebx,0xc0de
0x08048436 <main+70>:   add    ecx,0x1
0x08048439 <main+73>:   add    ebx,0xcafe
0x0804843f <main+79>:   cmp    ecx,0x2710
0x08048445 <main+85>:   jne    0x8048410 <main+32>
0x08048447 <main+87>:   mov    DWORD PTR [esp+0x8],ebx
0x0804844b <main+91>:   mov    DWORD PTR [esp+0x4],0x8048530
0x08048453 <main+99>:   mov    DWORD PTR [esp],0x1
0x0804845a <main+106>:  call   0x8048308 <__printf_chk@plt>

So as we see, for this particular example, no it does not. We have only one loop starting at main+32 and ending at main+85. If you've got problems reading the assembly code ecx = i; ebx = sum.

But still your mileage may vary - who knows what heuristics are used for this particular case, so you'll have to compile the code you've got in mind and see how longer/more complicated computations influence the optimizer.

Though on any modern CPU the branch predictor will do pretty good on such easy code, so you won't see much performance losses in either case. What's the performance loss of maybe a handful mispredictions if your computation intense code needs billions of cycles?

Telfore answered 16/2, 2011 at 15:20 Comment(1)
Ahhh - thanks a lot! I never knew how to go about analyzing disassembled code.Limpkin

© 2022 - 2024 — McMap. All rights reserved.