How can GCC unroll a loop if its number of iterations is unknown at compile time?
Asked Answered
S

4

13

I was reading the optimization options for GCC when I found the option -funroll-all-loops.

Its description reads:

Unroll all loops, even if their number of iterations is uncertain when the loop is entered. This usually makes programs run more slowly. '-funroll-all-loops' implies the same options as '-funroll-loops'

How can the compiler unroll a loop if its number of iterations is unknown at compile time? Doesn't the compiler need this information to unroll it? What corresponding C code does it generate, and in what contexts could this be useful if it usually makes the programs run more slowly?

Should answered 1/7, 2015 at 16:37 Comment(4)
@Olaf how is this not a specific question? OP's clearly asking how gcc (a specific C compiler) is able to know how to unroll a loop if its bounds are unknown.Ethiopia
@Olaf How is it too broad if I want to know what a specific compiler option does?Should
@paulsm4 I don't think so. If the compiler knows the bounds already, then the decision comes down to how much to actually unroll it. This question isn't asking that.Ethiopia
You may find the discussion about Duff's Device in my answer to Switch case weird scoping helpful. (Other posters are mentioning Duff's Device, too.)Sarcenet
B
3

in what contexts could this be useful if it usually makes the programs run more slowly?

Well they are assuming if you choose this option you know what you are doing, if you don't you should not use this option.

what is gcc going to do, well I used this sample program:

#include <stdio.h>

void f(int j )
{
  for( int k = 0; k < j; ++k )
  {
    printf( "%d\n", k ) ;
  }
}

and tested it with godbolt and it generates a jump table based on the number of iterations left (see it live):

cmpl    $1, %ebp
movl    $1, %ebx
je  .L1
testl   %r12d, %r12d
je  .L27
cmpl    $1, %r12d
je  .L28
cmpl    $2, %r12d
je  .L29
cmpl    $3, %r12d
je  .L30
cmpl    $4, %r12d
je  .L31
cmpl    $5, %r12d
je  .L32
cmpl    $6, %r12d
je  .L33
Bestow answered 1/7, 2015 at 16:52 Comment(1)
Of course, with a printf in there, the benefit is basically zero ;-)Poseur
B
4

Here's some C code showing how to do it:

int iterations = 100;
int unrollValue = 8;

while (iterations%unrollvalue)
{
   // insert loop code here
   iterations--;
}

while (iterations)
{
   // insert unrollValue copies of loop code here
   iterations-= unrollValue;
}

The compiler will replace the first loop with a relative jump, but that's not easy to represent in C. Note that unrolling by a power of 2 allows the compiler to use a mask instead of the (expensive) divide operation.

Behindhand answered 1/7, 2015 at 16:47 Comment(0)
B
3

in what contexts could this be useful if it usually makes the programs run more slowly?

Well they are assuming if you choose this option you know what you are doing, if you don't you should not use this option.

what is gcc going to do, well I used this sample program:

#include <stdio.h>

void f(int j )
{
  for( int k = 0; k < j; ++k )
  {
    printf( "%d\n", k ) ;
  }
}

and tested it with godbolt and it generates a jump table based on the number of iterations left (see it live):

cmpl    $1, %ebp
movl    $1, %ebx
je  .L1
testl   %r12d, %r12d
je  .L27
cmpl    $1, %r12d
je  .L28
cmpl    $2, %r12d
je  .L29
cmpl    $3, %r12d
je  .L30
cmpl    $4, %r12d
je  .L31
cmpl    $5, %r12d
je  .L32
cmpl    $6, %r12d
je  .L33
Bestow answered 1/7, 2015 at 16:52 Comment(1)
Of course, with a printf in there, the benefit is basically zero ;-)Poseur
P
2

It can do something like:

while(n >= 8){
  foo(); foo(); foo(); foo(); foo(); foo(); foo(); foo(); 
  n -= 8;
}
while(n > 0){
  foo();
  n--;
}

Of course, Duff's Device would save having to write the second loop.

Why do it? That's up to the user. If foo() spends more than a few cycles, or if the original loop takes less that 5% of overall wall-clock time, or if n is typically small, it's probably not worth the trouble.

Poseur answered 1/7, 2015 at 16:43 Comment(0)
I
0

You can't assume that there's such a thing as corresponding C code for a compiler's intermediate representation. But in this case, I'd expect the nearest equivalent to be something like Duff's Device, which is a sequence (usually in a loop) that can be entered at a computed location.

Idaidae answered 1/7, 2015 at 16:41 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.