How do I check if gcc is performing tail-recursion optimization?
Asked Answered
D

10

70

How do I tell if gcc (more specifically, g++) is optimizing tail recursion in a particular function? (Because it's come up a few times: I don't want to test if gcc can optimize tail recursion in general. I want to know if it optimizes my tail recursive function.)

If your answer is "look at the generated assembler", I'd like to know exactly what I'm looking for, and whether or not I could write a simple program that examines the assembler to see if there's optimization.

PS. I know this appears as part of the question Which, if any, C++ compilers do tail-recursion optimization? from 5 months ago. However, I don't think this part of that question was answered satisfactorily. (The answer there was "The easiest way to check if the compiler did the optimization (that I know of) is perform a call that would otherwise result in a stack overflow – or looking at the assembly output.")

Dignadignified answered 29/1, 2009 at 2:42 Comment(5)
@GeoffreyF67: It's "essentially" when the last statement in a function f is "return f(..)", i.e. it calls itself with different parameters. See en.wikipedia.org/wiki/Tail_recursion or https://mcmap.net/q/16803/-what-is-tail-recursionDignadignified
Thanks A.Rex - that's very helpful. Fascinating too! I didn't know that was possible :)Montserrat
@[A. Rex]: did you try backtrace()?Cordiecordier
@Steven A. Lowe: sorry, not yet. I'll try it now.Dignadignified
How to detect any optimization: stackoverflow.com/questions/15226668/… || stackoverflow.com/questions/2016781/…Lymphoid
R
71

Let's use the example code from the other question. Compile it, but tell gcc not to assemble:

gcc -std=c99 -S -O2 test.c

Now let's look at the _atoi function in the resultant test.s file (gcc 4.0.1 on Mac OS 10.5):

        .text
        .align 4,0x90
_atoi:
        pushl   %ebp
        testl   %eax, %eax
        movl    %esp, %ebp
        movl    %eax, %ecx
        je      L3
        .align 4,0x90
L5:
        movzbl  (%ecx), %eax
        testb   %al, %al
        je      L3
        leal    (%edx,%edx,4), %edx
        movsbl  %al,%eax
        incl    %ecx
        leal    -48(%eax,%edx,2), %edx
        jne     L5
        .align 4,0x90
L3:
        leave
        movl    %edx, %eax
        ret

The compiler has performed tail-call optimization on this function. We can tell because there is no call instruction in that code whereas the original C code clearly had a function call. Furthermore, we can see the jne L5 instruction, which jumps backward in the function, indicating a loop when there was clearly no loop in the C code. If you recompile with optimization turned off, you'll see a line that says call _atoi, and you also won't see any backward jumps.

Whether you can automate this is another matter. The specifics of the assembler code will depend on the code you're compiling.

You could discover it programmatically, I think. Make the function print out the current value of the stack pointer (register ESP on x86). If the function prints the same value for the first call as it does for the recursive call, then the compiler has performed the tail-call optimization. This idea requires modifying the function you hope to observe, though, and that might affect how the compiler chooses to optimize the function. If the test succeeds (prints the same ESP value both times), then I think it's reasonable to assume that the optimization would also be performed without your instrumentation, but if the test fails, we won't know whether the failure was due to the addition of the instrumentation code.

Reprieve answered 29/1, 2009 at 3:29 Comment(0)
J
17

EDIT My original post also prevented GCC from actually doing tail call eliminations. I've added some additional trickiness below that fools GCC into doing tail call elimination anyways.

Expanding on Steven's answer, you can programmatically check to see if you have the same stack frame:

#include <stdio.h>

// We need to get a reference to the stack without spooking GCC into turning
// off tail-call elimination
int oracle2(void) { 
    char oracle; int oracle2 = (int)&oracle; return oracle2; 
}

void myCoolFunction(params, ..., int tailRecursionCheck) {
    int oracle = oracle2();
    if( tailRecursionCheck && tailRecursionCheck != oracle ) {
        printf("GCC did not optimize this call.\n");
    }
    // ... more code ...
    // The return is significant... GCC won't eliminate the call otherwise
    return myCoolFunction( ..., oracle);
}

int main(int argc, char *argv[]) {
    myCoolFunction(..., 0);
    return 0;
}

When calling the function non-recursively, pass in 0 the check parameter. Otherwise pass in oracle. If a tail recursive call that should've been eliminated was not, then you'll be informed at runtime.

When testing this out, it looks like my version of GCC does not optimize the first tail call, but the remaining tail calls are optimized. Interesting.

Jefferson answered 29/1, 2009 at 5:7 Comment(5)
+1 for being clever, and not having to use the GCC internals functionsCordiecordier
Cute. This also worked for me, though it did involve modifying the code. Incidentally, my function is also void and does "call_myself(); return;" instead of "return call_myself();". As far as I can tell, gcc is optimizing both away now, but wasn't a few years ago.Dignadignified
Huh. And here I was thinking that there wasn't any portable way to look at your own stack... of course there (kind of) is. +1Nabal
Great answer. It would be better to use void* instead of int because the latter is not guaranteed to be large enough to store a pointer.Misgiving
It is not guaranteed to work. In particular, this doesn't work with g++-4.9.2 on Ubuntu, 14.04, x64.Allegorist
L
8

Look at the generated assembly code and see if it uses a call or jmp instruction for the recursive call on x86 (for other architectures, look up the corresponding instructions). You can use nm and objdump to get just the assembly corresponding to your function. Consider the following function:

int fact(int n)
{
  return n <= 1 ? 1 : n * fact(n-1);
}

Compile as

gcc fact.c -c -o fact.o -O2

Then, to test if it's using tail recursion:

# get starting address and size of function fact from nm
ADDR=$(nm --print-size --radix=d fact.o | grep ' fact$' | cut -d ' ' -f 1,2)
# strip leading 0's to avoid being interpreted by objdump as octal addresses
STARTADDR=$(echo $ADDR | cut -d ' ' -f 1 | sed 's/^0*\(.\)/\1/')
SIZE=$(echo $ADDR | cut -d ' ' -f 2 | sed 's/^0*//')
STOPADDR=$(( $STARTADDR + $SIZE ))

# now disassemble the function and look for an instruction of the form
# call addr <fact+offset>
if objdump --disassemble fact.o --start-address=$STARTADDR --stop-address=$STOPADDR | \
    grep -qE 'call +[0-9a-f]+ <fact\+'
then
    echo "fact is NOT tail recursive"
else
    echo "fact is tail recursive"
fi

When ran on the above function, this script prints "fact is tail recursive". When instead compiled with -O3 instead of -O2, this curiously prints "fact is NOT tail recursive".

Note that this might yield false negatives, as ehemient pointed out in his comment. This script will only yield the right answer if the function contains no recursive calls to itself at all, and it also doesn't detect sibling recursion (e.g. where A() calls B() which calls A()). I can't think of a more robust method at the moment that doesn't involve having a human look at the generated assembly, but at least you can use this script to easily grab the assembly corresponding to a particular function within an object file.

Lyndel answered 29/1, 2009 at 3:27 Comment(5)
Naïve factorial and fibonacci functions are not good examples. Compilers are often subjected to dumb benchmarks with them, so compiler writers make sure to add special optimizations that apply specifically to functions that look like fact/fib.Nabal
See my answer for an example of when your detector will report "NOT tail recursive" for a tail-call-optimized function.Nabal
@Adam Rosenfield: Your solution worked for me with a few minor changes. For example, I'm using C++, and my function is a member function, so the function name becomes something mangled like "_ZN5cycle4stepEib".Dignadignified
@Adam Rosenfield: I find it interesting that it makes that call tail-recursive at all with -O2. Meanwhile, with -O3 it "unrolls the loop" some 10 times, and then actually calls itself.Dignadignified
Just found this question and couldn't help but point out that the code isn't even tail recursive in nature, since the last operation in the function is actually multiplication, not the recursive call to fact.Bonnet
N
6

Expanding on PolyThinker's answer, here's a concrete example.

int foo(int a, int b) {
    if (a && b)
        return foo(a - 1, b - 1);
    return a + b;
}

i686-pc-linux-gnu-gcc-4.3.2 -Os -fno-optimize-sibling-calls output:

00000000 <foo>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   8b 55 08                mov    0x8(%ebp),%edx
   6:   8b 45 0c                mov    0xc(%ebp),%eax
   9:   85 d2                   test   %edx,%edx
   b:   74 16                   je     23 <foo+0x23>
   d:   85 c0                   test   %eax,%eax
   f:   74 12                   je     23 <foo+0x23>
  11:   51                      push   %ecx
  12:   48                      dec    %eax
  13:   51                      push   %ecx
  14:   50                      push   %eax
  15:   8d 42 ff                lea    -0x1(%edx),%eax
  18:   50                      push   %eax
  19:   e8 fc ff ff ff          call   1a <foo+0x1a>
  1e:   83 c4 10                add    $0x10,%esp
  21:   eb 02                   jmp    25 <foo+0x25>
  23:   01 d0                   add    %edx,%eax
  25:   c9                      leave
  26:   c3                      ret

i686-pc-linux-gnu-gcc-4.3.2 -Os output:

00000000 <foo>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   8b 55 08                mov    0x8(%ebp),%edx
   6:   8b 45 0c                mov    0xc(%ebp),%eax
   9:   85 d2                   test   %edx,%edx
   b:   74 08                   je     15 <foo+0x15>
   d:   85 c0                   test   %eax,%eax
   f:   74 04                   je     15 <foo+0x15>
  11:   48                      dec    %eax
  12:   4a                      dec    %edx
  13:   eb f4                   jmp    9 <foo+0x9>
  15:   5d                      pop    %ebp
  16:   01 d0                   add    %edx,%eax
  18:   c3                      ret

In the first case, <foo+0x11>-<foo+0x1d> pushes arguments for a function call, while in the second case, <foo+0x11>-<foo+0x14> modifies the variables and jmps to the same function, somewhere after the preamble. That's what you want to look for.

I don't think you can do this programatically; there's too much possible variation. The "meat" of the function may be closer to or further away from the start, and you can't distinguish that jmp from a loop or conditional without looking at it. It might be a conditional jump instead of a jmp. gcc might leave a call in for some cases but apply sibling call optimization to other cases.

FYI, gcc's "sibling calls" is slightly more general than tail-recursive calls -- effectively, any function call where re-using the same stack frame is okay is potentially a sibling call.

[edit]

As an example of when just looking for a self-recursive call will mislead you,

int bar(int n) {
    if (n == 0)
        return bar(bar(1));
    if (n % 2)
        return n;
    return bar(n / 2);
}

GCC will apply sibling call optimization to two out of the three bar calls. I'd still call it tail-call-optimized, since that single unoptimized call never goes further than a single level, even though you'll find a call <bar+..> in the generated assembly.

Nabal answered 29/1, 2009 at 3:28 Comment(3)
Thanks. Re your example: I wonder if any compiler is capable enough to optimize away the first return into "return 1" ... it seems like all it has to do is constant fold "bar(1)" into "1". (I understand that isn't the point of your example; I was just wondering.)Dignadignified
If you tag bar() with the 'const' attribute (see gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html), then gcc will fold bar(bar(1)) into 1.Lyndel
@Adam Rosenfield: Cool. Do you know why gcc isn't smart enough to figure that out itself?Dignadignified
C
3

i am way too lazy to look at a disassembly. Try this:

void so(long l)
{
    ++l;
    so(l);
}
int main(int argc, char ** argv)
{
    so(0);
    return 0;
}

compile and run this program. If it runs forever, the tail-recursion was optimized away. if it blows the stack, it wasn't.

EDIT: sorry, read too quickly, the OP wants to know if his particular function has its tail-recursion optimized away. OK...

...the principle is still the same - if the tail-recursion is being optimized away, then the stack frame will remain the same. You should be able to use the backtrace function to capture the stack frames from within your function, and determine if they are growing or not. If tail recursion is being optimized away, you will have only one return pointer in the buffer.

Cordiecordier answered 29/1, 2009 at 3:37 Comment(9)
The OP wants to test if a specific function is being optimized. This trivial example is useless to that end, since this is not his function. His function (hopefully) terminates for most/all inputs.Lyndel
dear drive-by downvoter: how was this not helpful? did you try it?Cordiecordier
@[Adam Rosenfield]: the OP's question does not state this!Cordiecordier
@[Adam Rosenfield]: oh wait, i see the "in a particular function" italics now. doh! will edit...Cordiecordier
(I did not down-vote this.) @Adam Rosenfield is right; I wanted to test a particular function in my code, not whether gcc optimizes in general.Dignadignified
Yes it does -- reread the very first sentence in which the words "particular function" are italicized.Lyndel
@[A. Rex] @[Adam Rosenfield] @[random down-voters] see edits; use backtrace(). Apologies for the confusion.Cordiecordier
@Steven A. Lowe: I tried this. It worked, but it was slightly non-ideal for my situation, mostly because it required modifying my code. Also, it was slightly difficult to analyze the output because my function sometimes returns quickly and is often called at different starting stack depths.Dignadignified
@[A. Rex]: call backtrace() before you first call your function, and store the buffer somewhere, then call backtrace() within your function and see what's different - if tail recursion is optimized away, the difference will be exactly one pointerCordiecordier
L
2

A simple method: Build a simple tail recursion program, compile it, and dissemble it to see if it is optimized.

Just realized that you already had that in your question. If you know how to read assembly, it's quite easy to tell. Recursive functions will call themselves (with "call label") from within the function body, and a loop will be just "jmp label".

Lenlena answered 29/1, 2009 at 2:44 Comment(2)
The first part of your answer is not at all what I want. For the second part, you're saying all I do is check whether the function does "call label" or "jmp label"?Dignadignified
Tail recursion optimization on x86 is almost exactly the replacement of a call with a jump. Call generates a new stack frame, but jmp doesn't. If the disassembled function doesn't have a call right before the return, it's been optimized.Chiaki
P
2

Another way I checked this is:

  1. Compile your code with 'gcc -O2'
  2. start 'gdb'
  3. Place a breakpoint in the function you are expecting to be tail-recursion optimized/eliminated
  4. run your code
  5. If it has been tail call eliminated, then the breakpoint will be hit only once or never. For more on this see this
Pinnacle answered 23/5, 2009 at 4:17 Comment(0)
F
1

These days you could check the DWARF for DW_AT_call_tail_call and DW_AT_call_all_calls attributes.

gcc -g -O2 foo.c
dwarfdump a.out

You will get a DW_TAG_call_site where the tail call happened, something like this:

< 2><0x00000174> DW_TAG_call_site                                                                                                                                                                                                                        
                    DW_AT_call_return_pc        0x000011cc                                                                                                                                                                                                
                    DW_AT_call_tail_call        yes(1)                                                                                                                                                                                                    
                    DW_AT_call_origin           <0x000001a6>                                                                                                                                                                                              
                    DW_AT_sibling               <0x00000192>   

The DWARF5 standard has the details for this.

Fingerbreadth answered 27/4, 2023 at 8:48 Comment(0)
G
0

You could craft input data that would lead to stack overflow because of too deep recursion of that function calls if there were no optimization and see if it happens. Of course, this is not trivial and sometimes big enough inputs will make the function run for intolerably long period of time.

Glandule answered 19/5, 2011 at 6:24 Comment(0)
S
0

We can use objdump for this. Here is an example. The code computes n-th triangular number = 1 + 2 + ... + n :

// File tri.c

// See "-O1" vs "-O1 -foptimize-sibling-calls" to see the tail recursion optimization

#include <stdio.h>

int tri(int n, int result)
{
    if (0 == n)
    {
        return result;
    }

    result += n;
    
    --n;

    return tri(n, result);
}

int main()
{
    printf("%d\n", tri(36, 0)); // Prints 666 for the 36th triangular number.
    return 0;
}

For gcc 11.3.1 with -O1 flag you get NO tail recursion optimization:

; gcc tri.c -O1
; objdump -d a.out

...
0000000000401126 <tri>:
  401126:   89 f0                   mov    %esi,%eax
  401128:   85 ff                   test   %edi,%edi
  40112a:   75 01                   jne    40112d <tri+0x7>
  40112c:   c3                      ret    
  40112d:   48 83 ec 08             sub    $0x8,%rsp
  401131:   8d 34 37                lea    (%rdi,%rsi,1),%esi
  401134:   83 ef 01                sub    $0x1,%edi
  401137:   e8 ea ff ff ff          call   401126 <tri>         <--- The recursion call
  40113c:   48 83 c4 08             add    $0x8,%rsp
  401140:   c3                      ret  
...

For gcc 11.3.1 with -O1 gcc tri.c -O1 -foptimize-sibling-calls flags you GET the tail recursion optimization:

; gcc tri.c -O1 -foptimize-sibling-calls
; objdump -d a.out

...
0000000000401126 <tri>:
  401126:   89 f0                   mov    %esi,%eax
  401128:   85 ff                   test   %edi,%edi
  40112a:   74 07                   je     401133 <tri+0xd>
  40112c:   01 f8                   add    %edi,%eax
  40112e:   83 ef 01                sub    $0x1,%edi
  401131:   75 f9                   jne    40112c <tri+0x6>
  401133:   c3                      ret  
...
Sarisarid answered 2/4, 2023 at 9:52 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.