Which, if any, C++ compilers do tail-recursion optimization?
Asked Answered
L

5

172

It seems to me that it would work perfectly well to do tail-recursion optimization in both C and C++, yet while debugging I never seem to see a frame stack that indicates this optimization. That is kind of good, because the stack tells me how deep the recursion is. However, the optimization would be kind of nice as well.

Do any C++ compilers do this optimization? Why? Why not?

How do I go about telling the compiler to do it?

  • For MSVC: /O2 or /Ox
  • For GCC: -O2 or -O3

How about checking if the compiler has done this in a certain case?

  • For MSVC, enable PDB output to be able to trace the code, then inspect the code
  • For GCC..?

I'd still take suggestions for how to determine if a certain function is optimized like this by the compiler (even though I find it reassuring that Konrad tells me to assume it)

It is always possible to check if the compiler does this at all by making an infinite recursion and checking if it results in an infinite loop or a stack overflow (I did this with GCC and found out that -O2 is sufficient), but I want to be able to check a certain function that I know will terminate anyway. I'd love to have an easy way of checking this :)


After some testing, I discovered that destructors ruin the possibility of making this optimization. It can sometimes be worth it to change the scoping of certain variables and temporaries to make sure they go out of scope before the return-statement starts.

If any destructor needs to be run after the tail-call, the tail-call optimization can not be done.

Lipase answered 29/8, 2008 at 7:35 Comment(0)
E
152

All current mainstream compilers perform tail call optimisation fairly well (and have done for more than a decade), even for mutually recursive calls such as:

int bar(int, int);

int foo(int n, int acc) {
    return (n == 0) ? acc : bar(n - 1, acc + 2);
}

int bar(int n, int acc) {
    return (n == 0) ? acc : foo(n - 1, acc + 1);
}

Letting the compiler do the optimisation is straightforward: Just switch on optimisation for speed:

  • For MSVC, use /O2 or /Ox.
  • For GCC, Clang and ICC, use -O3

An easy way to check if the compiler did the optimisation is to perform a call that would otherwise result in a stack overflow — or looking at the assembly output.

As an interesting historical note, tail call optimisation for C was added to the GCC in the course of a diploma thesis by Mark Probst. The thesis describes some interesting caveats in the implementation. It's worth reading.

Eroto answered 29/8, 2008 at 7:40 Comment(4)
ICC would do so, I believe. To the best of my knowledge, ICC produces the fastest code on the market.Apothecium
@Paul The question is how much of the speed of ICC code is caused by algorithmic optimizations such as tail call optimizations and how much is caused by the cache and microinstruction optimizations that only Intel, with their intimate knowledge of their own processors, can do.Eurystheus
gcc has more narrow option -foptimize-sibling-calls to "optimize sibling and tail recursive calls". This option (according to gcc(1) manual pages for versions 4.4, 4.7 and 4.8 targeting various platforms) is enabled at levels -O2, -O3, -Os.Codd
Also, running in DEBUG mode without explicitly requesting optimizations will NOT do ANY optimization at all. You can enable PDB for true release mode EXE and try stepping through that, but note that debugging in Release mode has its complications - invisible / stripped variables, merged variables, variables becoming out of scope in unknown / unexpected scope, variables thet never go in scope and became true constants with stack-level addresses, and - well - Merged or missing stack frames. Usually merged stack frames means callee is inlined, and missing/backmerged frames probably tail call.Folberth
S
26

As well as the obvious (compilers don't do this sort of optimization unless you ask for it), there is a complexity about tail-call optimization in C++: destructors.

Given something like:

   int fn(int j, int i)
   {
      if (i <= 0) return j;
      Funky cls(j,i);
      return fn(j, i-1);
   }

The compiler can't (in general) tail-call optimize this because it needs to call the destructor of cls after the recursive call returns.

Sometimes the compiler can see that the destructor has no externally visible side effects (so it can be done early), but often it can't.

A particularly common form of this is where Funky is actually a std::vector or similar.

Springfield answered 26/2, 2016 at 10:28 Comment(0)
E
23

gcc 4.3.2 completely inlines this function (crappy/trivial atoi() implementation) into main(). Optimization level is -O1. I notice if I play around with it (even changing it from static to extern, the tail recursion goes away pretty fast, so I wouldn't depend on it for program correctness.

#include <stdio.h>
static int atoi(const char *str, int n)
{
    if (str == 0 || *str == 0)
        return n;
    return atoi(str+1, n*10 + *str-'0');
}
int main(int argc, char **argv)
{
    for (int i = 1; i != argc; ++i)
        printf("%s -> %d\n", argv[i], atoi(argv[i], 0));
    return 0;
}
Endothecium answered 21/10, 2008 at 3:13 Comment(2)
You can activate link-time optimization though and I guess that even an extern method might be inlined then.Eroto
Strange. I just tested gcc 4.2.3 (x86, Slackware 12.1) and gcc 4.6.2 (AMD64, Debian wheezy) and with -O1 there is no inlining and no tail-recursion optimization. You have to use -O2 for that (well, in 4.2.x, which is rather ancient now, it still won't be inlined). BTW It's also worth to add that gcc can optimize recursion even when it is not strictly a tail one (like factorial w/o accumulator).Aldis
O
13

Most compilers don't do any kind of optimisation in a debug build.

If using VC, try a release build with PDB info turned on - this will let you trace through the optimised app and you should hopefully see what you want then. Note, however, that debugging and tracing an optimised build will jump you around all over the place, and often you cannot inspect variables directly as they only ever end up in registers or get optimised away entirely. It's an "interesting" experience...

Ong answered 29/8, 2008 at 7:48 Comment(2)
try gcc why -g -O3 and to get opimizations in a debug build. xlC has the same behavior.Gens
When you say "most compilers": what collections of compilers do you consider? As pointed out there are at least two compilers that performs optimizations during debug build - and as far as I know VC does that too (except if you're enabling modify-and-continue perhaps).Tautologism
H
10

As Greg mentions, compilers won't do it in debug mode. It's ok for debug builds to be slower than a prod build, but they shouldn't crash more often: and if you depend on a tail call optimization, they may do exactly that. Because of this it is often best to rewrite the tail call as an normal loop. :-(

Homotaxis answered 16/9, 2008 at 10:42 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.