Why can't tail calls be optimized in JVM-based Lisps?

Asked 19/10, 2013 at 4:17 Answered 10/7, 2015 at 14:44

Solved clojure jvm lisp jvm-languages abcl

Main question: I view the most significant application of tail call optimization (TCO) as a translation of a recursive call into a loop (in cases in which the recursive call has a certain form). More precisely, when translated into a machine language, this would usually be translation into some sort of series of jumps. Some Common Lisp and Scheme compilers that compile to native code (e.g. SBCL) can identify tail-recursive code and perform this translation. JVM-based Lisps such as Clojure and ABCL have trouble doing this. What is it about the JVM as a machine that prevents or makes this difficult? I don't get it. The JVM obviously has no problem with loops. It's the compiler that has to figure out how to do TCO, not the machine to which it compiles.

Related question: Clojure can translate seemingly recursive code into a loop: It acts as if it's performing TCO, if the programmer replaces the tail call to the function with the keyword recur. But if it's possible to get a compiler to identify tail calls--as SBCL and CCL do, for example--then why can't the Clojure compiler figure out that it's supposed to treat a tail call the way it treats recur?

(Sorry--this is undoubtably a FAQ, and I'm sure that the remarks above show my ignorance, but I was unsuccessful in finding earlier questions.)

Highwrought answered 19/10, 2013 at 4:17 Comment(4)

For Rich Hickey's comment on the subject, take a look at: groups.google.com/d/msg/clojure/4bSdsbperNE/tXdcmbiv4g0J – Grappa 19/10, 2013 at 4:30

Thanks @Jared314. The rest of that thread is helpful, too. I should have searched the Google group. – Highwrought 19/10, 2013 at 4:45

TCO works on all tail calls, not just recursive ones; and it doesn't work on all recursive calls, only such that are tail calls. It may be good if you were to edit your first sentence so it doesn't propagate misunderstandings about TCO. – Someway 19/10, 2013 at 16:14

OK, good point @Rörd. I've edited the first sentence. – Highwrought 19/10, 2013 at 18:9

Real TCO works for arbitrary calls in tail position, not just self calls, so that code like the following does not cause a stack overflow:

(letfn [(e? [x] (or (zero? x) (o? (dec x))))
        (o? [x] (e? (dec x)))]
  (e? 10))

Clearly you'd need JVM support for this, since programs running on the JVM cannot manipulate the call stack. (Unless you were willing to establish your own calling convention and impose the associated overhead on function calls; Clojure aims to use regular JVM method calls.)

As for eliminating self calls in tail position, that's a simpler problem which can be solved as long as the entire function body gets compiled to a single JVM method. That is a limiting promise to make, however. Besides, recur is fairly well liked for its explicitness.

Favianus answered 19/10, 2013 at 4:29 Comment(6)

Thanks Michal. Is the idea that TCO is difficult to do purely by analyzing source code (or some relatively simple transformation of of source) when the tail recursion goes through one or more other functions before coming back to the caller. That requires manipulating the call stack in some way? (The example blows the stack on odd numbers, by the way, but it still illustrates your point when invoked on an even number.) – Highwrought 19/10, 2013 at 4:40

I see from the Google clojure group thread that @Grappa cited (which includes your own helpful comments) that JVM gotos stay within methods, and methods are a fundamental structure of JVM code. I get it now. As long as a JVM-based language represents functions as methods, which is no doubt desirable, there would be no easy to way to turn a multi-function tail recursion into a loop. The goto can't go that far, so to speak. Real CPU machine languages have no such restriction, typically. – Highwrought 19/10, 2013 at 4:49

Also, take a look at this Conj 2012 presentation, and this repo, for efforts related to automatic TCO. – Grappa 19/10, 2013 at 5:0

@Highwrought The most efficient way of achieving real TCO is to replace the final stack frame with one appropriate for the tail call, but with the return address of the current call (taken from the frame being replaced; another way to view this is that the final frame gets partially rewritten, but the return address stays the same). So there is no global transformation into something resembling a single huge imperative loop; rather, the structure still looks like function calls, but chains of tail calls return directly to the place they were initiated at, with no intermediate returns. – Splanchnology 19/10, 2013 at 8:37

@Highwrought So, even a general goto is not quite enough, you also need to be able to manipulate the final frame on the call stack. In any case, structural restrictions on code executing on the JVM are indeed the issue. To get around them, one can resort to trampolining, at a performance penalty. See also Wikipedia's Tail call article for a good discussion. – Splanchnology 19/10, 2013 at 8:40

Thanks Michal. I think I understand now. Both about how TCO works and why it can't easily be done for all tail calls in the JVM. @Jared, thanks for the references to Chris Frisz's project. – Highwrought 20/10, 2013 at 4:30

There is a reason why the JVM does not support TCO: Why does the JVM still not support tail-call optimization?

However there is a way around this by abusing the heap memory and some trickery explained in the A First-Order One-Pass CPS Transformation paper; it is implemented in Clojure by Chris Frisz and Daniel P. Friedman (see clojure-tco).

Now Rich Hickey could have chosen to do such an optimization by default, Scala does this at some points. Instead he chose to rely on the end user to specify the cases where they can be optimized by Clojure with the trampoline or loop-recur constructs. The decision has been explained here: https://groups.google.com/d/msg/clojure/4bSdsbperNE/tXdcmbiv4g0J

Tabor answered 22/10, 2013 at 16:0 Comment(2)

It must be said that the cited "reasons" why JVM does not support TCO do not hold water. (Note JVM != Java, though it would certainly nice to have it there too.) – Stereotropism 23/10, 2013 at 9:45

"There is a reason why the JVM does not support TCO: Why does the JVM still not support tail-call optimization?" Then how does scala accomplish it? – Migratory 3/6, 2018 at 17:29

In the final presentation of ClojureConj 2014, Brian Goetz pointed out there is a security feature in the JVM that prevents stack frame collapsing (as that would be an attack vector for people looking to make a function go somewhere else on return).

https://www.youtube.com/watch?v=2y5Pv4yN0b0&index=21&list=PLZdCLR02grLoc322bYirANEso3mmzvCiI

Adamsen answered 10/7, 2015 at 14:44 Comment(0)

Recommended topics

Hot tags