Why doesn't Haskell need Trampolining?
As a Scala developer learning the IO monad, and with it the technicalities of trampolining that are necessary for recursion where tail-call optimization is not possible, I wonder how Haskell seems to avoid it natively.

I get that Haskell is a lazy language, however I wonder if someone could elaborate a bit further.

For instance, why doesn't ForeverM stack-overflow in Scala? Well, I can answer that for the trampoline: I can find the actual code that does it in libraries and blog posts, and I actually implemented a basic trampoline myself to learn.

How does it happen in Haskell? Is there a way to unpack the laziness a bit, give some pointers, and maybe documentation that would help in understanding it better?

sealed trait IO[A] {

  // ...


  def flatMap[B](f: A => IO[B]): IO[B] =
    FlatMap[A,B](this, f) // we do not interpret the `flatMap` here, just return it as a value
  def map[B](f: A => B): IO[B] =
    flatMap[B](f andThen (Return(_)))

}
case class Return[A](a: A) extends IO[A]
case class Suspend[A](resume: () => A) extends IO[A]
case class FlatMap[A,B](sub: IO[A], k: A => IO[B]) extends IO[B]

// ...

@annotation.tailrec
def run[A](io: IO[A]): A = io match {
  case Return(a) => a
  case Suspend(r) => r()
  case FlatMap(x, f) => x match {
    case Return(a) => run(f (a))
    case Suspend(r) => run(f( r()))
    case FlatMap(y, g) => run(y flatMap (a => g(a) flatMap f))
  }
}
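For concreteness, here is a hypothetical ForeverM-style program (the `forever` helper is my own sketch, not from any library) built on the `IO` type above. It loops indefinitely without overflowing the stack precisely because `run` is a tail-recursive trampoline that never nests interpreter calls:

```scala
// Sketch, using the IO type defined above. `forever` is a hypothetical
// helper (the "ForeverM" idea): it never calls itself eagerly, it only
// *builds* a FlatMap value whose continuation constructs the next step
// when the interpreter invokes it.
def forever[A, B](io: IO[A]): IO[B] =
  io flatMap (_ => forever(io))

val printForever: IO[Nothing] =
  forever[Unit, Nothing](Suspend(() => println("hello")))

// run(printForever) loops in constant stack space: each pass through the
// tail-recursive `run` pattern-matches one FlatMap and loops again.
```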
Pertinacious answered 22/4, 2021 at 0:56
"we do not interpret the flatMap here, just return it as a value" is basically what happens in Haskell all the time, without being explicit about it - everything is just a lazy thunk, and it's getting evaluated by one huge trampoline that executes the whole program. – Meteorograph

Functional programming in general requires tail-call elimination (otherwise deep chains of function calls overflow the stack). For example, consider this (absurdly inefficient) implementation of an even/odd classifier:

def even(i: Int): Boolean =
  if (i == 0) true
  else if (i > 0) odd(i - 1)
  else odd(i + 1)

def odd(i: Int): Boolean =
  if (i == 0) false
  else if (i > 0) even(i - 1)
  else even(i + 1)

In both even and odd, every branch is either a simple expression that makes no function call (true or false here) or a tail call: the value of the called function is returned without being operated on.

Without tail-call elimination, the calls (potentially mutually recursive, through cycles of indefinite length) have to be implemented using a stack, which consumes memory, because the caller may do something with the result. Tail-call elimination relies on observing that the caller doesn't do anything with the result, so the called function can effectively replace the caller on the stack.
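A minimal contrast may help (my own example, not from the original answer): a call is a tail call only if the caller has nothing left to do after it returns.

```scala
// Not a tail call: the caller still has work to do (the multiplication)
// after factorial(n - 1) returns, so each call needs its own stack frame.
def factorial(n: BigInt): BigInt =
  if (n == 0) 1
  else n * factorial(n - 1) // frame must survive to perform `n * _`

// Tail-recursive version: the accumulator carries the pending work, so
// the recursive call is the last action and the frame can be reused.
def factorialAcc(n: BigInt, acc: BigInt = 1): BigInt =
  if (n == 0) acc
  else factorialAcc(n - 1, n * acc)
```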

Haskell and essentially every other post-Scheme functional language runtime implements generalized tail-call elimination: tail-calls become an unconditional jump (think a GOTO). The famous series of Steele and Sussman papers (the PDFs unfortunately didn't get archived, but you can search for, e.g. AIM-443 (mit or steele or sussman might be required)) known as "Lambda: The Ultimate" (which inspired the name of a programming language forum) goes through the implications of tail-call elimination and how this means that functional programming is actually viable for solving real-world computing problems.

Scala, however, primarily targets the Java Virtual Machine, whose specification effectively (by design) prohibits generalized tail-call elimination and whose instruction set constrains unconditional jumps to stay within a single method. In certain limited contexts (basically direct recursive calls of a method where the compiler can be absolutely sure which implementation is being called), the Scala compiler performs tail-call elimination before emitting the Java bytecode. (It's theoretically conceivable that Scala Native could perform generalized tail-call elimination, but that would entail a semantic break with JVM and JS Scala; some JavaScript runtimes perform generalized tail-call elimination, though not V8 to my knowledge.) The @tailrec annotation, with which you may have some familiarity, enforces the requirement that the compiler be able to perform the elimination.
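To illustrate the limited contexts in which the Scala compiler can do this (my own example), `@tailrec` turns direct self-recursion into a loop but rejects mutual recursion:

```scala
import scala.annotation.tailrec

// `@tailrec` makes the compiler *fail* the build unless it can rewrite
// the recursion into a loop (a local backward jump in the bytecode).
@tailrec
def gcd(a: Int, b: Int): Int =
  if (b == 0) a else gcd(b, a % b) // direct self-call in tail position: ok

// This would NOT compile with @tailrec: the calls are in tail position,
// but they are *mutual*, and the JVM-targeting compiler can only
// eliminate direct self-recursion.
// @tailrec def even(i: Int): Boolean = if (i == 0) true else odd(i - 1)
```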

Trampolining is a low-level runtime technique for emulating compile-time tail-call elimination in languages, like C or Scala, whose compilers don't perform it generally. Since the Haskell compiler has already performed tail-call elimination at compile time, there's no need for the complexity of a trampoline (nor the requirement to rewrite the high-level code in continuation-passing style).
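The emulation can be sketched in a few lines of Scala (a stand-alone toy, independent of the `IO` type in the question): each step returns either a final value or a thunk describing the next step, and a plain loop bounces between them, so the JVM stack never grows.

```scala
// A minimal trampoline sketch. Computations return either a finished
// value (Done) or a deferred next step (More); `run` is the loop that
// "bounces" between steps without nesting stack frames.
sealed trait Trampoline[A]
case class Done[A](value: A) extends Trampoline[A]
case class More[A](next: () => Trampoline[A]) extends Trampoline[A]

def run[A](t: Trampoline[A]): A = {
  var cur: Trampoline[A] = t
  while (true) cur match {
    case Done(v) => return v
    case More(k) => cur = k() // one bounce; constant stack space
  }
  sys.error("unreachable")
}

// The mutually recursive even/odd classifier from above, rewritten in
// this style: each "call" becomes a returned thunk instead of a frame.
def even(i: Int): Trampoline[Boolean] =
  if (i == 0) Done(true) else More(() => odd(i - 1))
def odd(i: Int): Trampoline[Boolean] =
  if (i == 0) Done(false) else More(() => even(i - 1))
// run(even(1000000)) completes without overflowing the stack.
```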

You can arguably think of the CPU running a Haskell program (or the runtime itself, if transpiling to, e.g., JS) as implementing a trampoline.

Currin answered 22/4, 2021 at 2:16
It should be noted that laziness doesn't come into it. Scheme is a strict language like Scala, and any compliant Scheme implementation is required to eliminate all tail calls. – Currin
It's possible, by the way, to implement generalized tail-call elimination on the JVM: you just compile your program such that it's entirely within a single method, and live with the implications for code size and the total lack of Java interop. – Currin
I'm vaguely proud that I managed to answer this without dropping into assembly... it's like the first time ever. – Currin
And I should also note that Scheme did not invent tail-call elimination: it was an optimization trick that had been in use among assembly-language coders since the subroutine and recursive-subroutine design patterns became common in the 1950s, and earlier Lisp and other compilers had performed tail-call elimination in some cases for years before that. Steele and Sussman demonstrated that it was generally applicable. – Currin
dspace.mit.edu/bitstream/handle/1721.1/5753/AIM-443.pdf is probably the clearest original explication of tail-call elimination: "In general, procedure calls may be usefully thought of as GOTO statements which also pass parameters, and can uniformly be encoded as JUMP instructions". – Currin
Thank you for the detailed answer on this, including the reference. I will need to read into that reference and get back to you. – Pertinacious
In the meantime, does @Meteorograph's comment above fall into what you call generalized tail-call elimination? Or do you disagree with his statement? Just trying to reconcile the two inputs so far before I dig into the paper... – Pertinacious
I would have argued that this "thunk" thingy is what makes it possible ultimately, but that is a property of laziness, which you clearly state is not related to the problem... – Pertinacious
Bergi is accurate: the Haskell compiler does effectively do the transformation to trampolined code (by way of generalized tail-call elimination) for you. – Currin
All a thunk is is a function of zero arguments: () => ??? is a thunk (expressed in Scala). – Currin
As mentioned, Scheme performs generalized tail-call elimination and it's not lazy (you can implement laziness, just as you can in Scala). – Currin
Unfortunately there's at least one mainstream functional programming language implementation which doesn't perform TCO: GNU R. Granted, that isn't its only shortcoming. – Piles
@LeviRamsey, I think it's better to say that a thunk is a closure which can be run with zero arguments. It is not (usually) a function, though it's possible to make those too: Just (if x then id else succ) represents a constructor that contains a thunk that can be run to produce a function. – Diuretic
@LeviRamsey "the Haskell compiler does effectively do the transformation to trampolined code (by way of generalized tail-call elimination) for you" mixes two completely separate though related things: either a function f returns to a trampoline loop with an indication for it to perform a call to g, or it jumps into g directly. Both avoid growing the stack, but these are two very different things. – Saponaceous
That's why I say "effectively": it accomplishes the intent of a trampoline by way of doing tail-call elimination. – Currin
@LeviRamsey: I believe that Haskell's native code generator replaces tail calls with assembly jump instructions, hence no need for trampolining. If I understand the question correctly, it seems that the OP was under the impression that trampolining was the only possible implementation of tail recursion (it is not). – Cohla
@AlexM. I'm not sure why you commented, as I specifically call out "Haskell... implements generalized tail-call elimination: tail-calls become an unconditional jump" and thus "Since Haskell has performed the tail-call elimination at compile-time, there's thus no need for the complexity of a trampoline". – Currin
@LeviRamsey: I know that you did, but that useful information seemed to me buried in an ocean of stuff that could disconcert the beginner reader. I also wanted to ping the OP, not you, in my comment. :) – Cohla

Trampolining is not the only solution for tail calls. Scala requires trampolining precisely because it runs on the JVM, with the Java runtime. The Scala language developers did not get to choose precisely how their runtime operates, nor their binary format. Because they use the JVM, they must endure every way in which the JVM is optimized for Java rather than for Scala.

Haskell does not have this limitation, because it has its own runtime, its own binary format, etc. It can choose precisely how to set up the stack at runtime based on the constructs of the Haskell language, not those of Java.

Mollescent answered 28/4, 2021 at 19:46

© 2022 - 2024 — McMap. All rights reserved.