Under what circumstances could Common Subexpression Elimination affect the laziness of a Haskell program?
Asked Answered
A

1

16

From wiki.haskell.org:

First of all, common subexpression elimination (CSE) means that if an expression appears in several places, the code is rearranged so that the value of that expression is computed only once. For example:

foo x = (bar x) * (bar x)

might be transformed into

foo x = let x' = bar x in x' * x'

thus, the bar function is only called once. (And if bar is a particularly expensive function, this might save quite a lot of work.) GHC doesn't actually perform CSE as often as you might expect. The trouble is, performing CSE can affect the strictness/laziness of the program. So GHC does do CSE, but only in specific circumstances --- see the GHC manual. (Section??)

Long story short: "If you care about CSE, do it by hand."

I'm wondering under what circumstances CSE "affects" the strictness/laziness of the program and what kind of effect that could be.

Archduchy answered 19/11, 2016 at 15:14 Comment(4)
That is explained more clearly in this GHC FAQ. The source for the answer there is this message by SPJ.Bowker
e.g. with y g = g x where x = g x we intend g to be actually invoked twice, but with CSE the g x common sub-expression might get recognized, and y g = x where x = g x will instead be in effect, with only one invocation of g. So the problem arises when holding on to g's results is expensive, e.g. when it's a very large list; it may be cheaper overall to recalculate it so it can be garbage-collected. so, like the answer says, space leaks.Allodium
(especially if the separate invocations of g are forced on different schedules, and the inner one would produce far less elements of its result than the outer one, which is thus imperative to be garbage-collected asap). note that for this y (better called y2 actually, the true y is y g = g (y g)) even the "better" CSE from the answer would still cause the same problem.Allodium
I think it's sloppy wording in the wiki. CSE affects the space usage of the program. I don't see why it would ever affect the strictness.Cutthroat
R
16

The naive CSE rule would be

e'[e, e]  ~>  let x = e in e'[x, x].

That is, whenever a subexpression e occurs twice in the expression e', we use a let-binding to compute e once. This however leads itself to some trivial space leaks. For example

sum [1..n] + prod [1..n]

is typically O(1) space usage in a lazy functional programming language like Haskell (as sum and prod would tail-recurse and blah blah blah), but would become O(n) when the naive CSE rule is enacted. This can be terrible for programs when n is high!

The approach is then to make this rule more specific, restricting it to a small set of cases that we know won't have the problem. We can begin by more specifically enumerating the problems with the naive rule, which will form a set of priorities for us to develop a better CSE:

  • The two occurrences of e might be far apart in e', leading to a long lifetime for the let x = e binding.

  • The let-binding must always allocate a closure where previously there might not have been one.

  • This can create an unbound number of closures.

  • There are cases where the closure might never deallocate.

Something better

let x = e in e'[e]  ~>  let x = e in e'[x]

This is a more conservative rule but is much safer. Here we recognize that e appears twice but the first occurrence syntactically dominates the second expression, meaning here that the programmer has already introduced a let-binding. We can safely just reuse that let-binding and replace the second occurrence of e with x. No new closures are allocated.

Another example of syntactic domination:

case e of { x -> e'[e] }  ~> case e of { x -> e'[x] }

And yet another:

case e of {
   Constructor x0 x1 ... xn ->
     e'[e]
}

~>

case e of {
   Constructor x0 x1 ... xn ->
     e'[Constructor x0 x1 ... xn]
}

These rules all take advantage of existing structure in the program to ensure that the kinetics of space usage remain the same before and after the transformation. They are much more conservative than the original CSE but they are also much safer.

See also

For a full discussion of CSE in a lazy FPL, read Chitil's (very accessible) 1997 paper. For a full treatment of how CSE works in a production compiler, see GHC's CSE.hs module, which is documented very thoroughly thanks to GHC's tradition of writing long footnotes. The comment-to-code ratio in that module is off the charts. Also note how old that file is (1993)!

Rizzio answered 19/11, 2016 at 16:7 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.