Sharing vs. non-sharing fixed-point combinator

This is the usual definition of the fixed-point combinator in Haskell:

fix :: (a -> a) -> a
fix f = let x = f x in x

On https://wiki.haskell.org/Prime_numbers, they define a different fixed-point combinator:

_Y   :: (t -> t) -> t
_Y g = g (_Y g)                -- multistage, non-sharing,  g (g (g (g ...)))
    -- g (let x = g x in x)    -- two g stages, sharing

_Y is a non-sharing fixpoint combinator, here arranging for a recursive "telescoping" multistage primes production (a tower of producers).

What exactly does this mean? What is "sharing" vs. "non-sharing" in that context? How does _Y differ from fix?

Plage answered 11/12, 2018 at 0:45 Comment(1)
the difference between the two definitions in general is also discussed here: Why does GHC make fix so confounding? – Sailplane

"Sharing" means f x re-uses the x that it creates; but with _Y g = g . g . g . g . ..., each g calculates its output anew (cf. this and this).

In that context, the sharing version has much worse memory usage and leads to a space leak.1

The definition of _Y mirrors the effect of the usual lambda-calculus definition of the Y combinator, which emulates recursion by duplication, whereas true recursion refers to the same (hence, shared) entity.
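
(For reference, the lambda-calculus term is Y = λf.(λx.f (x x)) (λx.f (x x)); reducing Y g to g (Y g) duplicates the subterm λx.g (x x), so each unfolding is a fresh copy rather than a reference back to one shared node.)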

In

    x      = f x
    (_Y g) = g (_Y g)

both xs refer to the same entity, but each of the (_Y g)s refers to an equivalent, but separate, entity. That's the intention of it, anyway.
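
One way to actually observe the difference is impurely, with Debug.Trace; a minimal sketch (the step function g below is mine, chosen only to announce each stage as it is entered):

import Debug.Trace (trace)

fix :: (a -> a) -> a
fix f = let x = f x in x

_Y :: (t -> t) -> t
_Y g = g (_Y g)

-- each entry into g announces that a new stage is being built
g :: [Int] -> [Int]
g xs = trace "<stage>" (1 : map (+1) xs)

main :: IO ()
main = do
  print (take 5 (fix g))   -- "<stage>" is printed once: one shared stage
  print (take 5 (_Y g))    -- "<stage>" is printed five times: a new stage per element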

Of course, thanks to referential transparency, there's no guarantee of any of this in Haskell the language. But GHC the compiler does behave this way.

_Y g is a common subexpression, and it could be "eliminated" by a compiler by giving it a name and reusing that named entity, subverting the whole purpose of it. That's why GHC has the -fno-cse flag, which explicitly prevents common-subexpression elimination. It used to be that you had to use this flag to achieve the desired behaviour here, but not anymore: for several years now, GHC has not been as aggressive about common-subexpression elimination.
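
If you do want to rule it out explicitly, the flag can also be set per module; a sketch (the module name is hypothetical):

{-# OPTIONS_GHC -fno-cse #-}   -- disable common-subexpression elimination in this module
module Primes where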

disclaimer: I'm the author of that part of the page you're referring to. Was hoping for the back-and-forth that's usual on wiki pages, but it never came, so my work didn't get reviewed like that. Either no-one bothered, or it is passable (lacking major errors). The wiki seems to be largely abandoned for many years now.


1 The g function involved,

(3:) . minus [5,7..] . foldr (\(x:xs) -> (x:) . union xs) []
                     . map (\p -> [p*p, p*p+2*p..])

produces an increasing stream of all odd primes, given an increasing stream of all odd primes. To produce a prime around N in value, it consumes its input stream at least up to the first prime above sqrt(N) in value. Thus the production points are given roughly by repeated squaring, and there are ~ log (log N) such g functions in total in the chain (or "tower") of these primes producers, each immediately garbage-collectible, the lowest one producing its primes given just the first odd prime, 3, known a priori.
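
For reference, a self-contained version of that producer (a sketch, assuming minus and union from the data-ordlist package, with 2 consed on in front as on the wiki):

import Data.List.Ordered (minus, union)  -- data-ordlist package

primes :: [Int]
primes = 2 : _Y ((3:) . minus [5,7..]
                      . foldr (\(x:xs) -> (x:) . union xs) []
                      . map (\p -> [p*p, p*p+2*p..]))
  where
    _Y g = g (_Y g)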

And with the two-stage _Y2 g = g x where { x = g x } there would be only two of them in the chain, but only the top one would be immediately garbage-collectible, as discussed at the referenced link above.

Sailplane answered 11/12, 2018 at 8:0 Comment(0)

_Y is translated to the following STG:

_Y f = let x = _Y f in f x

fix is translated identically to the Haskell source:

fix f = let x = f x in x

So fix f sets up a recursive thunk x and returns it, while _Y is a recursive function, and importantly it’s not tail-recursive. Forcing _Y f enters f, passing a new call to _Y f as an argument, so each recursive call sets up a new thunk; forcing the x returned by fix f enters f, passing x itself as an argument, so each recursive call is into the same thunk—this is what’s meant by “sharing”.

The sharing version usually has better memory usage, and also lets the GHC RTS detect some kinds of infinite loop. When a thunk is forced, before evaluation starts, it’s replaced with a “black hole”; if at any point during evaluation of a thunk a black hole is reached from the same thread, then we know we have an infinite loop and can throw an exception (which you may have seen displayed as Exception: <<loop>>).
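
A minimal sketch of that detection (my example, not from the answer):

import Data.Function (fix)

_Y :: (t -> t) -> t
_Y g = g (_Y g)

main :: IO ()
main = print (fix (+1) :: Int)
-- the shared thunk is re-entered while it is being evaluated, so the
-- RTS reports: Exception: <<loop>>
-- print (_Y (+1) :: Int) would diverge instead (eventually overflowing
-- the stack): each call allocates a fresh thunk, so no black hole is hit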

Confirmand answered 11/12, 2018 at 7:47 Comment(4)
in that context, the sharing version is worse, and leads to a space leak. :) (more discussion and links are in my answer). another case where we don't want sharing is e.g. powerset computation. – Sailplane
@WillNess: Right you are; I added a "usually" qualification to the "better memory usage" claim; it does depend on what you're doing with it. There's often a tradeoff between improving sharing and avoiding needless retention. – Confirmand
yeah. I guess it depends on how large the reused portion of the output is. with Hamming numbers, for example, for an O(n) sequence the retained portion is O(n^(2/3)), so it's OK. But with primes the retained portion goes from ~sqrt(n) to ~n, so it's too big. – Sailplane
for the opposite example, calculating fibonaccis with the sharing fix is immensely more efficient. – Sailplane
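
To illustrate that last comparison, a sketch (the names fibsFix and fibsY are mine):

import Data.Function (fix)

_Y :: (t -> t) -> t
_Y g = g (_Y g)

-- sharing: one stream, each element computed once (linear work)
fibsFix :: [Integer]
fibsFix = fix (\xs -> 0 : 1 : zipWith (+) xs (tail xs))

-- non-sharing: every recursive reference rebuilds the stream, so
-- earlier elements are recomputed over and over (exponential work)
fibsY :: [Integer]
fibsY = _Y (\xs -> 0 : 1 : zipWith (+) xs (tail xs))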

I think you already received excellent answers, from a GHC/Haskell perspective. I just wanted to chime in and add a few historical/theoretical notes.

The correspondence between unfolding and cyclic views of recursion is rigorously studied in Hasegawa's PhD thesis: https://www.springer.com/us/book/9781447112211

(Here's a shorter paper that you can read without paying Springer: https://link.springer.com/content/pdf/10.1007%2F3-540-62688-3_37.pdf)

Hasegawa assumes a traced monoidal category, a requirement that is much less stringent than the usual PCPO assumption of domain theory, which forms the basis of how we think about Haskell in general. Hasegawa showed that one can define these "sharing" fixed-point operators in such a setting, and that they correspond to the usual unfolding view of fixed points from Church's lambda-calculus. That is, there is no way to tell them apart by making them produce different answers.

Hasegawa's correspondence holds for what are known as central arrows; i.e., when there are no "effects" involved. Later on, Benton and Hyland extended this work and showed that the correspondence holds for certain cases when the underlying arrow can perform "mild" monadic effects as well: https://pdfs.semanticscholar.org/7b5c/8ed42a65dbd37355088df9dde122efc9653d.pdf

Unfortunately, Benton and Hyland only allow effects that are quite "mild": effects like the state and environment monads fit the bill, but not general effects like exceptions, lists, or IO. (The fixed-point operators for these effectful computations are known as mfix in Haskell, with the type signature (a -> m a) -> m a, and they form the basis of the recursive-do notation.)
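
As a tiny illustration of mfix (my example, not from these papers): the Maybe instance ties the knot inside the monad, producing a shared cyclic result.

import Control.Monad.Fix (mfix)

-- knot-tying through Maybe: the list refers back to itself, shared
ones :: Maybe [Int]
ones = mfix (\xs -> Just (1 : xs))

main :: IO ()
main = print (fmap (take 5) ones)   -- Just [1,1,1,1,1]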

It's still an open question how to extend this work to cover arbitrary monadic effects. Though it doesn't seem to be receiving much attention these days. (Would make a great PhD topic for those interested in the correspondence between lambda-calculus, monadic effects, and graph-based computations.)

Uboat answered 11/12, 2018 at 18:34 Comment(0)
