Here's some related answers (by me):
Basically, with Y defined as λr.(λh.h h) (λg.r (λx.(g g) x))
, an application Y r
reduces as
Y r
(λw.(λh.h h) (λg.w (λx.(g g) x))) r
(λh.h h) (λg.r (λx.(g g) x))
h h
;where
h = (λg.r (λx.(g g) x)) <----\
|
(λg.r (λx.(g g) x)) h |
r (λx.(g g) x) <-------------- | ----------\
;where | |
g = h -----/ |
;so that |
(g g) = (h h) = r (λx.(g g) x) ------/
So r
must expect two arguments - first representing the recursive function to be called, and second - an actual argument:
r = λf (λx. ....x.....(f y)...... )
so that (Y r) x
reduces as
(r (λx.(g g) x)) x
(r f) x
;where
f = (λx.(g g) x)
f y = (λx.(g g) x) y = (g g) y = (r f) y ; f is "fixed point" of r
The definiton f = (λx.(g g) x)
means, when f y
is called, (g g) y
will be called, at which point g
will be self-applied, r
"pulled" from inside g
and the result of (r f)
called with y
argument. I.e. any call (f y)
in the body of lambda expression resulting from (r f)
application, is translated back to (r f) y
i.e. invocation of same body with a new argument y
.
The important implementational detail is whether it is the same function body, or its copy, but the semantics are the same - we are able to enter the same function body with a new argument value.
The essence of Y combinator is replication through reference and self-application: we refer to the same thing through same name, twice; and thus we arrange for it to receive itself as an argument.
When there's no referencing, as in pure lambda calculus, and parameters receive textual copies of arguments - i.e. reduction is done by textual rewriting - this still works, because same copies get replicated and passed around, being fed as argument to self so it is available on the next iteration, if need be.
But it is much more efficient when shared referencing is available (all uses of same name refer to same thing). Under environment model of evaluation creation of self-referential function is simple as
(let ((fact #f))
(set! fact
(lambda (n) (if (< 2 n) 1
(* n (fact (- n 1))))))
fact)
In fact the definition in your answer is that of applicative-order Y combinator. With normal-order, eta-reduction can be applied without causing infinite looping, to get Ynorm = (λw.(λh.h h) (λg.w (g g)))
which is canonically written as
Ynorm = (λf.(λx.f (x x)) (λx.f (x x)))
indeed
Ynorm g
= (λx.g (x x)) (λx.g (x x))
= g ((λx.g (x x)) (λx.g (x x)))