Explanation of "Lose your head" in lazy sequences
Asked Answered
C

4

20

In Clojure programming language, why this code passes with flying colors?

(let [r (range 1e9)] [(first r) (last r)])

While this one fails:

(let [r (range 1e9)] [(last r) (first r)])

I know it is about "Losing your head" advice but would you please explain it to me? I'm not able to digest it yet.

UPDATE:
It is really hard to pick the correct answer, two answers are amazingly informative.
Note: Code snippets are from "The Joy of Clojure".

Crossexamine answered 18/4, 2011 at 3:8 Comment(0)
I
26

range generates elements as needed.

In the case of (let [r (range 1e9)] [(first r) (last r)]), it grabs the first element (0), then generates a billion - 2 elements, throwing them out as it goes, and then grabs the last element (999,999,999). It never has any need to keep any part of the sequence around.

In the case of (let [r (range 1e9)] [(last r) (first r)]), it generates a billion elements in order to be able to evaluate (last r), but it also has to hold on to the beginning of the list it's generating in order to later evaluate (first r). So it's not able to throw anything away as it goes, and (I presume) runs out of memory.

Ivon answered 18/4, 2011 at 3:15 Comment(3)
In the second code, isn't Clojure supposed to execute (last r) and then trying to execute (first r)? I mean according to your answer, Clojure looks like it is peeking to the next form in order to decide whether to drop the head or not.Crossexamine
See Rafał Dowgird's answer. In both cases, Clojure sees that there are future references to r and so it will have to keep it around.Ivon
Wouldn't it be possible to optimise the way lazy sequences work in Clojure, so that second example wouldn't try to realize the whole sequence, maybe in 2.0 ?Engineer
L
31

To elaborate on dfan and Rafał's answers, I've taken the time to run both expressions with the YourKit profiler.

It's fascinating to see the JVM at work. The first program is so GC-friendly that the JVM really shines at managing its memory.

I drew some charts.

GC friendly: (let [r (range 1e9)] [(first r) (last r)])

enter image description here

This program runs very low on memory; overall, less than 6 megabytes. As stated earlier, it is very GC friendly, it makes a lot of collections, but uses very little CPU for that.

Head holder: (let [r (range 1e9)] [(last r) (first r)])

enter image description here

This one is very memory hungry. It goes up to 300 MB of RAM, but that's not enough and the program does not finish (the JVM dies less than one minute later). The GC takes up to 90% of CPU time, which indicates it desperately tries to free any memory it can, but cannot find any (the objects collected are very little to none).

Edit The second program ran out of memory, which triggered a heap dump. An analysis of this dump shows that 70% of the memory is java.lang.Integer objects, which could not be collected. Here's another screenshot:

enter image description here

Lettyletup answered 18/4, 2011 at 12:32 Comment(2)
My charts are ugly. Please suggest a better charting tool !Lettyletup
What an amazing way to answer, absolutely amazing and setting the bar so high.Crossexamine
I
26

range generates elements as needed.

In the case of (let [r (range 1e9)] [(first r) (last r)]), it grabs the first element (0), then generates a billion - 2 elements, throwing them out as it goes, and then grabs the last element (999,999,999). It never has any need to keep any part of the sequence around.

In the case of (let [r (range 1e9)] [(last r) (first r)]), it generates a billion elements in order to be able to evaluate (last r), but it also has to hold on to the beginning of the list it's generating in order to later evaluate (first r). So it's not able to throw anything away as it goes, and (I presume) runs out of memory.

Ivon answered 18/4, 2011 at 3:15 Comment(3)
In the second code, isn't Clojure supposed to execute (last r) and then trying to execute (first r)? I mean according to your answer, Clojure looks like it is peeking to the next form in order to decide whether to drop the head or not.Crossexamine
See Rafał Dowgird's answer. In both cases, Clojure sees that there are future references to r and so it will have to keep it around.Ivon
Wouldn't it be possible to optimise the way lazy sequences work in Clojure, so that second example wouldn't try to realize the whole sequence, maybe in 2.0 ?Engineer
A
12

What really holds the head here is the binding of the sequence to r (not the already-evaluated (first r), since you cannot evaluate the whole sequence from its value.)

In the first case the binding no longer exists when (last r) is evaluated, since there are no more expressions with r to evaluate. In the second case, the existence of the not-yet-evaluated (first r) means that the evaluator needs to keep the binding to r.

To show the difference, this evaluates OK:

user> (let [r (range 1e8) a 7] [(last r) ((constantly 5) a)])

[99999999 5]

While this fails:

(let [r (range 1e8) a 7] [(last r) ((constantly 5) r)])

Even though the expression following (last r) ignores r, the evaluator is not that smart and keeps the binding to r, thus keeping the whole sequence.

Edit: I have found a post where Rich Hickey explains the details of the mechanism responsible for clearing the reference to the head in the above cases. Here it is: Rich Hickey on locals clearing

Alie answered 18/4, 2011 at 12:19 Comment(0)
P
2

for a technical description, go to http://clojure.org/lazy. the advice is mentioned in the section Don't hang (onto) your head

Plath answered 19/4, 2011 at 9:56 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.