What is the difference between a cyclic list and an infinite list in Haskell?
Referencing @dfeuer's answer to this question: Least expensive way to construct cyclic list in Haskell, which says that using cyclic lists 'defeats' the garbage collector, as it has to keep everything you've consumed from a cyclic list allocated until you drop the reference to every cons cell in the list.

Apparently in Haskell a cyclic list and an infinite list are two separate things. This blog (https://unspecified.wordpress.com/2010/03/30/a-doubly-linked-list-in-haskell/) says that if you implement cycle as follows:

cycle xs = xs ++ cycle xs

it is an infinite list, not a cyclic list. To make it cyclic you have to implement it like this (as is found in the Prelude source code):

cycle xs = xs' where xs' = xs ++ xs'

What exactly is the difference between these two implementations? And why is it that, if you are holding onto one cons cell somewhere in a cyclic list, the garbage collector has to keep everything before it allocated as well?

Jinajingle answered 26/8, 2014 at 5:7 Comment(3)
Exactly the same as in other languages... – Decent
I misread and expected a bad joke about people who ride bicycles... – Amusing
The linked question seemed to refer to very long cycles. cycle generally gives very good results for short cycles, but for long ones, the way you use the thing will determine whether an infinite list representation will do the trick or whether you're better off changing the structure of your program. – Preferential
The difference is entirely in the memory representation. From the point of view of the semantics of the language, they're indistinguishable—you can't write a function that can tell them apart, so your two versions of cycle are considered two implementations of the same function (they're the exact same mapping of arguments to results). In fact, I don't know if the language definition guarantees that one of those is cyclical and the other infinite.

But anyway, let's bring out the ASCII art. Cyclical list:

   +----+----+                 +----+----+
   | x0 |   ----->   ...   --->| xn |    |
   +----+----+                 +----+-|--+
     ^                                |
     |                                |
     +--------------------------------+

Infinite list:

   +----+----+
   | x0 |   ----->  thunk that produces infinite list
   +----+----+

The thing with the cyclical list is that from every cons cell in the list there is a path to all of the others and itself. This means that from the point of view of the garbage collector, if one of the cons cells is reachable, then all are. In the plain infinite list, on the other hand, there aren't any cycles, so from a given cons cell only its successors are reachable.

Note that the infinite list representation is more powerful than the cyclical one, because the cyclical representation only works with lists that repeat after some number of elements. For example, the list of all prime numbers can be represented as an infinite list, but not as a cyclical one.
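As a concrete illustration of that point, here is the list of primes written as an infinite list (a simple trial-division sketch, not an efficient sieve): no finite block of elements ever repeats, so no cyclic representation exists.

```haskell
-- Infinite, but not cyclic: the primes never settle into a repeating block.
primes :: [Integer]
primes = sieve [2..]
  where
    sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]

main :: IO ()
main = print (take 5 primes)  -- prints [2,3,5,7,11]
```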

Note also that this distinction can be generalized into two distinct ways of implementing the fix function:

fix, fix' :: (a -> a) -> a
fix  f = let result = f result in result
fix' f = f (fix' f)

-- Circular version of cycle:
cycle  xs = fix (xs++)

-- Infinite list version of cycle:
cycle' xs = fix' (xs++)

The GHC libraries go for my fix definition. The way GHC compiles code means that the thunk created for result is used both as the result and the argument of the application of f. I.e., the thunk, when forced, will call the object code for f with the thunk itself as its argument, and replace the thunk's contents with the result.
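As a quick sanity check that the two styles really are the same function, the two variants can be run side by side (a sketch; cycleC and cycleI are hypothetical names for the circular and infinite versions):

```haskell
fixC, fixI :: (a -> a) -> a
fixC f = let result = f result in result  -- circular: result refers to itself
fixI f = f (fixI f)                       -- infinite: a fresh application each step

cycleC, cycleI :: [a] -> [a]
cycleC xs = fixC (xs ++)  -- one loop in memory
cycleI xs = fixI (xs ++)  -- an ever-growing chain of thunks

main :: IO ()
main = print (take 7 (cycleC "abc") == take 7 (cycleI "abc"))  -- prints True
```

Every finite prefix agrees; only the heap layout behind them differs.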

Forayer answered 26/8, 2014 at 5:21 Comment(3)
could fix' be called _Y perhaps? because this is what it is, after all -- the Y combinator, emulating sharing with copying of data. – Jackfish
@WillNess that depends on how exactly you define the term "Y combinator." As I understand it, in its narrowest sense it refers to the lambda term λf.(λx.f (x x)) (λx.f (x x)), while "fixed point combinator" refers to the function that this term denotes. But of course these words are often used much more loosely than that. – Forayer
yes, exactly, like that lambda term you've shown. "fixpoint" is general, "Y combinator" is very specific; my point was that just as in lambda calculus, the self-reference ("sharing") is emulated through copying of terms (data), in Y (because in lambda calculus reduction is defined through textual substitution). – Jackfish
Cyclic lists and infinite lists are different operationally, but not semantically.

A cyclic list is literally a loop in memory - imagine a singly linked list with the pointers following a cycle - so takes up constant space. Because each cell in the list can be reached from any other cell, holding onto any one cell will cause the entire list to be held onto.

An infinite list will take up more and more space as you evaluate more of it. Earlier elements will be garbage collected if no longer needed, so programs that process it may still run in constant space, although the garbage collection overhead will be higher. If earlier elements in the list are needed, for example because you hold onto a reference to the head of the list, then the list will consume linear space as you evaluate it, and will eventually exhaust available memory.
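A concrete way to see the constant-space case: consuming a prefix of a cyclic list with a strict fold keeps only the (fixed-size) cycle and an accumulator live, since nothing holds onto the consumed cells. (A minimal sketch; cyc is just the circular cycle from above.)

```haskell
import Data.List (foldl')

-- Cyclic list: a single loop in memory, built once.
cyc :: [a] -> [a]
cyc xs = xs' where xs' = xs ++ xs'

-- A strict left fold consumes the prefix without retaining it.
main :: IO ()
main = print (foldl' (+) 0 (take 6 (cyc [1,2,3 :: Int])))  -- prints 12
```

If you instead kept a reference to the head of an infinite-list version while folding, every consumed cell would stay reachable and space use would grow linearly.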

The reason for this difference is that without optimisations, a typical Haskell implementation like GHC will allocate memory once for a value, like xs' in the second definition of cycle, but will repeatedly allocate memory for a function invocation, like cycle xs in the first definition.

In principle optimisations might turn one definition into the other, but because of the quite different performance characteristics, it's unlikely that this will happen in practice as compilers are generally quite conservative about making programs behave worse. In some cases the cyclic variant will be worse because of the garbage collection properties already mentioned.

Emasculate answered 26/8, 2014 at 5:13 Comment(8)
on the other hand, with cyclic list the entire length of the prefix will be held in memory whereas with infinite only the length of the frontier in use will be held, which might be as short as 1 element. I guess that's "some cases" that you mention in the end of your answer. :) – Jackfish
that was what I meant by "in some cases the cyclic variant will be worse" – Emasculate
yeah, just finished reading. :) you could also mention the cse optimization and -no-cse flag explicitly. – Jackfish
There is the semantic difference that a cyclic list repeats a number of elements over and over again, while an infinite list might not do that. Not every infinite list can be turned into a cyclic one. – Nottingham
Would GHC's CSE even possibly fire in this situation? – Emasculate
@Nottingham IIUYC, you refer to the title only, but in the context of the question's text it is clear I think that the OP meant the repeating list, either achieved through actual pointer manipulation, or through copying on elements, indefinitely. Of course [1..] or cycle [1..] is not actually cyclic, or did you mean something else? – Jackfish
@WillNess doesn't laziness still apply to cyclical lists? If I do xs = [1..10000000] ++ xs, it wouldn't take up all the memory immediately. – Amused
@PyRulez of course. the question still remains whether a list structure is retained in memory created by [1..1000000], or not at all, since the definition itself is dormant until coming to life by demand from some print upstream. Haskell list==Python generator + storage of results IF needed. [1..10000000] is a generator definition which can just be fired up the second time, instead of storing all the previously generated elements. doing the extra effort of repeated generation is much better than taking up huge areas of memory (but not small ones, there it's the other way). – Jackfish
cycle xs = xs ++ cycle xs            -- 1
cycle xs = xs' where xs' = xs ++ xs' -- 2

What exactly is the difference between these two implementations?

Using GHC, the difference is that implementation #2 creates a self-referential value (xs'), while #1 merely creates a thunk that happens to evaluate to the same list.

And why is it that if you are holding onto one cons cell somewhere in a cyclic list, that the garbage collector has to keep everything before it allocated as well?

This is again GHC specific. As Luis said, if you have a reference to one cons cell in a cyclic list, then you can reach the whole list just by going around the cycle. The garbage collector is conservative and won't collect anything that you can still reach.


Haskell is pure and where-refactoring is sound... only when you disregard memory usage (and a few other things like CPU usage and computation time). Haskell the language does not specify what the compiler should do to differentiate #1 and #2. GHC the implementation follows certain memory management patterns which are sensible, but not immediately obvious.

Euphonious answered 4/9, 2014 at 19:5 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.