How to reason about space complexity in Haskell

I'm trying to find a formal way to think about space complexity in Haskell. I have found this article about the graph reduction (GR) technique, which seems like the way to go, but I'm having problems applying it in some cases. Consider the following example:

Say we have a binary tree:

data Tree = Node [Tree] | Leaf [Int]

makeTree :: Int -> Tree
makeTree 0 = Leaf [0..99]
makeTree n = Node [ makeTree (n - 1)
                  , makeTree (n - 1) ]

and two functions that traverse the tree: one (count1) streams nicely, while the other (count2) creates the whole tree in memory at once, according to the profiler.

count1 :: Tree -> Int
count1 (Node xs) = 1 + sum (map count1 xs)
count1 (Leaf xs) = length xs

-- The r parameter should point to the root node to act as a retainer.
count2 :: Tree -> Tree -> Int
count2 r (Node xs) = 1 + sum (map (count2 r) xs)
count2 r (Leaf xs) = length xs
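
For reference, this is roughly how I invoke them (a minimal sketch; my actual test harness is linked in the comments below, and the depth is arbitrary, just big enough to show up in the profile):

main :: IO ()
main = do
  print (count1 (makeTree 20))  -- streams; already-counted nodes can be collected
  let t = makeTree 20           -- pick a depth that fits your RAM
  print (count2 t t)            -- the first t is the root itself, acting as the retainer
  -- for reference: count1 (makeTree n) == (2^n - 1) + 100 * 2^n  (403 for n = 2)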

I think I understand how it works in the case of count1; here is what I think happens in terms of graph reduction (xs are the children of the current node, x1 is the first child, xs1 are x1's children, and a prime marks the rest of a partially consumed list):

count1 $ makeTree 2
=> 1 + sum (map count1 xs)
=> 1 + sum (count1 x1 : map count1 xs')
=> 1 + count1 x1                                 + sum (map count1 xs')
=> 1 + (1 + sum (map count1 xs1))                + sum (map count1 xs')
=> 1 + (1 + sum (count1 x11 : map count1 xs1'))  + sum (map count1 xs')
=> 1 + (1 + count1 x11 + sum (map count1 xs1'))  + sum (map count1 xs')
=> 1 + (1 + 100 + sum (map count1 xs1'))         + sum (map count1 xs')
=> 1 + (1 + 100 + count1 x12)                    + sum (map count1 xs')
=> 1 + (1 + 100 + 100)                           + sum (map count1 xs')
=> 202                                           + sum (map count1 xs')
=> ...

I think it's clear from this sequence that count1 runs in constant space, but what changes in the case of count2?

I understand smart pointers in other languages, so I vaguely see that the extra r parameter of count2 somehow keeps the nodes of the tree from being destroyed, but I would like to know the exact mechanism, or at least a formal one that I could use in other cases as well.
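
For what it's worth, here is my current (possibly wrong) sketch of the count2 reduction; the only difference from the trace above seems to be that every suspended call captures r:

count2 r (makeTree 2)                              -- r is the root node itself
=> 1 + sum (map (count2 r) xs)                     -- the closure (count2 r) holds r
=> 1 + sum (count2 r x1 : map (count2 r) xs')
=> 1 + count2 r x1              + sum (map (count2 r) xs')
=> ...

If I read this right, every thunk left over from the traversal points at r, and r is the root, so every node that has already been forced stays reachable through r and the GC can never reclaim it until the traversal finishes. But I would like a formal confirmation of this.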

Thanks for looking.

Terrain asked 5/4, 2011 at 13:15

Comments (13):
Can you show how you call count2? Your comment indicates that you do something like let t = makeTree 2 in count2 t t. – Audette
@Ingo, yes, here is the code I use for testing. – Terrain
You should read a bit about garbage collection. – Akers
The example evaluation you show seems to be not quite O(1). – Raseta
Reasoning about space in Haskell is horrible. – Raseta
@sclv, I like to think that I understand the relevant parts about GCs, but maybe not; could you please be more specific? E.g., if I were working in Java, I understand that the tree would not be destroyed as long as I kept the root node stored in some variable. What I don't get is the Haskell-specific part. E.g., in the case of count1 the caller also retains a pointer to the root node, but that does not stop it from destroying the intermediate nodes... – Terrain
@Peter, after having seen your code I have a possible explanation for your observation: in the count1 example, even a dumb compiler could figure out that the let-bound variable pointing to the root node is not really needed and could inline it at its single point of use. However, determining whether count2 will actually use its first argument is much harder; of course the compiler could rewrite count2 r r as count2 () (makeTree 2). Don't know if this helps, but in any case you should reason about the space behaviour of the post-optimizer code, not of the original code. – Audette
@FUZxxl, why do you think it is not constant? (Here are the images from the profiling: count1 and count2.) Maybe you mean that the total number of allocations isn't O(1)? If so, that is all right, because all those allocations happened within a small amount of space. – Terrain
@Peter: where in count1 does the caller retain a pointer to the root node? As soon as the pattern match is forced, there is no reference left anywhere to the Node constructor, just to its contents. – Akers
The fact that count2 doesn't run in (almost) constant space is not a property of Haskell, but of a particular Haskell implementation. It would be perfectly reasonable to garbage-collect the tree even for count2, but it requires the compiler to prove that the first argument of count2 is never reachable. This is a bit tricky. – Dortch
@sclv, I meant the caller to be the main function, which does retain (indeed, as you say) a pointer to the root if you run count1 twice in a row with the same argument. I mean, I know you are right; I just believe there is more to the story than just GCs. An exaggerated analogy: even if you know how GCs work in Java, that won't tell you the full story about how Eclipse manages its memory. – Terrain
@Peter: the let binding in the harness code shouldn't make a difference to the GC. The issue is not whether the name is in scope, but whether the object is reachable. Many more details in SPJ's two books: research.microsoft.com/en-us/um/people/simonpj/papers/… – Akers
@sclv, thanks for the link, will read. – Terrain

You could use Adam Bakewell's space semantics:

Haskell currently lacks a standard operational semantics. We argue that such a semantics should be provided to enable reasoning about operational properties of programs, to ensure that implementations guarantee certain space and time behaviour and to help determine the source of space faults. We present a small-step deterministic semantics for the sequential evaluation of Core Haskell programs and show that it is an accurate model of asymptotic space and time usage. The semantics is a formalisation of a graphical notation so it provides a useful mental model as well as a precise mathematical notation. We discuss its implications for education, programming and implementation. The basic semantics is extended with a monadic IO mechanism so that all the space under the control of an implementation is included.

Or work from the Coq specification of the STG machine.

Stated answered 5/4, 2011 at 15:37

Comments (2):
Thank you, the first link seems to address my question directly. – Terrain
Note to self: Write Haskell as fast as C: exploiting strictness, laziness and recursion. @Don, I believe the credit goes to you. – Terrain
