How much memory does a thunk use?

Asked 21/12, 2012 at 0:57 Answered 21/12, 2012 at 9:58

Let's say I have a very large number (millions/billions+) of these simple Foo data structures:

data Foo = Foo
    { a :: {-# UNPACK #-}!Int
    , b :: Int
    }

With so many of these floating around, it becomes necessary to think about how much memory they consume.

On a 64bit machine, each Int is 8 bytes, so a only takes 8 bytes (because it is strict and unpacked). But how much memory will b take up? I imagine this would change depending on whether the thunk is evaluated or not, right?

I imagine in the general case this is impossible to tell, because b could depend on any number of memory positions that are only staying in memory in case b needs to be evaluated. But what if b only depended on (some very expensive operation on) a? Then, is there a deterministic way to tell how much memory will be used?

Siclari answered 21/12, 2012 at 0:57 Comment(4)

So in short, you don't want anyone to once again point to vacuum-cairo, but you would like an explanation for how thunks are represented in memory? – Nine 21/12, 2012 at 1:20

@ThomasM.DuBuisson Correct. As low-level as needed. – Siclari 21/12, 2012 at 1:29

There is a deterministic way to determine how much memory some code will use. But you won't like it. – Python 21/12, 2012 at 3:26

@DanielWagner Empirically measuring it? You're right. The main problem with that is that all the b's could potentially be of different size, and I'd like to give clients some guidelines as to what to expect. It would be nice to be able to place some (theoretic) bounds on these sizes. – Siclari 21/12, 2012 at 3:32

In addition to user239558’s answer, and in response to your comment there, I’d like to point out some tools that allow you to inspect the heap representation of your value, find answers to questions like this yourself and to see the effect of optimizations and different ways of compiling.

ghc-datasize

tells you the size of a closure. Here you can see that (on a 64 bit machine) in evaluated form and after garbage collection, Foo 1 2 requires 24 bytes on its own and, including dependencies, 40 bytes in total:

Prelude GHC.DataSize Test> let x = Foo 1 2
Prelude GHC.DataSize Test> x
Foo {a = 1, b = 2}
Prelude GHC.DataSize Test> System.Mem.performGC
Prelude GHC.DataSize Test> closureSize x
24
Prelude GHC.DataSize Test> recursiveSize x
40

To reproduce this you need to load the data definition in compiled form with -O, otherwise, the {-# UNPACK #-} pragma has no effect.

Now let us create a thunk and see that the size goes up considerably:

Prelude GHC.DataSize Test> let thunk = 2 + 3::Int
Prelude GHC.DataSize Test> let x = Foo 1 thunk
Prelude GHC.DataSize Test> x `seq` return ()
Prelude GHC.DataSize Test> System.Mem.performGC
Prelude GHC.DataSize Test> closureSize x
24
Prelude GHC.DataSize Test> recursiveSize x
400

Now this is quite excessive. The reason is that this calculation includes references to static closures, Num typeclass dictionaries and the like, and generally the GHCi bytecode is very unoptimized. So let’s put that in a proper Haskell program. Running

main = do
    l <- getArgs
    let n = length l
    n `seq` return ()
    let thunk = trace "I am evaluated" $ n + n
    let x = Foo 1 thunk
    a x `seq` return ()
    performGC
    s1 <- closureSize x
    s2 <- closureSize thunk
    r <- recursiveSize x
    print (s1, s2, r)

gives (24, 24, 48), so now the Foo value is made up of Foo itself and a thunk. Why only the thunk, shouldn’t there also be a reference to n adding another 16 bytes? To answer this, we need a better tool:

ghc-heap-view

This library (by me) can investigate the heap and tell you precisely how your data is represented there. So adding this line to the file above:

buildHeapTree 1000 (asBox x) >>= putStrLn . ppHeapTree

we get (when we pass five parameters to the program) the result Foo (_thunk 5) 1. Note that the order of arguments is swapped on the heap, because pointers always come before data. The plain 5 indicates that the closure of the thunk stores its argument unboxed.

As a last exercise we verify this by making the thunk lazy in n: Now

main = do
    l <- getArgs
    let n = length l
    n `seq` return ()
    let thunk = trace "I am evaluated" $ n
    let x = Foo 1 thunk
    a x `seq` return ()
    performGC
    s1 <- closureSize x
    s2 <- closureSize thunk
    s3 <- closureSize n
    r <- recursiveSize x
    buildHeapTree 1000 (asBox x) >>= putStrLn . ppHeapTree
    print (s1, s2, s3, r)

gives a heap representation of Foo (_thunk (I# 4)) 1 with a separate closure for n (as indicated by the presence of the I# constructor) and shows the expected sizes for the values and their total, (24,24,16,64).

Oh, and if this is still too high level, getClosureRaw gives you the raw bytes.

Contrabandist answered 21/12, 2012 at 9:58 Comment(2)

In your last snippet, why are you doing a x `seq` return ()? Isn't the a field already unpacked? – Quoits 28/7, 2022 at 15:43

Indeed. I guess ``x seq return ()` would have done the job as well. – Contrabandist 29/7, 2022 at 17:9

If b is evaluated, it will be a pointer to an Int object. The pointer is 8 bytes, and the Int object consists of a header which is 8 bytes, and the Int# which is 8 bytes.

So in that case the memory usage is the Foo object (8 header, 8 Int, 8 pointer) + boxed Int (8 header, 8 Int#).

When b is unevaluated, the 8-byte pointer in Foo will point to a Thunk object. The Thunk object represents an unevaluated expression. Like the Int object, this object has a 8-byte header, but the rest of the object consists of the free variables in the unevaluated expression.

So first of all, the number of free variables held in this thunk object depends on the expression that creates the Foo object. Different ways of creating a Foo will have thunk objects of potentially different sizes created.

Then secondly, the free variables are all the variables that are mentioned in the unevaluated expression that are taken from outside of the expression, called the environment of a closure. They are sort of the parameters to the expression and they need to be stored somewhere, and thus they are stored in the thunk object.

So you can look at the actual places where the Foo constructor is invoked and look at the number of free variables in the second parameter to estimate the size of the thunk.

A Thunk object is really the same as a closure in most other programming languages, with one important difference. When it is evaluated, it can be overwritten by a redirect pointer to the evaluated object. Thus it is a closure that automatically memoizes its result.

This redirect pointer will point to the Int object (16 bytes). However the now "dead" thunk will be eliminated on the next garbage collection. When the GC copies Foo, it will make Foo's b point directly to the Int object, making the thunk unreferenced and thus garbage.

Bismarck answered 21/12, 2012 at 4:38 Comment(2)

Where can I find more information about these internal data structures? – Siclari 21/12, 2012 at 5:56

Thanks, very interesting information about what goes on under the hood. I second the the question of Mike Izbicki: do you have links that provide more of these implementation details? – Epochal 21/12, 2012 at 22:21

ghc-datasize

ghc-heap-view

Recommended topics

Hot tags