Is everything in Haskell stored in thunks, even simple values?
Asked Answered
B

5

46

What do the thunks for the following value/expression/function look like in the Haskell heap?

val = 5                -- is `val` a pointer to a box containing 5?
add x y = x + y        
result = add 2 val     
main = print $ result

Would be nice to have a picture of how these are represented in Haskell, given its lazy evaluation mode.

Bohr answered 12/12, 2011 at 17:38 Comment(6)
Thunks are a feature of the running program, not the static source code. They may exist for some time, then be evaluated and disappear to be replaced by their result.Vocabulary
Well, your third example would be represented as a type error. :]Sanderling
Expanding on what @C.A.McCann said; result = add 2 + 4 should be result = add 2 4.Excise
JB: thanks for the edits (+1). would it make sense to show how this differs compared to a strict language like SML?Bohr
C. A. McCann & dflemstr: I saw my mistake, thanks. now corrected :)Bohr
Short answer: no. The Haskell 2010 specs do not say anything about thunks. However, for GHC (popular and actively maintained compiler) short answer: yes.Cartelize
D
72

Official answer

It's none of your business. Strictly implementation detail of your compiler.

Short answer

Yes.

Longer answer

To the Haskell program itself, the answer is always yes, but the compiler can and will do things differently if it finds out that it can get away with it, for performance reasons.

For example, for '''add x y = x + y''', a compiler might generate code that works with thunks for x and y and constructs a thunk as a result. But consider the following:

foo :: Int -> Int -> Int
foo x y = x * x + y * y

Here, an optimizing compiler will generate code that first takes x and y out of their boxes, then does all the arithmetic, and then stores the result in a box.

Advanced answer

This paper describes how GHC switched from one way of implementing thunks to another that was actually both simpler and faster: http://research.microsoft.com/en-us/um/people/simonpj/papers/eval-apply/

Debor answered 12/12, 2011 at 18:0 Comment(7)
Haha. I think the "official answer" alone is worth a +1.Sanderling
It becomes your business when your (or your user's) computer runs out of memory. That said, I still agree with the sentiment.Greenhead
@MattFenwick Sure, that's one of the reasons why the official answer is never the most interesting answer :-)Debor
@MattFenwick the presence of a box adds a constant overhead and constant overheads are very rarely the reason to run out of memory. Complex optimizations such as list fusion can remove a non-constant overhead, but that's a different story.Glycolysis
@Glycolysis The presence of an unevaluated thunk, however, might keep data structures alive which are needed to evaluate the thunk, but which could be discarded once the thunk is evaluated. This is the main reason why it pays to know about strictness analysis, which I didn't really address in my answer.Debor
WRT the official answer, what does that say about when to use the strict syntax (e.g !foo)?Limn
@JohnF.Miller You use strict syntax (data Widget = Widget !foo) when you want Widget undefined to evaluate to undefined.Fromma
A
13

In general, even primitive values in Haskell (e.g. of type Int and Float) are represented by thunks. This is indeed required by the non-strict semantics; consider the following fragment:

bottom :: Int
bottom = div 1 0

This definition will generate a div-by-zero exception only if the value of bottom is inspected, but not if the value is never used.

Consider now the add function:

add :: Int -> Int -> Int
add x y = x+y

A naive implementation of add must force the thunk for x, force the thunk for y, add the values and create an (evaluated) thunk for the result. This is a huge overhead for arithmetic compared to strict functional languages (not to mention imperative ones).

However, an optimizing compiler such as GHC can mostly avoid this overhead; this is a simplified view of how GHC translates the add function:

add :: Int -> Int -> Int
add (I# x) (I# y) = case# (x +# y) of z -> I# z 

Internally, basic types like Int is seen as datatype with a single constructor. The type Int# is the "raw" machine type for integers and +# is the primitive addition on raw types. Operations on raw types are implemented directly on bit-patterns (e.g. registers) --- not thunks. Boxing and unboxing are then translated as constructor application and pattern matching.

The advantage of this approach (not visible in this simple example) is that the compiler is often capable of inlining such definitions and removing intermediate boxing/unboxing operations, leaving only the outermost ones.

Angie answered 12/12, 2011 at 18:26 Comment(0)
S
7

It would be absolutely correct to wrap every value in a thunk. But since Haskell is non-strict, compilers can choose when to evaluate thunks/expressions. In particular, compilers can choose to evaluate an expression earlier than strictly necessary, if it results in better code.

Optimizing Haskell compilers (GHC) perform Strictness analysis to figure out, which values will always be computed.

In the beginning, the compiler has to assume, that none of a function's arguments are ever used. Then it goes over the body of the function and tries to find functions applications that 1) are known to be strict in (at least some of) their arguments and 2) always have to be evaluated to compute the function's result.

In your example, we have the function (+) that is strict in both it's arguments. Thus the compiler knows that both x and y are always required to be evaluated at this point. Now it just so happens, that the expression x+y is always necessary to compute the function's result, therefore the compiler can store the information that the function add is strict in both x and y.

The generated code for add* will thus expect integer values as parameters and not thunks. The algorithm becomes much more complicated when recursion is involved (a fixed point problem), but the basic idea remains the same.

Another example:

mkList x y = 
    if x then y : []
         else []

This function will take x in evaluated form (as a boolean) and y as a thunk. The expression x needs to be evaluated in every possible execution path through mkList, thus we can have the caller evaluate it. The expression y, on the other hand, is never used in any function application that is strict in it's arguments. The cons-function : never looks at y it just stores it in a list. Thus y needs to be passed as a thunk in order to satisfy the lazy Haskell semantics.

mkList False undefined -- absolutely legal

*: add is of course polymorphic and the exact type of x and y depends on the instantiation.

Sissified answered 12/12, 2011 at 18:2 Comment(0)
E
6

Short answer: Yes.

Long answer:

val = 5

This has to be stored in a thunk, because imagine if we wrote this anywhere in our code (like, in a library we imported or something):

val = undefined

If this has to be evaluated when our program starts, it would crash, right? If we actually use that value for something, that would be what we want, but if we don't use it, it shouldn't be able to influence our program so catastrophically.

For your second example, let me change it a little:

div x y = x / y

This value has to be stored in a thunk as well, because imagine some code like this:

average list =
  if null list
     then 0
     else div (sum list) (length list)

If div was strict here, it would be evaluated even when the list is null (aka. empty), meaning that writing the function like this wouldn't work, because it wouldn't have a chance to return 0 when given the empty list, even though this is what we would want in this case.

Your final example is just a variation of example 1, and it has to be lazy for the same reasons.

All this being said, it is possible to force the compiler to make some values strict, but that goes beyond the scope of this question.

Excise answered 12/12, 2011 at 17:49 Comment(3)
@ChristianKlauser the point was that the result of div had to be stored in a thunk for the function to work. I never said anything about the strictness of average. Also, my tongue-in-cheek short answer was "Yes" to this question, meaning that while it might be valid to sometimes make a function strict because that wouldn't change its behavior, it is always safe to assume that every expression in Haskell is lazy (unless annotated with strictness annotations etc.).Excise
Even if div (or the language itself) was strict it will never be evaluated if the list is null, that's the point of the if-statement anyway!Tupi
@is7s, you can make every argument to if..then..else produce a thunk and let the 'arguments' themselves be strict, or you can make if..then..else strict and let all 'arguments' be lazy - it's just two ways of saying the same thing, is it not?Excise
C
4

I think the others answered your question nicely, but just for completeness's sake let me add that GHC offers you the possibility of using unboxed values directly as well. This is what Haskell Wiki says about it:

When you are really desperate for speed, and you want to get right down to the “raw bits.” Please see GHC Primitives for some information about using unboxed types.

This should be a last resort, however, since unboxed types and primitives are non-portable. Fortunately, it is usually not necessary to resort to using explicit unboxed types and primitives, because GHC's optimiser can do the work for you by inlining operations it knows about, and unboxing strict function arguments. Strict and unpacked constructor fields can also help a lot. Sometimes GHC needs a little help to generate the right code, so you might have to look at the Core output to see whether your tweaks are actually having the desired effect.

One thing that can be said for using unboxed types and primitives is that you know you're writing efficient code, rather than relying on GHC's optimiser to do the right thing, and being at the mercy of changes in GHC's optimiser down the line. This may well be important to you, in which case go for it.

As mentioned, it's non-portable, so you need a GHC language extension. See here for their docs.

Coleman answered 13/12, 2011 at 15:14 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.