Is "hello" a thunk or in normal form?
Asked Answered
B

2

6

Both 1 and "hello" are in normal form, thus are not thunks. So why does GHCi show that "hello" is a thunk?

ghci> x = 1::Int
ghci> :sprint x
x = 1

ghci> x = "hello"::String
ghci> :sprint x
x = _

Edit: I copy a comment from my previous post that tries to explain this question

Haskell-the-language (or -standard) doesn't really define what value is a thunk, what's in normal form, etc.. All it defines are strictness semantics of functions you might call on such values. In your examples, that doesn't happen so basically the compiler is free to do anything, possibly involving heuristics that for such and such type a thunk is likely to achieve nothing but higher memory overhead.

Can anyone give me some articles or documents about this?

Boaster answered 16/5, 2023 at 13:9 Comment(3)
why do you use :: String, do you work with the OverloadedStrings extension?Meatball
as for the "hello", this is something thatwill evaluate to a list, since "hello" evaluates to ('h':('e':('l':('l':('o':[]))))) eventually.Meatball
@WillemVanOnsem The result is the same whether you enable OverloadedStrings or not. I write :: String to make it clear that it is not related to Monomorphism restriction.Boaster
H
5

As I commented earlier, this is mostly about how GHCi decides to store values, rather than anything the Haskell standard specifies. Thunks/laziness are an implementation detail for how non-strictness is achieved, and sometimes they're also a good mental model, but not always. Strictness is all about functions and how ⊥-values are allowed or not allowed to propagate. Any wisdom about where thunks may or may not appear is mostly a by-product of that.

But there are still a couple of things that should be kept in mind on the pure value/thunk level.

Polymorphic values are always evaluated on-demand; they're basically functions with invisible arguments. In GHCi (which has the monomorphism restriction turned off by default) it's easy to define polymorphic values without meaning to, in particular number literals. Then :sprint will only ever give _.
This is not an issue in your examples though: explicitly restricting to a concrete type ensures monomorphic bindings.

As long as you're directly using plain constructors, there is a good chance that the compiler/interpreter will directly store them in normal form. It does this, for example, for something like ('a', (Just 'b','c')). But I don't think this is ever guaranteed, and usually it's irrelevant anyways because literal constructors only get you so far. All that is guaranteed is that a value defined as an outer constructor can be evaluated to WHNF without running into ⊥, but here we're again in the realm of strictness properties.

More typically, values you bind comes from some function and thus start out as thunks. This is in particular also the case for many expressions that look like plain constructors, but aren't: most literals in Haskell can be smart literals, either by default or through extensions. Best known is that number literals are implicitly wrapped in fromInteger, so as to be polymorphic over any Num type. As such it is indeed a bit surprising that

x = 1::Int

does not seem to produce any thunk. But again: this is up to the compiler, and it just makes sense: there is no risk of ⊥, the value is right there, ready to be stored, whereas building a thunk around it would entail a completely unnecessary pointer indirection.

It's worth noting that this does not happen for all number literals:

ghci> let y = 1678796789876876987698768798769876986780987609876 :: Integer
ghci> :sprint y
y = _

In this case, the number doesn't fit int 64 bit anymore, so it's less clear-cut that putting it straight into memory is advantagous; it turns out GHC doesn't risk it in this case.

Now, for strings we would have much the same situation if -XOverloadedStrings were active, in which case all string literals would be required to be wrapped in fromString. But there's no guarantee the other way around: just because the extension isn't active doesn't mean GHC can't wrap the literal in some other function. As Li-yao Xia showed, it wraps it in the low-level function unpackCString#. (This certainly makes sense from a memory perspective, because a C string is continuous and efficient in memory, whereas Haskell's linked-list strings are frankly atrocious for storage, taking about 16× as much memory.) It's legal as long as the compiler is sure that you won't run into ⊥ when evaluating the string.

In this case, explicitly defining the string with plain constructors does cause it to be immediately stored in normal form:

ghci> let x = "hello" :: String
ghci> :sprint x
x = _
ghci> let x' = 'h':'e':'l':'l':'o':[]
ghci> :sprint x'
x' = "hello"
Hardee answered 16/5, 2023 at 14:10 Comment(6)
Thank you for your detail answer. So what the wikibook says is just a mental model, but not how GHC actually implements lazy evaluation. But the order or evaluation is still true, right? That values are evaluated by layer: the evaluation proceeds from the outermost data constructors of expressions and works inward.Boaster
Typically yes, but again the compiler is free to reorder this almost arbitrarily if deemed benefitial in some sense. So long as it doesn't change the strictness semantics.Hardee
@TrungDo No. A fact: if you keep making reductions to an expression, you either arrive at a normal form or get stuck in an infinite loop. Further, any sequence of reductions that gives a normal form gives the same normal form. This means GHC can choose whatever evaluation strategy it likes for an expression, as long as it takes care to avoid getting into unnecessary infinite loops. Lazy evaluation is a simple strategy that is guaranteed to find the normal form if there is one. But if GHC knows better (more likely in optimized, compiled code than GHCi) it can perform non-lazy evaluation.Pagepageant
@Pagepageant Thanks. So I should not bother with thunks and WHNF and just need to remember the non-strictness property that f(⊥) ≠ ⊥Boaster
@TrungDo No, WHNF is still important. Everything in my previous comment more-or-less holds for WHNF as well as for normal form, and WHNF is the concept that the idea of strictness and the rest of the Haskell specification are based on. But thinking about thunks is indeed superfluous, until you need to worry about performance (or use unsafePerformIO).Pagepageant
@Pagepageant Thank you. It makes sense to me now. Unfortunately, every text I have read establishes a strong link between thunks and WHNF.Boaster
C
8

If you tell GHC to print the desugaring of "hello" to GHC Core, you can see that a string literal gets desugared to a call to unpackCString#. The actual contents of the string are stored compactly "hello"# and the function unpackCString# constructs the corresponding linked list 'h' : 'e' : 'l' : 'l' : 'o' : [] on demand, so it will take very little memory when consumed in a streaming fashion.

> :set -ddump-ds -dsuppress-all
> x = "hello"

==================== Desugared ====================
letrec {
  x_aCa = letrec { x_aCe = unpackCString# "hello"#; } in x_aCe; } in
returnIO (: (unsafeCoerce# x_aCa) [])
Cianca answered 16/5, 2023 at 13:57 Comment(0)
H
5

As I commented earlier, this is mostly about how GHCi decides to store values, rather than anything the Haskell standard specifies. Thunks/laziness are an implementation detail for how non-strictness is achieved, and sometimes they're also a good mental model, but not always. Strictness is all about functions and how ⊥-values are allowed or not allowed to propagate. Any wisdom about where thunks may or may not appear is mostly a by-product of that.

But there are still a couple of things that should be kept in mind on the pure value/thunk level.

Polymorphic values are always evaluated on-demand; they're basically functions with invisible arguments. In GHCi (which has the monomorphism restriction turned off by default) it's easy to define polymorphic values without meaning to, in particular number literals. Then :sprint will only ever give _.
This is not an issue in your examples though: explicitly restricting to a concrete type ensures monomorphic bindings.

As long as you're directly using plain constructors, there is a good chance that the compiler/interpreter will directly store them in normal form. It does this, for example, for something like ('a', (Just 'b','c')). But I don't think this is ever guaranteed, and usually it's irrelevant anyways because literal constructors only get you so far. All that is guaranteed is that a value defined as an outer constructor can be evaluated to WHNF without running into ⊥, but here we're again in the realm of strictness properties.

More typically, values you bind comes from some function and thus start out as thunks. This is in particular also the case for many expressions that look like plain constructors, but aren't: most literals in Haskell can be smart literals, either by default or through extensions. Best known is that number literals are implicitly wrapped in fromInteger, so as to be polymorphic over any Num type. As such it is indeed a bit surprising that

x = 1::Int

does not seem to produce any thunk. But again: this is up to the compiler, and it just makes sense: there is no risk of ⊥, the value is right there, ready to be stored, whereas building a thunk around it would entail a completely unnecessary pointer indirection.

It's worth noting that this does not happen for all number literals:

ghci> let y = 1678796789876876987698768798769876986780987609876 :: Integer
ghci> :sprint y
y = _

In this case, the number doesn't fit int 64 bit anymore, so it's less clear-cut that putting it straight into memory is advantagous; it turns out GHC doesn't risk it in this case.

Now, for strings we would have much the same situation if -XOverloadedStrings were active, in which case all string literals would be required to be wrapped in fromString. But there's no guarantee the other way around: just because the extension isn't active doesn't mean GHC can't wrap the literal in some other function. As Li-yao Xia showed, it wraps it in the low-level function unpackCString#. (This certainly makes sense from a memory perspective, because a C string is continuous and efficient in memory, whereas Haskell's linked-list strings are frankly atrocious for storage, taking about 16× as much memory.) It's legal as long as the compiler is sure that you won't run into ⊥ when evaluating the string.

In this case, explicitly defining the string with plain constructors does cause it to be immediately stored in normal form:

ghci> let x = "hello" :: String
ghci> :sprint x
x = _
ghci> let x' = 'h':'e':'l':'l':'o':[]
ghci> :sprint x'
x' = "hello"
Hardee answered 16/5, 2023 at 14:10 Comment(6)
Thank you for your detail answer. So what the wikibook says is just a mental model, but not how GHC actually implements lazy evaluation. But the order or evaluation is still true, right? That values are evaluated by layer: the evaluation proceeds from the outermost data constructors of expressions and works inward.Boaster
Typically yes, but again the compiler is free to reorder this almost arbitrarily if deemed benefitial in some sense. So long as it doesn't change the strictness semantics.Hardee
@TrungDo No. A fact: if you keep making reductions to an expression, you either arrive at a normal form or get stuck in an infinite loop. Further, any sequence of reductions that gives a normal form gives the same normal form. This means GHC can choose whatever evaluation strategy it likes for an expression, as long as it takes care to avoid getting into unnecessary infinite loops. Lazy evaluation is a simple strategy that is guaranteed to find the normal form if there is one. But if GHC knows better (more likely in optimized, compiled code than GHCi) it can perform non-lazy evaluation.Pagepageant
@Pagepageant Thanks. So I should not bother with thunks and WHNF and just need to remember the non-strictness property that f(⊥) ≠ ⊥Boaster
@TrungDo No, WHNF is still important. Everything in my previous comment more-or-less holds for WHNF as well as for normal form, and WHNF is the concept that the idea of strictness and the rest of the Haskell specification are based on. But thinking about thunks is indeed superfluous, until you need to worry about performance (or use unsafePerformIO).Pagepageant
@Pagepageant Thank you. It makes sense to me now. Unfortunately, every text I have read establishes a strong link between thunks and WHNF.Boaster

© 2022 - 2024 — McMap. All rights reserved.