When you :load code into GHCi, GHCi instruments it with breakpoints by default. The breakpoints start out inactive, but they must already be present in the compiled code for you to activate them later if you decide to use GHCi's debugger. Code typed at the GHCi prompt, in contrast, does not get breakpoints sprinkled into it (which is why you cannot set breakpoints inside code you type at the prompt).
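For concreteness, here is the sort of Lib.hs I am assuming throughout this answer. It is reconstructed from the Core dump below, so treat the exact spelling (and the x = f 5 [] binding) as an assumption:

-- Lib.hs (assumed; reconstructed from the Core dump below)
module Lib where

-- Count down from n, consing each value onto the accumulator ns.
f :: Int -> [Int] -> [Int]
f n ns = if n <= 0 then ns else f (n - 1) (n : ns)

-- The binding inspected with :sprint later in this answer.
x :: [Int]
x = f 5 []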
Breakpoints show up in Core, GHC's high-level intermediate representation. You can see Core by passing -ddump-simpl to GHC or GHCi. Here is what f looks like when I :load it in from a file (you can see I called it Lib.hs) while running ghci -ddump-simpl.
Rec {
-- RHS size: {terms: 20, types: 7, coercions: 0, joins: 0/0}
f [Occ=LoopBreaker] :: Int -> [Int] -> [Int]
[GblId]
f = break<Lib,6>()
    \ (n_aBW :: Int) ->
      break<Lib,5>(n_aBW)
      \ (ns_aBX :: [Int]) ->
        break<Lib,4>(n_aBW,ns_aBX)
        case break<Lib,0>(n_aBW)
               <= @Int GHC.Classes.$fOrdInt n_aBW (GHC.Types.I# 0#)
        of {
          False ->
            break<Lib,3>(n_aBW,ns_aBX)
            f (break<Lib,1>(n_aBW)
                 - @Int GHC.Internal.Num.$fNumInt n_aBW (GHC.Types.I# 1#))
              (break<Lib,2>(n_aBW,ns_aBX) GHC.Types.: @Int n_aBW ns_aBX);
          True -> ns_aBX
        }
end Rec }
Compare this to what g turns into when I type it into the same session (f also looks like this if you compile it to object code with ghc -ddump-simpl):
letrec {
  g_aTj [Occ=LoopBreaker]
    :: GHC.Types.Int -> [GHC.Types.Int] -> [GHC.Types.Int]
    [LclId,
     Arity=2,
     Unf=Unf{Src=<vanilla>, TopLvl=False,
             Value=True, ConLike=True, WorkFree=True, Expandable=True,
             Guidance=IF_ARGS [0 0] 160 0}]
  g_aTj
    = \ (n_aTk :: GHC.Types.Int) (ns_aTl :: [GHC.Types.Int]) ->
        case GHC.Classes.<=
               @GHC.Types.Int GHC.Classes.$fOrdInt n_aTk (GHC.Types.I# 0#)
        of {
          GHC.Types.False ->
            g_aTj
              (GHC.Internal.Num.-
                 @GHC.Types.Int GHC.Internal.Num.$fNumInt n_aTk (GHC.Types.I# 1#))
              (GHC.Types.: @GHC.Types.Int n_aTk ns_aTl);
          GHC.Types.True -> ns_aTl
        }; } in
-- ... some GHCi prompt stuff under the let
The salient difference is that the (:) called in f appears with a break over it, while in g there is no break. When no breakpoints are set, each break just acts like an identity function: break x = x (I will ignore the <Module, BreakpointNumber> part that identifies each break). Ignoring all the other breakpoints, f-with-breakpoints behaves as if it were written
f n ns = if n <= 0 then ns else f (n - 1) (break (n : ns))
In particular, forcing x to WHNF unfolds like this:
x = f 5 []
= f 4 (break (5 : []))
= f 3 (break (4 : break (5 : [])))
= f 2 (break (3 : break (4 : break (5 : []))))
= f 1 (break (2 : break (3 : break (4 : break (5 : [])))))
= f 0 (break (1 : break (2 : break (3 : break (4 : break (5 : []))))))
= break (1 : break (2 : break (3 : break (4 : break (5 : [])))))
= 1 : break (2 : break (3 : break (4 : break (5 : [])))) -- reached WHNF!
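You can watch these suspended breaks fire by simulating them. break itself is internal to GHCi, but an identity function that announces itself via Debug.Trace is a reasonable stand-in (break' and f' are hypothetical names for this model):

import Debug.Trace (trace)

-- A model of GHCi's break: the identity function, except that it
-- prints "break!" the first time its result is forced.
break' :: a -> a
break' = trace "break!"

-- f with the one interesting breakpoint written out by hand.
f' :: Int -> [Int] -> [Int]
f' n ns = if n <= 0 then ns else f' (n - 1) (break' (n : ns))

Forcing f' 5 [] to WHNF (with seq, say) prints break! exactly once, matching the last step of the expansion above; forcing the whole spine with length prints the remaining four.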
Note that once x has reached WHNF, the tail is still pointing to an unevaluated break thunk. In contrast, taking y to WHNF can give the fully evaluated result 1 : 2 : 3 : 4 : 5 : [], since there are no breaks in g: the recursion builds the spine out of plain cons cells, and each element has already been forced by the n <= 0 comparison by the time g returns.
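To see the contrast in one place, here is roughly what a session with the default flags looks like. I am assuming Lib.hs also defines x = f 5 [] and that g and y are typed at the prompt; exact output may vary with your GHC version:

$ ghci
GHCi, version 9.10.1: https://www.haskell.org/ghc/ :? for help
ghci> :load Lib.hs
[1 of 1] Compiling Lib ( Lib.hs, interpreted )
Ok, one module loaded.
ghci> let g :: Int -> [Int] -> [Int]; g n ns = if n <= 0 then ns else g (n - 1) (n : ns)
ghci> y = g 5 []
ghci> seq x ()
()
ghci> seq y ()
()
ghci> :sprint x
x = 1 : _
ghci> :sprint y
y = [1,2,3,4,5]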
Summarizing: :sprint x prints 1 : _ because the _ stands for a value that might still need to trigger a breakpoint but has not yet been evaluated far enough to check whether it should. Breakpoints are not supported for things defined at the prompt like y, so you can see the whole value immediately. You can also turn breakpoints off for :loaded code by passing -fno-break-points, which eliminates the difference between x and y.
$ ghci -fno-break-points
GHCi, version 9.10.1: https://www.haskell.org/ghc/ :? for help
ghci> :load Lib.hs
[1 of 1] Compiling Lib ( Lib.hs, interpreted )
Ok, one module loaded.
ghci> seq x ()
()
ghci> :sprint x
x = [1,2,3,4,5]
ghci>
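You do not need the flag, either: staying with the default flags, evaluating the rest of the list runs the suspended break thunks, after which :sprint has nothing left to hide. Continuing the default-flags session sketched above (length x forces the whole spine; the debugger's :force x would also work):

ghci> length x
5
ghci> :sprint x
x = [1,2,3,4,5]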
Comments:

… seq. – Harms

f yields a result, while g yields another one, yet they have the same exact definition? I guess it's a matter of GHC vs GHCi here (object code vs bytecode?), but it looks very subtle to me. – Mycetog

g uses an accumulator, builds up the list bottom-to-top, and returns it, so when g eventually returns the list, the "tail" of the list is already evaluated, since it was built that way. – Harms

f probably has been optimized more by the compiler? – Harms

seq evaluates to WHNF; I think that might be an assumption the OP did not make. We don't need seq per se to evaluate parts of the tree, so some functions can, with seq, get the entire thing grounded. It becomes indeed "fishy" w.r.t. optimizations, since then the behavior is less clear. – Harms

(:) is being used in each context. In x the second parameter is being wrapped in an extra thunk, somehow. Different versions of GHC can do things differently. – Catenane