`seq` apparently does or doesn't force evaluation of entire recursively-defined list depending on how it is loaded into GHCi
Asked Answered
S

1

7

Recently I've been trying to understand what, exactly, is forced by GHC upon the evaluation of seq. Suppose I save the below definitions:

f :: Int -> [Int] -> [Int]
f = \n -> \ns -> if n <= 0 then ns else f (n - 1) (n : ns)

x = f 5 []

and then :load them into a GHCi session.

In that session, the following behavior is observed:

ghci> let g :: Int -> [Int] -> [Int]; g = \n -> \ns -> if n <= 0 then ns else g (n - 1) (n : ns)
ghci> let y = g 5 []
ghci> seq x ()
()
ghci> seq y ()
()
ghci> :sprint x
x = 1 : _
ghci> :sprint y
y = [1,2,3,4,5]

Simply put, why are x and y evaluated differently despite being defined and called identically? And which reflects what GHC actually does?

(I'm inclined toward the result for y, since in evaluating x and y to WHNF all of their tail entries are in the process evaluated down to literals, but maybe I'm wrong.)

Sherrell answered 18/9 at 19:41 Comment(8)
It indeed doesn't. It evaluates to WHNF, so the "outside constructor" so to speak. Typically you make a recursive seq.Harms
@willeM_VanOnsem But why f yields a result, while g yields another one, yet they have the same exact definition? I guess it's a matter of GHC vs GHCi here (object code vs bytecode?), but it looks very subtle to me.Myceto
@chi: I think it is because g uses an acumulator, builds up the list bottom-to-top, and returns it, so when g eventually returns the list, the "tail" of the list is already evaluated, since it was build such way.Harms
f probably has been optimized more by the compiler?Harms
It's probably something like that. But it can't be simply explained with "seq only evaluates to WHNF, it doesn't force the whole list". It's way more complex than that in my eye. It's potentially ties to optimization as you say, or different runtime system (GHC compiles to object code, GHCi also uses bytecode IIRC).Myceto
@chi: well seq evaluates to WHNF, I think that might be assumption the OP did not make. We don't need seq per se to evaluate already parts of the tree, so some functions can, with seq get the entire thing grounded. It becomes indeed "fishy" w.r.t. optimizations, since then the behavior is less clear.Harms
First off: Are you loading the module compiled or interpreted? That can make a difference, and is worth mentioning. Second: what version of GHC are you using? This is going to come down to subtle details in how (:) is being used in each context. In x the second parameter is being wrapped in an extra thunk, somehow. Different versions of GHC can do things differently.Catenane
@Catenane That would make sense. The module is being loaded interpreted, using GHCi version 9.4.8.Sherrell
S
9

When you :load code into GHCi, by default GHCi instruments it with breakpoints. The breakpoints are inactive by default, but they must be present in the code for you to activate them later if you choose to use GHCi's debugger. Code that you type into the GHCi prompt, in contrast, does not get breakpoints sprinkled into it (as a consequence, you cannot set breakpoints inside code you type at the prompt).

Breakpoints show up in Core, GHC's high-level IR. You can see Core by passing -ddump-simpl to GHC or GHCi. Here is what f looks like when I :load it in from a file (you can see I called it Lib.hs) while running ghci -ddump-simpl.

Rec {
-- RHS size: {terms: 20, types: 7, coercions: 0, joins: 0/0}
f [Occ=LoopBreaker] :: Int -> [Int] -> [Int]
[GblId]
f = break<Lib,6>()
    \ (n_aBW :: Int) ->
      break<Lib,5>(n_aBW)
      \ (ns_aBX :: [Int]) ->
        break<Lib,4>(n_aBW,ns_aBX)
        case break<Lib,0>(n_aBW)
             <= @Int GHC.Classes.$fOrdInt n_aBW (GHC.Types.I# 0#)
        of {
          False ->
            break<Lib,3>(n_aBW,ns_aBX)
            f (break<Lib,1>(n_aBW)
               - @Int GHC.Internal.Num.$fNumInt n_aBW (GHC.Types.I# 1#))
              (break<Lib,2>(n_aBW,ns_aBX) GHC.Types.: @Int n_aBW ns_aBX);
          True -> ns_aBX
        }
end Rec }

Compare this to what g turns into when I type it into the same session. (f also looks like this if you compile it to object code with ghc -ddump-simpl):

letrec {
  g_aTj [Occ=LoopBreaker]
    :: GHC.Types.Int -> [GHC.Types.Int] -> [GHC.Types.Int]
  [LclId,
   Arity=2,
   Unf=Unf{Src=<vanilla>, TopLvl=False,
           Value=True, ConLike=True, WorkFree=True, Expandable=True,
           Guidance=IF_ARGS [0 0] 160 0}]
  g_aTj
    = \ (n_aTk :: GHC.Types.Int) (ns_aTl :: [GHC.Types.Int]) ->
        case GHC.Classes.<=
               @GHC.Types.Int GHC.Classes.$fOrdInt n_aTk (GHC.Types.I# 0#)
        of {
          GHC.Types.False ->
            g_aTj
              (GHC.Internal.Num.-
                 @GHC.Types.Int GHC.Internal.Num.$fNumInt n_aTk (GHC.Types.I# 1#))
              (GHC.Types.: @GHC.Types.Int n_aTk ns_aTl);
          GHC.Types.True -> ns_aTl
        }; } in
-- ... some GHCi prompt stuff under the let

The salient difference is that (:) when called in f appears with a break over it, while in g there is no break. When no breakpoints are set, each break just acts like an identity function: break x = x (I will ignore the <Module, BreakpointNumber> part that identifies each break). Ignoring all the other breakpoints, f-with-breakpoints acts like it was written

f n ns = if n <= 0 then ns else f (n - 1) (break (n : ns))

In particular

x = f 5 []
= f 4 (break (5 : []))
= f 3 (break (4 : break (5 : [])))
= f 2 (break (3 : break (4 : break (5 : []))))
= f 1 (break (2 : break (3 : break (4 : break (5 : [])))))
= f 0 (break (1 : break (2 : break (3 : break (4 : break (5 : []))))))
= break (1 : break (2 : break (3 : break (4 : break (5 : [])))))
= 1 : break (2 : break (3 : break (4 : break (5 : [])))) -- reached WHNF!

Note that once x has reached WHNF, the tail is still pointing to an unevaluated break thunk. In contrast, taking y to WHNF can give the fully evaluated result 1 : 2 : 3 : 4 : 5 : [] since there are no breaks in g.

Summarizing: :sprint x prints 1 : _ because the _ represents a value that might need to trigger a breakpoint but has not gotten the chance to check if it should yet. Breakpoints are not supported for things defined at the prompt like y, so you can see the whole value immediately. You can also turn breakpoints off for :loaded code by passing -fno-break-points, which eliminates the difference between x and y.

$ ghci -fno-break-points
GHCi, version 9.10.1: https://www.haskell.org/ghc/  :? for help
ghci> :load Lib.hs
[1 of 1] Compiling Lib              ( Lib.hs, interpreted )
Ok, one module loaded.
ghci> seq x ()
()
ghci> :sprint x
x = [1,2,3,4,5]
ghci>
Stephniestepladder answered 19/9 at 20:57 Comment(1)
This makes perfect sense, thanks!Sherrell

© 2022 - 2024 — McMap. All rights reserved.