Irrefutable pattern does not leak memory in recursion, but why?
Asked Answered
D

1

14

The mapAndSum function in the code block all the way below combines map and sum (never mind that another sum is applied in the main function, it just serves to make the output compact). The map is computed lazily, while the sum is computed using an accumulating parameter. The idea is that the result of map can be consumed without ever having the complete list in memory, and (only) afterwards the sum is available "for free". The main function indicates that we had a problem with irrefutable patterns when calling mapAndSum. Let me explain this problem.

According to the Haskell standard, the irrefutable pattern example let (xs, s) = mapAndSum ... in print xs >> print s is translated into

(\ v ->    print (case v of { (xs, s) -> xs })
        >> print (case v of { (xs, s) -> s }))
$ mapAndSum ...

And hence, both print calls carry a reference to the whole pair, which leads to the whole list being kept in memory.

We (my colleague Toni Dietze and I) solved this using an explicit case statement (compare "bad" vs "good2"). BTW, finding this out took us a considerable amount of time..!

Now, what we do not understand is two-fold:

  • Why does mapAndSum work in the first place? It also uses an irrefutable pattern, so it should also keep the whole list in memory, but it obviously doesn't. And converting the let into a case would make the function behave completely unlazy (to the point that the stack overflows; no pun intended).

    We had a look at the "core" code generated by GHC, but as far as we could interpret it, it actually contained the same translation of let as the one above. So no clue here, and more confusion instead.

  • Why does "bad?" work when optimization is turned off, but not when it is turned on?

One remark concerning our actual application: we want to achieve stream processing (format conversion) of a large text file while also accumulating some value that is then written to a separate file. As indicated, we were successful, but the two questions remain, and answers may improve our understanding of GHC for upcoming tasks and problems.

Thank you!

{-# LANGUAGE BangPatterns #-}

-- Tested with The Glorious Glasgow Haskell Compilation System, version 7.4.2.

module Main where


import Control.Arrow (first)
import Data.List (foldl')
import System.Environment (getArgs)


mapAndSum :: Num a => (a -> b) -> [a] -> ([b], a)
mapAndSum f = go 0
  where go !s (x : xs) = let (xs', s') = go (s + x) xs in (f x : xs', s')
                       -- ^ I have no idea why it works here. (TD)
        go !s []       = ([], s)


main :: IO ()
main = do
  let foo  = mapAndSum (^ (2 :: Integer)) [1 .. 1000000 :: Integer]
  let sum' = foldl' (+) 0
  args <- getArgs
  case args of
    ["bad" ]  -> let (xs, s) = foo in print (sum xs) >> print s
    ["bad?"]  -> print $ first sum' $ foo
              -- ^ Without ghc's optimizer, this version is as memory
              -- efficient as the “good” versions
              -- With optimization “bad?” is as bad as “bad”. Why? (TD)
    ["good1"] -> print $ first' sum' $ foo
                   where first' g (x, y) = (g x, y)
    ["good2"] -> case foo of
                    (xs, s) -> print (sum' xs) >> print s
    ["good3"] -> (\ (xs, s) -> print (sum' xs) >> print s) $ foo
    _ -> error "Sorry, I do not understand."
Dijon answered 24/7, 2012 at 7:11 Comment(5)
Your mapAnsSum function might be improved by using mapAccumL from Data.Traversable.Lucite
I assume you mean “ghc’s optimizer” in the comment, correct?Ruella
Yes, sorry, we mean ghc's optimizer.Dijon
Perhaps I am missing something, but you do not seem to be using any irrefutable patterns.Opuscule
This document details how let expressions are converted into case with an irrefutable pattern: haskell.org/onlinereport/haskell2010/… Summary: "Pattern bindings are matched lazily; an implicit ~ makes these patterns irrefutable"Dijon
R
18

Let me first answer why mapAndSome can work good at all: What you see is (very likely) the effect of an optimization described by Philip Wadler in “Fixing some space leaks with a garbage collector”. Short summary: If the garbage collector sees a thunk of the form fst x and x is already evaluated to the tuple constructor, e.g. (y,z), it will replace fst x by y, possibly freeing up z if it is not referenced anywhere else.

In your code, the s' will, once the result of go is evaluated to a tuple and after one round of GCing, not keep a reference to the tuple but will be replaced by the accumulated parameter.

Now let’s look a the other patterns by investigating core. The foo binding is compiled to:

foo_r2eT :: ([Type.Integer], Type.Integer)

foo_r2eT =
  case $wgo_r2eP mapAndSum1 lvl2_r2eS
  of _ { (# ww1_s2d7, ww2_s2d8 #) ->
  (ww1_s2d7, ww2_s2d8)
  }

Here is the code in the "bad" case (lvl18_r2fd is "bad"):

case eqString ds_dyA lvl18_r2fd of _ {
  False -> $wa_s2da new_s_a14o;
  True ->
    case ds1_dyB of _ {
      [] ->
        case Handle.Text.hPutStr2
               Handle.FD.stdout lvl17_r2fc True new_s_a14o
        of _ { (# new_s1_X15h, _ #) ->
        Handle.Text.hPutStr2
          Handle.FD.stdout lvl16_r2fb True new_s1_X15h
        };
      : ipv_sIs ipv1_sIt -> $wa_s2da new_s_a14o
    }

We can see that two values at the module level are printed, lvl17_r2fc and lvl16_r2fb, here is their code:

lvl17_r2fc :: String
[GblId]
lvl17_r2fc =
  case foo_r2eT of _ { (xs_Xqp, s_Xq9) ->
  $w$cshowsPrec
    0
    (Data.List.sum_sum' xs_Xqp Data.List.genericDrop2)
    ([] @ Char)
  }

lvl16_r2fb :: String
[GblId]
lvl16_r2fb =
  case foo_r2eT of _ { (xs_apS, s_Xqp) ->
  $w$cshowsPrec 0 s_Xqp ([] @ Char)
  }

Why are they bound at the module level, and not inside the expression? This is an effect of lazy lifting, another optimization that increases sharing and hence sometime has an adverse effect on space performance. See GHC ticket 719 for another occurrence of this effect.

So what happens is that the evaluation of lvl17_r2fc causes foo to be evaluated, and the left entry lazily printed. Unfortunately, lvl16_r2fb is still live and retains the full tuple. And because the garbage collector (seems) to fail to see that this is an selector thunk, Wadler’s optimization does not kick in.

In contrast, here is the code for "good1" a.k.a. lvl8_r2f1:

            case eqString ds_dyA lvl8_r2f1 of _ {
              False -> $wa2_s2dI w3_s2cF;
              True ->
                case ds1_dyB of _ {
                  [] ->
                    Handle.Text.hPutStr2
                      Handle.FD.stdout lvl7_r2f0 True w3_s2cF;
                  : ipv_sHg ipv1_sHh -> $wa2_s2dI w3_s2cF
                }
            } } in

where the printed value is this string:

lvl7_r2f0 :: String
[GblId]
lvl7_r2f0 =
  case foo_r2eT of _ { (x_af6, y_af7) ->
  show_tuple
    (:
       @ ShowS
       (let {
          w2_a2bY [Dmd=Just L] :: Type.Integer

          w2_a2bY = lgo_r2eU mapAndSum1 x_af6 } in
        \ (w3_a2bZ :: String) ->
          $w$cshowsPrec 0 w2_a2bY w3_a2bZ)
       (:
          @ ShowS
          (\ (w2_a2bZ :: String) ->
             $w$cshowsPrec 0 y_af7 w2_a2bZ)
          ([] @ ShowS)))
    ([] @ Char)
  }

As you can see the tuple is taken apart only once, and both values are used. So nothing refers to the tuple as a whole and it can be garbage collected. Similar for "good2" and "good3".

Now to "bad?": In the unoptimized case, we get this code:

 case eqString ds_dyA (unpackCString# "bad?")
                 of _ {
                   False -> fail2_dyN realWorld#;
                   True ->
                     case ds1_dyB of _ {
                       [] ->
                         $
                           @ (Type.Integer, Type.Integer)
                           @ (IO ())
                           (System.IO.print
                              @ (Type.Integer, Type.Integer) $dShow_rzk)
                           ($
                              @ ([Type.Integer], Type.Integer)
                              @ (Type.Integer, Type.Integer)
                              (Control.Arrow.first
                                 @ (->)
                                 Control.Arrow.$fArrow(->)
                                 @ [Type.Integer]
                                 @ Type.Integer
                                 @ Type.Integer
                                 sum'_rzm)
                              foo_rzl);
                       : ipv_szd ipv1_sze -> fail2_dyN realWorld#
                     }
                 } } in

The implementation of first via *** uses refutable patterns, so the kind of selector thunks that the garbage collector handles well are generated.

In the optimized case, things are a bit scattered, but anyways here is the relevant code (the last value is the one that is being printed):

w_r2f2 :: Type.Integer

w_r2f2 =
  case foo_r2eT of _ { (x_aI1, y_aI2) ->
  lgo_r2eU mapAndSum1 x_aI1
  }

lvl9_r2f3 :: String -> String
[GblId, Arity=1]
lvl9_r2f3 =
  \ (w2_a2bZ :: String) ->
    $w$cshowsPrec 0 w_r2f2 w2_a2bZ

w1_r2f4 :: Type.Integer

w1_r2f4 = case foo_r2eT of _ { (x_aI6, y_aI7) -> y_aI7 }

lvl10_r2f5 :: String -> String
[GblId, Arity=1]
lvl10_r2f5 =
  \ (w2_a2bZ :: String) ->
    $w$cshowsPrec 0 w1_r2f4 w2_a2bZ

lvl11_r2f6 :: [ShowS]
[GblId]
lvl11_r2f6 =
  :
    @ ShowS lvl10_r2f5 ([] @ ShowS)

lvl12_r2f7 :: [ShowS]
[GblId]
lvl12_r2f7 = : @ ShowS lvl9_r2f3 lvl11_r2f6

lvl13_r2f8 :: ShowS
[GblId]
lvl13_r2f8 = show_tuple lvl12_r2f7

lvl14_r2f9 :: String
[GblId]
lvl14_r2f9 = lvl13_r2f8 ([] @ Char)

The use of first has been inlined. We see two calls to case foo_r2eT, so this is prone for a space leak, despite w1_rf24 looking like a selector (so I’d expect the runtime to apply Wadler’s optimization). Maybe it does not work well for static thunks? Indeed the commentary, if up to date, only talks about dynamic allocated selector thunks. So if your foo were not a module-level-value (or rather lazy-liftable into one) but rather dependent on some input, w1_rf24 might be dynamically allocated and hence eligible for the special treatment. But then the code might look very different anyways.

Ruella answered 24/7, 2012 at 8:10 Comment(14)
If I am getting you right, we are looking at an implementation issue rather than a language issue? In other words I could not find anything about my problem in the Haskell report or the core output, and with another Haskell implementation all the variants we have in the main function may behave like "bad"? In still other words, our performance critical program sits on very thin ice?Dijon
Well, the Haskell report makes no statement about space behavior, so you are lost there anyways. And yes, other implementations of Haskell might perform bad in each of the cases.Ruella
Okay, then is there a way of "thickening the ice" at least a little? Can we make the idea of streaming while accumulating more explicit? We tried something with the ST monad, and it worked in principle, but (if I remember correctly, and surprisingly) it had the same space behaviour as "bad"...Dijon
I believe that this is what things like the pipe library are about, but I’m not an expert on this.Ruella
The situation is even worse as the compiler is sometimes unable to apply the optimization that Wadler suggested, for example see this thread haskell.org/pipermail/glasgow-haskell-users/2010-October/…. There are cases where other optimizations are applied first and the compiler is unable to recognize that the program projects to one component of a constructor.Lipoprotein
Wow, Joachim, your answer deserves to be upvoted twice (unfortunately, I cannot do this)!Dijon
All good, except it was not invented by Phil Wadler. It was documented by Phil, but invented independently by at least two people before Phil wrote that paper.Glioma
Oh, thanks for the correction. Luckily I can edit my academic impudence here :-)Ruella
This whole business is an indication to me that Herb Sutter might actually be right when he says (or rather quotes?) that C++ may not be the easiest of languages, but writing truly efficient code is easiest in C++. But worry not, I am still on the Haskell train.Dijon
What I mean is, if you want to make your Haskell program really efficient, you need to understand the same concepts as the C programmer plus a lot more.Dijon
Also, doesn't ghc use a variation of what's descibed in Jan Sparud's "Fixing Some Space Leaks without a Garbage Collector"? citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.46.5423Glioma
@JoachimBreitner I happen to know Phil didn't invent it, since I told him about it. :)Glioma
Maybe as well, but it certainly shortcurts selector thunks in the garbage collecotor: hackage.haskell.org/trac/ghc/browser/rts/sm/Evac.c#L1031Ruella
@JoachimBreitner Yes, you're right. GHC went with the GC variant.Glioma

© 2022 - 2024 — McMap. All rights reserved.