The mapAndSum
function in the code block all the way below combines map
and sum
(never mind that another sum
is applied in the main function, it just serves to make the output compact). The map
is computed lazily, while the sum
is computed using an accumulating parameter. The idea is that the result of map
can be consumed without ever having the complete list in memory, and (only) afterwards the sum
is available "for free". The main function indicates that we had a problem with irrefutable patterns when calling mapAndSum
. Let me explain this problem.
According to the Haskell standard, the irrefutable pattern example let (xs, s) = mapAndSum ... in print xs >> print s
is translated into
(\ v -> print (case v of { (xs, s) -> xs })
>> print (case v of { (xs, s) -> s }))
$ mapAndSum ...
And hence, both print
calls carry a reference to the whole pair, which leads to the whole list being kept in memory.
We (my colleague Toni Dietze and I) solved this using an explicit case
statement (compare "bad" vs "good2"). BTW, finding this out took us a considerable amount of time..!
Now, what we do not understand is two-fold:
Why does
mapAndSum
work in the first place? It also uses an irrefutable pattern, so it should also keep the whole list in memory, but it obviously doesn't. And converting thelet
into acase
would make the function behave completely unlazy (to the point that the stack overflows; no pun intended).We had a look at the "core" code generated by GHC, but as far as we could interpret it, it actually contained the same translation of
let
as the one above. So no clue here, and more confusion instead.Why does "bad?" work when optimization is turned off, but not when it is turned on?
One remark concerning our actual application: we want to achieve stream processing (format conversion) of a large text file while also accumulating some value that is then written to a separate file. As indicated, we were successful, but the two questions remain, and answers may improve our understanding of GHC for upcoming tasks and problems.
Thank you!
{-# LANGUAGE BangPatterns #-}
-- Tested with The Glorious Glasgow Haskell Compilation System, version 7.4.2.
module Main where
import Control.Arrow (first)
import Data.List (foldl')
import System.Environment (getArgs)
mapAndSum :: Num a => (a -> b) -> [a] -> ([b], a)
mapAndSum f = go 0
where go !s (x : xs) = let (xs', s') = go (s + x) xs in (f x : xs', s')
-- ^ I have no idea why it works here. (TD)
go !s [] = ([], s)
main :: IO ()
main = do
let foo = mapAndSum (^ (2 :: Integer)) [1 .. 1000000 :: Integer]
let sum' = foldl' (+) 0
args <- getArgs
case args of
["bad" ] -> let (xs, s) = foo in print (sum xs) >> print s
["bad?"] -> print $ first sum' $ foo
-- ^ Without ghc's optimizer, this version is as memory
-- efficient as the “good” versions
-- With optimization “bad?” is as bad as “bad”. Why? (TD)
["good1"] -> print $ first' sum' $ foo
where first' g (x, y) = (g x, y)
["good2"] -> case foo of
(xs, s) -> print (sum' xs) >> print s
["good3"] -> (\ (xs, s) -> print (sum' xs) >> print s) $ foo
_ -> error "Sorry, I do not understand."
mapAnsSum
function might be improved by usingmapAccumL
from Data.Traversable. – Lucite