Why is there a space leak in my haskell program using `transpose`
Asked Answered
J

1

8

I have a space leak in my Haskell programm, I could pinpoint it to a minimal example as follows. I would expect the following code (albeit not terminating but I don't care) to run in constant memory but it doesn't:

head . filter (const False) $ (transpose [[1..]])

while head . filter (const False) $ [1..] does run in constant memory.

So it must have to do with usage of 'transpose' on infinite lists I guess. Ignoring the more complicated base library implementation, if we define transpose like transpose xss = map head xss : transpose (map tail xss) , why is there a space leak ? My assumption was that the garbage collector could free the memory formerly consumed by map head xss in every step of the filter function. I guess that the map tail xss somehow prevents that ?! anyway, could I add some sort of strictness annotation or similar to transpose so that that simple example does run in constant memory?

Jaco answered 14/12, 2023 at 21:9 Comment(2)
A garbage collector isn't a reference counter. It doesn't free memory as soon as humanly possible. It does so when it feels the need to do so, which is usually when its current region is out of memory and it needs more.Enstatite
@SilvioMayolo I think this comment is too flippant. Compiling main = print . <code in question> with -O2 -rtsopts and running with +RTS -M1g to set the maximum heap size to 1 gig (which surely ought to be enough if the GC can collect something) results in the process dying from allocating too much.Maidel
V
7

In your program, the list produced by transpose [[1..]] using the simplified version of transpose looks like this after a few iterations:

map head [[1..]]
  : map head (map tail [[1..]]) 
  : map head (map tail (map tail [[1..]])) 
  : map head (map tail (map tail (map tail [[1..]]))) 
  : transpose (map tail (map tail (map tail (map tail [[1..]]))))

Since your filter function is only forcing the spine of this list, even when those initial values are tossed, you're still growing a infinite chain of nested map tail thunks.

In your example, it should be enough to force the spines of the inner lists, since this will resolve the nested map tail thunks.

For example, this runs in constant space using the transpose from Data.List because the length call forces the spine of each inner list. (It doesn't work with your simplified transpose, though. See @Ben's comment on this answer for an explanation why.)

main =
  print $ head . filter (\x -> length x < 0) $ transpose [[1..]]

For a more explicit solution, the following seems to work fine. The forcelist2 function ensures that every (:) that's forced in the outer list forces the head element, and the forcelist function ensures that the inner lists are forced to the end, where the final [] can be resolved.

import Data.List

transpose' :: [[a]] -> [[a]]
transpose' = forcelist2 . transpose

forcelist2 :: [[a]] -> [[a]]
forcelist2 (x:xs) = let !y = forcelist x in y : forcelist2 xs
forcelist :: [a] -> [a]
forcelist (x:xs) = let !ys = forcelist xs in x : ys
forcelist [] = []

main = print $ head . filter (const False) . forcelist2 $ transpose' [[1..]]
Vander answered 14/12, 2023 at 23:59 Comment(2)
I think forcing the inner spines doesn't work with the simplified transpose because all those [[1..]] expressions in your unfolding are references to the same lazy value. Forcing the spine of map head [[1..]] evaluates it to a list with a known number of elements (one), but the element is a thunk for head [1..], which has an unevaluated reference to [1..]. But when you force the spine of the second list, we end up needing to force tail [1..], which forces that [1..] thunk to be unrolled to something like 1 : [2..], so that tail can drop the 1. (1/2)Camaraderie
Forcing the spine of the third list requires forcing tail (tail [1..]), which expands the [[1..]] to 1 : 2 : [3..], and so on. But the original inner list is still holding onto a reference to the entire infinite list [1..], and all the [1..] are shared references to a single value. So its head elements aren't able to be garbage collected as the various tail calls skip over them, and that head [1..] thunk is a live reference to an infinite list that is actually being unrolled into memory. (2/2)Camaraderie

© 2022 - 2024 — McMap. All rights reserved.