Will this Haskell code take too much memory?
Asked Answered
I

3

9

As in this code:

import Data.Char
groupsOf _ [] = []
groupsOf n xs = 
    take n xs : groupsOf n ( tail xs )

problem_8 x = maximum . map product . groupsOf 5 $ x
main = do t <- readFile "p8.log" 
      let digits = map digitToInt $concat $ lines t
      print $ problem_8 digits

I realize that in Haskell many programs construct and seem to store some intermediate results, such as the groupsOf list in the above code. The above code is the reference answer for Euler project problem 8, taken from Haskell website. The original question ask for the largest product of 5 consecutive digits in a very long chain of digits such as 45343231635723421312443535767983456. So the code will calculate all the products and store it as a list. In other languages I think they will only keep a temporary largest value and discard anything smaller.

So does groupsOf really stores all the intermediate results? what if the problem scales up? Will it allocate too much memory?

Isolationist answered 25/11, 2011 at 23:46 Comment(0)
D
12

No, not because of groupsOf. That one will only need to keep one group in memory at a time. However, maximum might build up a large thunk since it's too lazy, so make sure to either compile with -O or -O2 so that strictness analysis is performed, or use foldl1' max instead1.

1 foldl1' is found in Data.List

Dutiful answered 26/11, 2011 at 0:7 Comment(0)
B
8

I realize that in Haskell many programs construct and seem to store some intermediate results, such as the groupsOf list in the above code.

"seem to" is a good thing to say here, because in all honesty this is where Haskell can really shine. It can, in fact, take up much less memory, without complicated micromanagement on your part, thanks to laziness.

One nifty thing about lazy IO (e.g. readFile) is that it will read only as much of the file as necessary, and the file can be garbage-collected as you consume its contents. (Well, roughly so. It depends on how you set the buffering.)

Here's a rough sketch of how your program will be executed:

t <- readFile "p8.log"

Create a thunk for the t :: String. Probably check that the file exists, and open it in read mode.

 let digits = map digitToInt $concat $ lines t

Create a thunk for digits :: [Int]

 print $ problem_8 digits

Once we attempt to execute this, all of the work will actually get done. We need to fully evaluate problem_8 digits :: Int in order to print it. So we create a thunk for problem_8 digits and force it.

maximum . map product . groupsOf 5 $ digits

At this point, maximum needs the first two elements of map product . groupsOf 5 $ digits in order to see which of the two is greater. Therefore map product needs to see the first two elements of groupsOf 5 digits to pass along.

Now, groupsOf 5 will need the first 5 elements of digits in order to produce its first element. At this point, the first line of the file will probably be read, and go through the defined transformations. groupsOf can construct its first element, and probably its second element (assuming there are more than 6 characters on that line). groupsOf 5 digits passes its first two elements up the chain, we map product onto those two things, and then maximum checks which of the two is larger. We keep the result, the larger of the two.

At this point (in fact somewhat earlier than this point) we can entirely discard the intermediate results. The first two elements of groupOf 5 digits are now entirely unnecessary; we will never need to inspect them again, and the garbage collector can collect them anytime. The first two characters of that file will not be used anymore either, and can be discarded.


Now, this is extremely handwavey and possibly slightly inaccurate. But do you get the idea? Haskell is lazy (well technically Haskell is "non-strict", which is generally implemented via laziness), which makes its execution style much different than any strict language. This allows us to use what appears to be tons of intermediate representations and data structures, without actually taking up tons of memory. Good compilers, such as GHC, can optimize memory usage like you wouldn't believe.

Bushed answered 26/11, 2011 at 7:46 Comment(0)
I
3

I'm not sure what you mean by "intermediate results" here, exactly... But your code looks fine to me. Specifically, your implementation of groupsOf is tail recursive modulo cons, so I don't think you'll have to worry about stack overflow, if that's what you're concerned about.

I might be misunderstanding your question, though. Could you clarify?

Information answered 26/11, 2011 at 0:4 Comment(3)
Actually, tail recursion doesn't really mean anything in Haskell, since all functions, due to laziness, are essentially tail-call optimized for free.Bushed
@Dan Burton - that is not correct. Tail recursion is still very important for recursive functions where you don't want laziness.Parenteral
@stephentetley - yes you're right. But most functions you write in Haskell should be lazy. (e,g, groupsOf is lazy.) For those few functions which you want/need to be strict, yes, tail recursion matters. A strict version of maximum, for example, would benefit from careful use of tail recursion, to avoid building up a large thunk, as hammar alluded to.Bushed

© 2022 - 2024 — McMap. All rights reserved.