Lazy cartesian product in Haskell
Asked Answered
S

2

9

I would like to generate a rather large but finite Cartesian product in Haskell, which I need to then iterate on (think partition function of a mean-field model). The natural thing to do uses sequence, like this:

l = sequence $ replicate n [0,1,2]

Unfortunately, for large n, this does not fit in memory and I run out of heap as soon as I ask for length l for instance. I would need a way to do the same thing lazily. I ended up "rediscovering" base-3 arithmetics, like this,

nextConfig []     = []
nextConfig (0:xs) = 1:xs
nextConfig (1:xs) = 2:xs
nextConfig (2:xs) = 0:(nextConfig xs)

ll = take (3^n) $ iterate nextConfig $ replicate n 0

(which works) but it feels like reinventing the wheel, and besides it is much too specific. What would be a better lazy way to generate the product?

Sentence answered 12/4, 2012 at 11:44 Comment(4)
Do you care about the order of the elements in the result?Unlookedfor
No, as long as there is no repetition.Sentence
How large do you need n to be?Popularly
Something like 20 or 30; I don't really care about computation time for now, but certainly 3^n is beyond RAM size.Sentence
S
8

The more memory-friendly way is obtained by binding in reverse order compared to sequence,

foo 0 _ = [[]]
foo k xs = [h:t | t <- foo (k-1) xs, h <- xs]

It is slower due to less sharing, but since memory is your problem, maybe it's good enough for you.

Spleenwort answered 12/4, 2012 at 12:53 Comment(5)
Cool! I will have to investigate more why it works, but it certainly does. I modified it as foo (l:ls) = [h:t | t <- foo2 ls, h <- l] (who knows if I will always need a cube) and it works as well. Thanks!Sentence
Why are list comprehensions more efficient than do-notation for lists (which is used in sequence)? I can see from the Haskell2010 report both of them desugar to concatMap?Ajit
@brence: See Duncan Coutts's answer to this reddit question: Why are guards in the list comprehension faster than in the do-notation?Popliteal
From there, it appears that list comprehension is desugared into foldr. The weird thing is that a naive foldr approach to cartesian products (at least the one I tried) breaks laziness like sequence does ...Sentence
@danr, a few years ago I changed the definition of concatMap and/or the Monad instance for [], so it should now compile basically the same as the list comprehension version. As I recall, we now define concatMap f xs = [f x | x <- xs] or similar.Karyoplasm
B
0

The pattern for an infinite list is from the top-left corner a1 then a2 down to b1, a3 down to c1

a1 a2 a3 a4
b1 b2 b3 b4
c1 c2 c3 c4
d1 d2 d3 d4

That is, a1, a2 b1, a3 b2 c1, a4 b3 c2 d1 The letters in comma separated groups are a ab abc abcd and the digits 1 21 321 4321

The infinite list output should grow diagonally first with a1 then with 2 added diagonally then 3 added diagonally and so on.

Either of the above single letters or digits must progress in reverse.

diag xs ys = [(a,b)| (m,n) <- zip (inits xs) (inits ys), (a,b) <- zip m $ reverse n ]

With a reverse with every (m,n) pair. I prefer to reverse all at once, one operation repeated. The reverse of an infinite list can, I think, only be accomplished with successive reverse lists.

rsl = tail.scanl (flip (:)) []

take 5 $ rsl [1..]

[[1],[2,1],[3,2,1],[4,3,2,1],[5,4,3,2,1]]

Then handle each list in turn. zip limits the non-reversed list in each iteration.

plr xs ys = [ (a,b) | ls <- rsl xs, (a,b) <- zip ls ys ]

sort.take 10.plr ['a'..] $ [1..]

[('a',1),('a',2),('a',3),('a',4),('b',1),('b',2),('b',3),('c',1),('c',2),('d',1)]

sort, because taking values diagonally means they are not in order like the first linear list above. sort is imported from Data.List

As an aside, because this is traversing diagonally triangular numbers come into play. I used take 10 above, The triangular numbers are:

take 11 [sum ls | ls <- rsl [1..]]

[1,3,6,10,15,21,28,36,45,55,66]
Boddie answered 6/4, 2021 at 17:41 Comment(2)
Hi, some activity on that very old question! but I'm not sure how your reply relates to it, did you perhaps intend it for another question?Sentence
diag is a lazy cartesion product. Try it. As I said, I prefer a lazy function to impliment the reverse lists, that is, lazy rsl. plr is another lasy cartesian product function that uses lazy rsl. If you do not use take with all of these functions they will run forever.Boddie

© 2022 - 2024 — McMap. All rights reserved.