How can I fuse two maps over the same list?
Asked Answered
F

2

14

We could fuse two traversals over the list xs in the expression

(map f xs, map g xs)

like so

unzip (map (\x -> (f x, g x)) xs)

Is there any reasearch on performing this kind of fusion automatically?

(There's a risk to create a space leak here if one of the returned lists is consumed before the other. I'm more interested in preventing the extra traversal over xs than saving space.)

Edit: I'm actually not looking to apply the fusion to actual in-memory Haskell lists, where this transformation might not make sense depending on if the unzip can be fused with its consumer(s). I have a setting where I know unzip can fuse (see "FlumeJava: easy, efficient data-parallel pipelines").

Firstrate answered 3/9, 2012 at 4:29 Comment(6)
Not automatic, but quite nice anyway: squing.blogspot.com/2008/11/beautiful-folding.htmlHumoral
Unless the result of this fuses with something else the overhead of creating the pairs and unzipping them will be bigger than cost of the extra traversal.Essentialism
@Essentialism Not if the traversal is over a huge file! I'm not planning to apply this to actual lists.Firstrate
The unzip will traverse a list of the same length as the map, so you will not be saving traversals. Now if you care about saving the space that the fusion gives you then it can make a huge difference. I inferred from your comment that you were not interested in space, but I see that's not exactly what you said. :)Essentialism
" preventing the extra traversal over xs " can also be accomplished by most of the iterator-style packages.Faught
@Essentialism I have a consumer that can fuse with the unzip. I'm trying to cast sibling fusion, as described in "FlumeJava: easy, efficient data-parallel pipelines" (pages.cs.wisc.edu/~akella/CS838/F12/838.../FlumeJava.pdf), in one of the Haskell fusion frameworks.Firstrate
B
4

Also not fully automatic, but you can give GHC a list of rewrite rules like that. See 7.14 Rewrite rules and Using rules. Then the compiler uses these rules to optimize your program when compiling. (Note that the compiler in no way checks if the rules make any sense.)

Edit: To give an example for this particular problem, we can write:

{-# OPTIONS_GHC -fenable-rewrite-rules -ddump-rule-firings -ddump-rule-rewrites #-}

import Data.Char

{-# RULES
"map/zip" forall f g xs. (,) (map f xs) (map g xs) = unzip (map (\x -> (f x, g x)) xs)
   #-}

main :: IO ()
main = let x = "abCD" in
        print $ (,) (map toUpper x) (map toLower x)

(the top-level function name in the rule is (,) :: a -> b -> (a, b)). When compiling, you'll see how the rules are applied. Option dump-rule-firings shows a message when a rule is applied and -ddump-rule-rewrites displays each rule application in detail - see 7.14.6. Controlling what's going on in rewrite rules.

Balmuth answered 3/9, 2012 at 7:35 Comment(1)
I don't think we can write a rule to match these kind of expressions. GHC rules must start with a function name.Firstrate
F
3

I've managed to find two resources that mentions fusion (un-)zip like functions, at least briefly:

Josef Svenningsson. "Shortcut Fusion for Accumulating Parameters & Zip-like Functions" http://www.cse.chalmers.se/~josefs/publications/fusion.pdf

Duncan Coutts. "Stream Fusion: Practical shortcut fusion for coinductive sequence types" https://community.haskell.org/~duncan/thesis.pdf

Neither resources mentions this kind of "sibling fusion" explicitly though.

Firstrate answered 3/9, 2012 at 17:10 Comment(2)
I didn't see this presentation, but here are Josef's slides about TupleFusion.Shelbashelbi
Towards an automated tupling strategy might be interesting.Sosthenna

© 2022 - 2024 — McMap. All rights reserved.