Is the distinction between foldr and foldl important for maps and sets?

Asked 7/12, 2018 at 23:35 Answered 8/12, 2018 at 5:10

(This question applies more generally than to Haskell, but that's the language I'll use to state it.)

The distinction between foldl and foldr seems to depend on the fact that lists are ordered. That is, foldl and foldl collapse a list by applying a function to a starting value and the first or last element, then to the result of the first application and the second or second-to-last element, then the result of the second application to the third or third-to-last element, etc.

But the Data.Set and Data.Map libraries for Haskell define their own versions of foldl and foldr. For instance, for maps they are these:

foldr :: (a -> b -> b) -> b -> Map k a -> b
foldl :: (b -> a -> b) -> b -> Map k a -> b -- I've swapped `a` and `b`
  -- in the second type signature to clarify the correspondence.

Maps and sets aren't ordered. Should I expect performance differences between the versions of foldl and foldr defined for sets and maps, or does foldr f do exactly the same thing as foldl (flip f)?

Fromma answered 7/12, 2018 at 23:35 Comment(4)

In Haskell though foldl can run lazily while foldr is strict. Apart from that there are many different use cases. Such as foldring items into a list could be much more efficient than foldling. – Exon 8/12, 2018 at 0:13

I cannot say anything about the performance difference, but there may be a difference in numerical results - see this question as an example. – Athens 8/12, 2018 at 0:13

@Exon That doesn't seem right -- for one, take 5 (foldr (:) [] [1..]) terminates. What do you mean by "lazy" and "strict"? – Beograd 8/12, 2018 at 13:8

@Beograd Thanks for the head up. Right associative operations yielding strictness here was my false assumtion. Data.List explains this as; Note that, since the head of the resulting expression is produced by an application of the operator to the first element of the list, foldr can produce a terminating expression from an infinite list. – Exon 8/12, 2018 at 13:37

Maps and sets aren't ordered. Should I expect performance differences between the versions of foldl and foldr defined for sets and maps

If you refers the source code of Data.Set or Data.Map, you will find that their elements are organized in binary tree:

data Map k a  = Bin !Size !k a !(Map k a) !(Map k a)
              | Tip 

data Set a    = Bin !Size !a !(Set a) !(Set a)
              | Tip

and the foldr of Set:

foldr f z = go z
  where
    go z' Tip           = z'
    go z' (Bin _ x l r) = go (f x (go z' r)) l

traverse the tree using depth-first search with order right, current, left, so when foldr (+) 0 apply to follow tree:

gives,

4 + (1 + (2 + (3 + 0)))

and foldl

foldl f z = go z
  where
    go z' Tip           = z'
    go z' (Bin _ x l r) = go (f (go z' l) x) r

with order left, current, right when apply foldl (+) 0 to above tree, give:

((((0 + 4) + 1) + 2) + 3)

It show that foldr and foldl of Set equivalent to that apply to the list as:

foldr (+) 0 [4, 1, 2, 3] = 4 + (1 + (2 + (3 + 0)))
foldl (+) 0 [4, 1, 2, 3] = ((((0 + 4) + 1) + 2) + 3)

similar situation as Data.Map and don't repeat here.

Moreover, as we knew, foldr can apply to infinite list (but foldl cannot), for example:

take 10 $ foldr ((:) . sum) [] $ chunksOf 3 [1..] = [6,15,24,33,42,51,60,69,78,87]

(here chunksOf group the list like [[1,2,3], [4,5,6]...])

But how about when the tree has a path is infinite like:

                   1
                  / \
                 4   2
                      \
                       3
                        \
                         ...  <- infinite path

Does the foldr of Set behave as list as mentioned above? (I guess the answer is yes, you can check it for youself)

does foldr f do exactly the same thing as foldl (flip f)?

No, As the source code as shown above:

foldr          = ... go (f x (go z' r)) l

and

foldl (flip f) = ... go (f x (go z' l)) r

The traversing order of tree is different, but the generic relation between foldr and foldl can be found in this post: Defining foldl in terms of foldr

Overreach answered 8/12, 2018 at 5:10 Comment(0)

Actually, sets and maps are ordered. That's why all the set and map functions have the Ord constraint on the key type. They're automatically ordered by the natural ordering of the element type. So if your set contains {3, 5, 2, 1, 4}, then Haskell will see it (for folding purposes) in the order {1, 2, 3, 4, 5}.

But let's forget about that for a minute. Let's assume we're in a perfect world where the data truly is unordered. Even then, the difference between foldl and foldr is significant. Suppose I have the set as before: {3, 5, 2, 1, 4}, and I want to perform some operation .* on it.

foldl (.*) 0 mySet = ((((0 .* 3) .* 5) .* 2) .* 1) .* 4
foldr (.*) 0 mySet = 3 .* (5 .* (2 .* (1 .* (4 .* 0))))

So even if the operation happens to be associative, the initial element is placed on the opposite side with a foldl versus a foldr. In fact, some operations won't even work if implemented using the wrong fold. Consider toList, which is defined in Data.Foldable to work on any Foldable object (including lists, maps, and sets). One possible implementation is

toList :: Foldable t => t a -> [a]
toList = foldr (:) []

If we were to try doing a foldl

definitelyWrong :: Foldable t => t a -> [a]
definitelyWrong = foldl (:) []

Then it doesn't even compile.

wrongfold.hs:5:25: error:
    • Occurs check: cannot construct the infinite type: a ~ [a]
      Expected type: [a] -> [a] -> [a]
        Actual type: a -> [a] -> [a]

This is because the folds associate differently and can work even with accumulation operations which take two differently-typed arguments, which is also apparent from the type signatures of the two

foldl :: Foldable t => (b -> a -> b) -> b -> t a -> b
foldr :: Foldable t => (a -> b -> b) -> b -> t a -> b

Note how the first argument is not a -> a -> a but actually has two different types, which indicates that the ordering definitely does matter.

Diazotize answered 7/12, 2018 at 23:45 Comment(2)

I would argue that the Ord constraint is a consequence of using trees to store and index the data, and not related to the abstract data type itself. A fold over these data structures should only make sense if the function is both associative and commutative, so that the order in which the function is applied to the elements doesn't matter. – Berar 8/12, 2018 at 2:59

Great point about ordering. However, while foldl (:) [] does not compile, foldl (flip (:)) [] does -- congruent with the hypothesis in the question that foldr f == foldl (flip f). – Fromma 8/12, 2018 at 6:12