Implement zip using foldr
Asked Answered
D

7

23

I'm currently on chapter 4 of Real World Haskell, and I'm trying to wrap my head around implementing foldl in terms of foldr.

(Here's their code:)

myFoldl :: (a -> b -> a) -> a -> [b] -> a

myFoldl f z xs = foldr step id xs z
    where step x g a = g (f a x)

I thought I'd try to implement zip using the same technique, but I don't seem to be making any progress. Is it even possible?

Diploid answered 24/10, 2008 at 20:27 Comment(0)
J
19
zip2 xs ys = foldr step done xs ys
  where done ys = []
        step x zipsfn []     = []
        step x zipsfn (y:ys) = (x, y) : (zipsfn ys)

How this works: (foldr step done xs) returns a function that consumes ys; so we go down the xs list building up a nested composition of functions that will each be applied to the corresponding part of ys.

How to come up with it: I started with the general idea (from similar examples seen before), wrote

zip2 xs ys = foldr step done xs ys

then filled in each of the following lines in turn with what it had to be to make the types and values come out right. It was easiest to consider the simplest cases first before the harder ones.

The first line could be written more simply as

zip2 = foldr step done

as mattiast showed.

Jersey answered 24/10, 2008 at 20:47 Comment(8)
you are evil... you're not saying it's diong (foldr step done xs) and then applying that to ys?Nessy
It's the same algorithm as mattiast's (who posted 4 seconds quicker). Right, (foldr step done xs) returns a function that consumes ys; so we go down the xs list building up a nested composition of functions that will each be applied to the corresponding part of ys.Jersey
But it's easier to think of it equationally. I started with the first line and then filled in each of the rest in turn with what it had to be to make the types and values come out right.Jersey
Why don't you add those explanations to the answer and reformat it using where or let, so I can accept it?Diploid
OK! I'm kind of new to this stackoverflow thing.Jersey
How does the step function end up taking 3 arguments? As far as I understand step only ever takes 2 (the accumulator and the next element of the list). What is the third paramater used for here?Catabolite
foldr only calls it with 2 arguments, right; so the result of the call from foldr is a new function of one argument. That function is finally applied to ys in the top-level expression defining zip2. (In Haskell, a definition like f x y z = u means the same as f = \x -> \y -> \z -> u)Jersey
In link‎ there is a paper that proves how one can write a function in terms of foldr. Can you please, using the same aproach give a proof of the above function?Kerikeriann
T
13

The answer had already been given here, but not an (illustrative) derivation. So even after all these years, perhaps it's worth adding it.

It is actually quite simple. First,

foldr f z xs 
   = foldr f z [x1,x2,x3,...,xn] = f x1 (foldr f z [x2,x3,...,xn])
   = ... = f x1 (f x2 (f x3 (... (f xn z) ...)))

hence by eta-expansion,

foldr f z xs ys
   = foldr f z [x1,x2,x3,...,xn] ys = f x1 (foldr f z [x2,x3,...,xn]) ys
   = ... = f x1 (f x2 (f x3 (... (f xn z) ...))) ys

As is apparent here, if f is non-forcing in its 2nd argument, it gets to work first on x1 and ys, f x1r1ys where r1 =(f x2 (f x3 (... (f xn z) ...)))= foldr f z [x2,x3,...,xn].

So, using

f x1 r1 [] = []
f x1 r1 (y1:ys1) = (x1,y1) : r1 ys1

we arrange for passage of information left-to-right along the list, by calling r1 with the rest of the input list ys1, foldr f z [x2,x3,...,xn]ys1 = f x2r2ys1, as the next step. And that's that.


When ys is shorter than xs (or the same length), the [] case for f fires and the processing stops. But if ys is longer than xs then f's [] case won't fire and we'll get to the final f xnz(yn:ysn) application,

f xn z (yn:ysn) = (xn,yn) : z ysn

Since we've reached the end of xs, the zip processing must stop:

z _ = []

And this means the definition z = const [] should be used:

zip xs ys = foldr f (const []) xs ys
  where
    f x r []     = []
    f x r (y:ys) = (x,y) : r ys

From the standpoint of f, r plays the role of a success continuation, which f calls when the processing is to continue, after having emitted the pair (x,y).

So r is "what is done with more ys when there are more xs", and z = const [], the nil-case in foldr, is "what is done with ys when there are no more xs". Or f can stop by itself, returning [] when ys is exhausted.


Notice how ys is used as a kind of accumulating value, which is passed from left to right along the list xs, from one invocation of f to the next ("accumulating" step being, here, stripping a head element from it).

Naturally this corresponds to the left fold, where an accumulating step is "applying the function", with z = id returning the final accumulated value when "there are no more xs":

foldl f a xs =~ foldr (\x r a-> r (f a x)) id xs a

Similarly, for finite lists,

foldr f a xs =~ foldl (\r x a-> r (f x a)) id xs a

And since the combining function gets to decide whether to continue or not, it is now possible to have left fold that can stop early:

foldlWhile t f a xs = foldr cons id xs a
  where 
    cons x r a = if t x then r (f a x) else a

or a skipping left fold, foldlWhen t ..., with

    cons x r a = if t x then r (f a x) else r a

etc.

Townswoman answered 9/10, 2014 at 18:4 Comment(0)
C
10

I found a way using quite similar method to yours:

myzip = foldr step (const []) :: [a] -> [b] -> [(a,b)]
    where step a f (b:bs) = (a,b):(f bs)
          step a f [] = []
Costanzia answered 24/10, 2008 at 20:47 Comment(0)
N
7

For the non-native Haskellers here, I've written a Scheme version of this algorithm to make it clearer what's actually happening:

> (define (zip lista listb)
    ((foldr (lambda (el func)
           (lambda (a)
             (if (empty? a)
                 empty
                 (cons (cons el (first a)) (func (rest a))))))
         (lambda (a) empty)
         lista) listb))
> (zip '(1 2 3 4) '(5 6 7 8))
(list (cons 1 5) (cons 2 6) (cons 3 7) (cons 4 8))

The foldr results in a function which, when applied to a list, will return the zip of the list folded over with the list given to the function. The Haskell hides the inner lambda because of lazy evaluation.


To break it down further:

Take zip on input: '(1 2 3) The foldr func gets called with

el->3, func->(lambda (a) empty)

This expands to:

(lambda (a) (cons (cons el (first a)) (func (rest a))))
(lambda (a) (cons (cons 3 (first a)) ((lambda (a) empty) (rest a))))

If we were to return this now, we'd have a function which takes a list of one element and returns the pair (3 element):

> (define f (lambda (a) (cons (cons 3 (first a)) ((lambda (a) empty) (rest a)))))
> (f (list 9))
(list (cons 3 9))

Continuing, foldr now calls func with

el->3, func->f ;using f for shorthand
(lambda (a) (cons (cons el (first a)) (func (rest a))))
(lambda (a) (cons (cons 2 (first a)) (f (rest a))))

This is a func which takes a list with two elements, now, and zips them with (list 2 3):

> (define g (lambda (a) (cons (cons 2 (first a)) (f (rest a)))))
> (g (list 9 1))
(list (cons 2 9) (cons 3 1))

What's happening?

(lambda (a) (cons (cons 2 (first a)) (f (rest a))))

a, in this case, is (list 9 1)

(cons (cons 2 (first (list 9 1))) (f (rest (list 9 1))))
(cons (cons 2 9) (f (list 1)))

And, as you recall, f zips its argument with 3.

And this continues etc...

Nessy answered 24/10, 2008 at 21:0 Comment(0)
D
7

The problem with all these solutions for zip is that they only fold over one list or the other, which can be a problem if both of them are "good producers", in the parlance of list fusion. What you actually need is a solution that folds over both lists. Fortunately, there is a paper about exactly that, called "Coroutining Folds with Hyperfunctions".

You need an auxiliary type, a hyperfunction, which is basically a function that takes another hyperfunction as its argument.

newtype H a b = H { invoke :: H b a -> b }

The hyperfunctions used here basically act like a "stack" of ordinary functions.

push :: (a -> b) -> H a b -> H a b
push f q = H $ \k -> f $ invoke k q

You also need a way to put two hyperfunctions together, end to end.

(.#.) :: H b c -> H a b -> H a c
f .#. g = H $ \k -> invoke f $ g .#. k

This is related to push by the law:

(push f x) .#. (push g y) = push (f . g) (x .#. y)

This turns out to be an associative operator, and this is the identity:

self :: H a a
self = H $ \k -> invoke k self

You also need something that disregards everything else on the "stack" and returns a specific value:

base :: b -> H a b
base b = H $ const b

And finally, you need a way to get a value out of a hyperfunction:

run :: H a a -> a
run q = invoke q self

run strings all of the pushed functions together, end to end, until it hits a base or loops infinitely.

So now you can fold both lists into hyperfunctions, using functions that pass information from one to the other, and assemble the final value.

zip xs ys = run $ foldr (\x h -> push (first x) h) (base []) xs .#. foldr (\y h -> push (second y) h) (base Nothing) ys where
  first _ Nothing = []
  first x (Just (y, xys)) = (x, y):xys

  second y xys = Just (y, xys)

The reason why folding over both lists matters is because of something GHC does called list fusion, which is talked about in the GHC.Base module, but probably should be much more well-known. Being a good list producer and using build with foldr can prevent lots of useless production and immediate consumption of list elements, and can expose further optimizations.

Dove answered 1/3, 2017 at 21:47 Comment(0)
B
2

I tried to understand this elegant solution myself, so I tried to derive the types and evaluation myself. So, we need to write a function:

zip xs ys = foldr step done xs ys

Here we need to derive step and done, whatever they are. Recall foldr's type, instantiated to lists:

foldr :: (a -> state -> state) -> state -> [a] -> state

However our foldr invocation must be instantiated to something like below, because we must accept not one, but two list arguments:

foldr :: (a -> ? -> ?) -> ? -> [a] -> [b] -> [(a,b)]

Because -> is right-associative, this is equivalent to:

foldr :: (a -> ? -> ?) -> ? -> [a] -> ([b] -> [(a,b)])

Our ([b] -> [(a,b)]) corresponds to state type variable in the original foldr type signature, therefore we must replace every occurrence of state with it:

foldr :: (a -> ([b] -> [(a,b)]) -> ([b] -> [(a,b)]))
      -> ([b] -> [(a,b)])
      -> [a]
      -> ([b] -> [(a,b)])

This means that arguments that we pass to foldr must have the following types:

step :: a -> ([b] -> [(a,b)]) -> [b] -> [(a,b)]
done :: [b] -> [(a,b)]
xs :: [a]
ys :: [b]

Recall that foldr (+) 0 [1,2,3] expands to:

1 + (2 + (3 + 0))

Therefore if xs = [1,2,3] and ys = [4,5,6,7], our foldr invocation would expand to:

1 `step` (2 `step` (3 `step` done)) $ [4,5,6,7]

This means that our 1 `step` (2 `step` (3 `step` done)) construct must create a recursive function that would go through [4,5,6,7] and zip up the elements. (Keep in mind, that if one of the original lists is longer, the excess values are thrown away). IOW, our construct must have the type [b] -> [(a,b)].

3 `step` done is our base case, where done is an initial value, like 0 in foldr (+) 0 [1..3]. We don't want to zip anything after 3, because 3 is the final value of xs, so we must terminate the recursion. How do you terminate the recursion over list in the base case? You return empty list []. But recall done type signature:

done :: [b] -> [(a,b)]

Therefore we can't return just [], we must return a function that would ignore whatever it receives. Therefore use const:

done = const [] -- this is equivalent to done = \_ -> []

Now let's start figuring out what step should be. It combines a value of type a with a function of type [b] -> [(a,b)] and returns a function of type [b] -> [(a,b)].

In 3 `step` done, we know that the result value that would later go to our zipped list must be (3,6) (knowing from original xs and ys). Therefore 3 `step` done must evaluate into:

\(y:ys) -> (3,y) : done ys

Remember, we must return a function, inside which we somehow zip up the elements, the above code is what makes sense and typechecks.

Now that we assumed how exactly step should evaluate, let's continue the evaluation. Here's how all reduction steps in our foldr evaluation look like:

3 `step` done -- becomes
(\(y:ys) -> (3,y) : done ys)
2 `step` (\(y:ys) -> (3,y) : done ys) -- becomes
(\(y:ys) -> (2,y) : (\(y:ys) -> (3,y) : done ys) ys)
1 `step` (\(y:ys) -> (2,y) : (\(y:ys) -> (3,y) : done ys) ys) -- becomes
(\(y:ys) -> (1,y) : (\(y:ys) -> (2,y) : (\(y:ys) -> (3,y) : done ys) ys) ys)

The evaluation gives rise to this implementation of step (note that we account for ys running out of elements early by returning an empty list):

step x f = \[] -> []
step x f = \(y:ys) -> (x,y) : f ys

Thus, the full function zip is implemented as follows:

zip :: [a] -> [b] -> [(a,b)]
zip xs ys = foldr step done xs ys
  where done = const []
        step x f [] = []
        step x f (y:ys) = (x,y) : f ys

P.S.: If you are inspired by elegance of folds, read Writing foldl using foldr and then Graham Hutton's A tutorial on the universality and expressiveness of fold.

Beefwood answered 12/1, 2016 at 12:44 Comment(1)
very nice, and going from types is nice as is using a generic example data and then generalizing, although I still prefer going from top to bottom because that's what foldr is actually doing. :)Townswoman
S
0

A simple approach:

lZip, rZip :: Foldable t => [b] -> t a -> [(a, b)]

-- implement zip using fold?
lZip xs ys = reverse.fst $ foldl f ([],xs) ys
     where f  (zs, (y:ys)) x = ((x,y):zs, ys)

-- Or;
rZip xs ys = fst $ foldr f ([],reverse xs) ys
     where f x (zs, (y:ys))  = ((x,y):zs, ys)
Samson answered 30/11, 2017 at 4:38 Comment(1)
Note that rZip won't work properly if xs and ys are of different length.Glabrescent

© 2022 - 2025 — McMap. All rights reserved.