More efficient tail of church encoded list
Asked Answered
I

3

13

This is a literate haskell post. Simply save it as "ChurchList.lhs" to run it.

> {-# LANGUAGE Rank2Types #-}

A Church encoded list is a way of representing a list via a function. It resembles both folding and continuation passing style.

> newtype ChurchList a = CList {runCList :: forall r. (a -> r -> r) -> r -> r}

For illustration as to how this corresponds to a list, here is a O(n) isomorphism

> fromList :: [a] -> ChurchList a
> fromList xs = CList $ \cons empty -> foldr cons empty xs

> toList :: ChurchList a -> [a]
> toList cl = runCList cl (:) []

> instance Show a => Show (ChurchList a) where
>   show cl = "fromList " ++ show (toList cl)

These things have good performance charecteristics.

> singleton :: a -> ChurchList a -- O(1)
> singleton a = CList $ \cons empty -> a `cons` empty
> append :: ChurchList a -> ChurchList a -> ChurchList a -- O(1)!!! This also means cons and snoc are O(1)
> append cl1 cl2 = CList $ \cons empty -> runCList cl1 cons (runCList cl2 cons empty)
> concatCl :: ChurchList (ChurchList a) -> ChurchList a -- O(n)
> concatCl clcl = CList $ \cons empty -> runCList clcl (\cl r -> runCList cl cons r) empty
> headCl :: ChurchList a -> Maybe a -- O(1)
> headCl cl = runCList cl (\a _ -> Just a) Nothing

Now, the problem comes with tail.

> tailClbad :: ChurchList a -> Maybe (ChurchList a) --O(n)?!!
> tailClbad cl = (fmap snd) $ runCList cl
>
>    (\a r -> Just (a, case r of
>    Nothing -> CList $ \cons empty -> empty
>    Just (s,t) -> append (singleton s) t)) --Cons
>
>    Nothing --Empty

Essentially what my implementation does is split the list into head and tail. Cons replaces the head, and appends the old head unto the tail. This is rather inefficient. It seems that Church Lists are inefficient in general at splitting.

I'm hoping that I'm wrong. Is there an implementation of tailCl that is better than O(n) (preferably O(1)).

Innocency answered 29/8, 2015 at 16:40 Comment(1)
As pointed out by Oleg Kiselyov, this is actually Boehm-Berarducci encoding. The article he links in the mail is quite interesting.Bolster
B
6

Paper Church Encoding of Data Types Considered Harmful for Implementations by Koopman, Plasmeijer and Jansen seems to deal with the issue in detail. In particular, quoting the abstract (my emphasis):

[...]

We show that in the Church encoding selectors of constructors yielding the recursive type, like the tail of a list, have an undesirable strictness in the spine of the data structure. The Scott encoding does not hamper lazy evaluation in any way. The evaluation of the recursive spine by the Church encoding makes the complexity of these destructors O(n). The same destructors in the Scott encoding requires only constant time. Moreover, the Church encoding has serious problems with graph reduction. The Parigot encoding combines the best of both worlds, but in practice this does not offer an advantage.

However, while Scott encoding provides the performance advantage, it appears to be problematic to define it in System F without adding recursive types.

Bolster answered 29/8, 2015 at 19:31 Comment(9)
Also, Scott encoding doesn't appear to have constant time append, which was my main draw to Church Lists.Innocency
@PyRulez Indeed. It seems that there is a trade-off between being able to pattern match on the head and access the end. One solution would be to use a more complex data structure, like a catenable FIFO queue (or dequeue) described by Okasaki, would that be something that you could use?Bolster
@PyRulez, the Parigot encoding permits both pattern matching and O(1) append. Here is an example.Wideopen
@user3237465 The Parigot encoding isn't perfectly isomorphic, as the Church and Scott parts of it can get out of sync. Indeed, the code you linked will create inconsistent Parigot lists. (Try appending two lists, and then repeatedly take the tail of it.)Innocency
@PetrPudlák I was looking at the Church List mostly for theoretical purposes. Thanks though.Innocency
@PyRulez, that's a bug in my implementation — I forgot about the Scott part in pappend. I fixed the bug, inconsistent lists shouldn't be generated now. Good catch.Wideopen
@user3237465 You will notice that while it can simulate the performance of either a Church list, or a Scott list; it can't simulate both. If you use both the church and scott parts, everything will get an O(n) delay. Unless I am incorrect, which I hope I am.Innocency
@PyRulez, if you use the Scott part, then the Church part won't be used at all and you will lost only fusion (it seems like in the case of pappend you will lost fusion only for the first list). That's why there are two pfuse in the definition of pzipWith in my code. But I have to admit that I made no tests.Wideopen
@user3237465 I tried making a Parigot implementation, and I ran into similar issues (I didn't go very far with it though. I also just tried figuring out computation complexities in my head.)Innocency
C
3

Yes, it's O(n). A church encoded list is identified with its foldr function, which must do the same thing everywhere. Since getting the tail requires doing something for the first item, the same something must be done for all the remaining items.

{-# LANGUAGE RankNTypes #-}

newtype ChurchList a = CList { getFoldr :: forall r. (a -> r -> r) -> r -> r }

fromList :: [a] -> ChurchList a 
fromList xs = CList $ \f z -> foldr f z xs

toList :: ChurchList a -> [a]
toList cl = getFoldr cl ((:)) []

Your solution is as productive as possible. The same solution can also be written trivially by building a list and matching on the first item.

safeTail :: [a] -> Maybe [a]
safeTail []     = Nothing
safeTail (_:xs) = Just xs

tailCtrivial ::  ChurchList a -> Maybe (ChurchList a)
tailCtrivial = fmap fromList . safeTail . toList
Counterstatement answered 29/8, 2015 at 18:53 Comment(0)
O
1

No, not necessarily O(n):

Prelude> take 5 . snd . foldr (\a r-> (a:fst r,fst r)) ([], undefined) $ [1..]
[2,3,4,5,6]

It indeed adds O(1) overhead for each element ultimately produced.

Trying for the safetail didn't work:

Prelude> fmap (take 5) . snd . foldr (\a r-> (fmap (a:) $ fst r,fst r)) (Just [], Nothing)
$ [1..]
Interrupted.

So,

tailCL cl = CList $ \k z-> snd $ runCList cl (\a r-> (a`k`fst r,fst r)) (z, undefined)

Prelude> take 5 . toList . tailCL . fromList $ [1..]
[2,3,4,5,6]


edit: followng the comment by @user3237465, it turns out that safetail is possible after all:

Prelude> fmap (take 5) . snd . foldr (\a ~(r,_)-> (Just (a:fromJust r), r)) (Just [] , Nothing) $ [1..]
Just [2,3,4,5,6]

The reason it didn't work before is that Maybe's fmap forces its second argument to find out which case is it, but here we know that it is a Just value, by construction. I could't put it as a definition for your type though, whatever I tried didn't pass the type checker.

Orcinol answered 30/8, 2015 at 6:47 Comment(2)
That doesn't work however for newtype ChurchList m a = CList { runCList :: forall r. (a -> m r -> m r) -> m r -> m r }, when m is a strict monad. There is an actual way to split data structures, but only aliens can understand the code. BTW, tail' = snd . foldr (\x ~(r, _)-> (x : r, r)) ([], undefined) looks more suggestive to me.Wideopen
@user3237465 I specifically wanted to avoid the lazy pattern, for whatever reason.Orcinol

© 2022 - 2024 — McMap. All rights reserved.