(++)
's notoriously bad asymptotics come about when you use it in a left-associative manner - that is, when (++)
's left argument is the result of another call to (++)
. Right-associative expressions run efficiently, though.
To put it more concretely: evaluating a left-nested append of m
lists like
((ws ++ xs) ++ ys) ++ zs -- m = 3 in this example
to WHNF requires you to force m
thunks, because (++)
is strict in its left argument.
case (
case (
case ws of { [] -> xs ; (w:ws) -> w:(ws ++ xs) }
) of { [] -> ys ; (x:xs) -> x:(xs ++ ys) }
) of { [] -> zs ; (y:ys) -> y:(ys ++ zs) }
In general, to fully evaluate n
elements of such a list, this'll require forcing that stack of m
thunks n
times, for O(m*n) complexity. When the entire list is built from appends of singleton lists (ie (([w] ++ [x]) ++ [y]) ++ [z]
), m = n
so the cost is O(n2).
Evaluating a right-nested append like
ws ++ (xs ++ (ys ++ zs))
to WHNF is much easier (O(1)):
case ws of
[] -> xs ++ (ys ++ zs)
(w:ws) -> w:(ws ++ (xs ++ (ys ++ zs)))
Evaluating n
elements just requires evaluating n
thunks, which is about as good as you can expect to do.
Difference lists work by paying a small (O(m)) up-front cost to automatically re-associate calls to (++)
.
newtype DList a = DList ([a] -> [a])
fromList xs = DList (xs ++)
toList (DList f) = f []
instance Monoid (DList a) where
mempty = fromList []
DList f `mappend` DList g = DList (f . g)
Now a left-nested expression,
toList (((fromList ws <> fromList xs) <> fromList ys) <> fromList zs)
gets evaluated in a right-associative manner:
((((ws ++) . (xs ++)) . (ys ++)) . (zs ++)) []
-- expand innermost (.)
(((\l0 -> ws ++ (xs ++ l0)) . (ys ++)) . (zs ++)) []
-- expand innermost (.)
((\l1 -> (\l0 -> ws ++ (xs ++ l0)) (ys ++ l1)) . (zs ++)) []
-- beta reduce
((\l1 -> ws ++ (xs ++ (ys ++ l1))) . (zs ++)) []
-- expand innermost (.)
(\l2 -> (\l1 -> ws ++ (xs ++ (ys ++ l1))) (zs ++ l2)) []
-- beta reduce
(\l2 -> ws ++ (xs ++ (ys ++ (zs ++ l2)))) []
-- beta reduce
ws ++ (xs ++ (ys ++ (zs ++ [])))
You perform O(m) steps to evaluate the composed function, then O(n) steps to evaluate the resulting expression, for a total complexity of O(m+n), which is asymptotically better than O(m*n). When the list is made up entirely of appends of singleton lists, m = n
and you get O(2n) ~ O(n), which is asymptotically better than O(n2).
This trick works for any Monoid
.
newtype RMonoid m = RMonoid (m -> m) -- "right-associative monoid"
toRM m = RMonoid (m <>)
fromRM (RMonoid f) = f mempty
instance Monoid m => Monoid (RMonoid m):
mempty = toRM mempty
RMonoid f `mappend` RMonoid g = RMonoid (f . g)
See also, for example, the Codensity monad, which applies this idea to monadic expressions built using (>>=)
(rather than monoidal expressions built using (<>)
).
I hope I have convinced you that (++)
only causes a problem when used in a left-associative fashion. Now, you can easily write a list-like data structure for which append is strict in its right argument, and left-associative appends are therefore not a problem.
data Snoc a = Nil | Snoc (Snoc a) a
xs +++ Nil = xs
xs +++ (Snoc ys y) = Snoc (xs +++ ys) y
We recover O(1) WHNF of left-nested appends,
((ws +++ xs) +++ ys) +++ zs
case zs of
Nil -> (ws +++ xs) +++ ys
Snoc zs z -> Snoc ((ws +++ xs) +++ ys) +++ zs) z
but at the expense of slow right-nested appends.
ws +++ (xs +++ (ys +++ zs))
case (
case (
case zs of { Nil -> ys ; (Snoc zs z) -> Snoc (ys +++ zs) z }
) of { Nil -> xs ; (Snoc ys y) -> Snoc (xs +++ ys) y }
) of { Nil -> ws ; (Snoc xs x) -> Snoc (ws +++ xs) y }
Then, of course, you end up writing a new type of difference list which reassociates appends to the left!
newtype LMonoid m = LMonoid (m -> m) -- "left-associative monoid"
toLM m = LMonoid (<> m)
fromLM (LMonoid f) = f mempty
instance Monoid m => Monoid (LMonoid m):
mempty = toLM mempty
LMonoid f `mappend` LMonoid g = LMonoid (g . f)
Data.Sequence
). – Armaghdata DList a = Empty | Single a | Append (DList a) (DList a)
will give the same asymoptotics forcons
,snoc
,append
, andtoList
. I don't have time to write up a full answer now, though, and I'm sure you could find a better representation if you desire. – LoveringtoList
-ing a left-associative sequence ofAppend
s will be just as slow as evaluating a left-associative sequence of(++)
s. – HivestoList
for that. You just need to never use(++)
, which means traversing back-to-front with an accumulator. – LoveringDList
:) – HivestoList
, though. So.. There is that. – LoveringtoList
without ever touching a CPS representation. You only need CPS to recover laziness. – LoveringDList
implements – HivesDList
uses a left-to-right evaluation strategy that uses an implicit continuation for the rest of the output instead of an accumulator. – Loveringdata DList a
isn't yet the realDList
, because it hasn't yet had the deforestation optimization applied. (Which is a pretty good way to understand howDList
came to be the way it is, actually: you start with Carl's implementation, then apply some optimizations.) – Quint