How efficient is the writer monad for lists?

The implementation of the Haskell writer monad on lists (Writer [w] a) will use ++ to add items. So if I write this code in a list writer monad:

do
  tell [a, b, c]
  tell [d]

The lists will be appended with [a, b, c] ++ [d]. After working in OCaml, I’ve internalized that lists should be built with a cons operator (:) instead of a concatenation operator (++) as the latter is O(n) in its first argument.

My workload adds one “message” to the writer monad at a time so the second argument to ++ will usually be a singleton list.

In Haskell will laziness make a list writer monad more efficient than in an eager language like OCaml? If not what would an efficient alternative be for my workload?

Left-associated (++)s are inefficient, because the left-most lists are traversed multiple times, once for each enclosing (++). Right-associated (++) are fine (at least, they can't be made more efficient by using (:) directly).

The standard WriterT transformer (and (,) writer) associate their calls to (++) in the same way their bindings are associated. So by extension of the previous discussion, left-associated (>>=)s will be problematic, while right-associated ones are fine. In particular, this means there is an abstraction cost. If, in a refactoring, one were to pull out the first two lines of the do block below:

x = do
    tell a
    tell b
    tell c

into a separate definition, perhaps because they happen often:

y = do
    tell a
    tell b

x = do
    y
    tell c

This refactoring re-associates one binding to the left, and so costs slightly more.

In case this worries you, you can choose a slightly different tradeoff by using the standard difference-list trick as your monoid. So:

do
    tell (Endo ([a,b,c]++))
    tell (Endo ([d]++))

This will magically re-associate your (++)s to the right (wow! blows my mind every time I re-figure out how that works). The cost is that each observation of the difference list (that is, conversion from difference list to standard list) is expensive (whereas with the previous choice of bare lists, multiple observations cost no more than one observation). If you have just one consumer -- say, a top-level call to runWriterT that flattens the list once and for all -- this is no problem asymptotically, but if you find yourself calling listen or pass and inspecting the difference list often, you may not want to choose this.

If neither of these tradeoffs sounds good to you, a third choice would be to use finger-trees, e.g. Seq, for which observation is free (unlike difference lists) and concatenation on either end is log-time in the shorter argument (unlike standard lists, where it is linear in the first argument), but for which the constants are enough higher that you can notice it in many cases.

Recommended topics

Hot tags