The stream transducers from the book Function Programming in Scala (FPiS) and Clojure's transducers are quite similar. They are a generalisation of the idea of having a "machine" (step function) to process the input stream into the output stream. FPiS's transducers are called Process
es. Rich Hickey also uses the term process in his introductory talk on transducers in Clojure.
Origins
The design of FPiS's transducers is based on Mealy machines. Mealy machines are said to have:
transition function T : (S, I) -> S
output function G : (S, I) -> O
These functions can be fused together to form:
step: (S, I) -> (S, O)
Heres it's easy to see that the step function operates on the current state of the machine and the next input item to produce the next state of the machine and output item.
One of the combinators from FPiS uses such a step function:
trait Process[I, O] {
...
def loop[S, I, O](z: S)(f: (I,S) => (O,S)): Process[I, O]
...
}
This loop
function is essentially the seeded left reduction that Rickey talks about in this slide.
Context agnostic
Both can be used in many different context (such as lists, streams, channels, etc.).
In FPiS transducers, a process type is:
trait Process[I, O]
All it knows about are it's input elements and it's output elements.
In Clojure, it's a similar story. Hickey calls this "fully decoupled".
Composition
Both types of transducers can be composed.
FPiS uses a "pipe" operator
map(labelHeavy) |> filter(_.nonFood)
Clojure uses comp
(comp
(filtering non-food?)
(mapping label-heavy))
Representation
In Clojure:
reducer: (whatever, input) -> whatever
transducer: reducer -> reducer
In FPiS:
// The main type is
trait Process[I, O]
// Many combinators have the type
Process[I, O] ⇒ Process[I, O]
However, FPiS's representation isn't just a function under the hood. It's a case-class (algebraic data type) with 3 variants: Await, Emit, and Halt.
case class Await[I,O](recv: Option[I] => Process[I,O])
case class Emit[I,O](head: O, tail: Process[I,O]
case class Halt[I,O]() extends Process[I,O]
- Await plays the part of the reducer->reducer function from Clojure.
- Halt plays the part of
reduced
in Clojure.
- Emit stands in stead of calling the next step function in Clojure.
Early termination
Both support early termination. Clojure does it using a special value called reduced
which can be tested for via the reduced?
predicate.
FPiS uses a more statically typed approach, a Process can be in one of 3 states: Await, Emit or Halt. When a "step function" returns a process of state Halt, the processing function knows to stop.
Efficiency
In some points they are again similar. Both types of transducers are demand-driven and don't generate intermediate collections. However, I'd imagine that FPiS's transducers are not as efficient when pipelined/composed as the internal representation is more than "just a stack of function calls" as Hickey puts it. I'm only guessing here about the efficiency/performance though.
Look into fs2
(previously scalaz-stream
) for a perhaps more performant library that is based on the design of the transducers in FPiS.
Example
Here's an example of filter
in both implementations:
Clojure, from Hickey's talk slides:
(defn filter
([pred]
(fn [rf]
(fn
([] (rf))
([result] (rf result))
([result input]
(if (prod input)
(rf result input)
result)))))
([pred coll]
(sequence (filter red) coll)))
In FPiS, here's one way to implement it:
def filter[I](f: I ⇒ Boolean): Process[I, I] =
await(i ⇒ if (f(i)) emit(i, filter(f))
else filter(f))
As you can see, filter
is built up here from other combinators such as await
and emit
.
Safety
There are a number of places where you have to be careful when implementing Clojure transducers. This seems to be a design trade-off favouring efficiency. However, this downside would seem to effect mostly library producers rather than end-users/consumers.
- If a transducer gets a
reduced
value from a nested step call, it must never call that step function again with input.
- Transducers that require state must create unique state and may not be aliased.
- All step functions must have an arity-1 variant that does not take an input.
- A transducer's completion operation must call its nested completion operation, exactly once, and return what it returns.
The transducer design from FPiS favours correctness and ease of use. The pipe composition and flatMap
operations ensure that completion actions occur promptly and that errors are handled appropriately. These concerns are not a burden to implementors of transducers. That said, I imagine that the library may not be as efficient as the Clojure one.
Summary
Both Clojure and FPiS transducers have:
- similar origins
- the ability to be used in different contexts (list, streams, channels, file/network io, database results)
- demand-driven / early termination
- finalisation/completion (for resource safety)
- tasty :)
They differ somewhat in their underlying representation. Clojure style transducers seem to favour efficiency whereas FPiS transducers favour correctness and compositionality.