Scalaz Type Classes for Apache Spark RDDs

The goal is to implement different type classes (like Semigroup, Monad, Functor, etc.) provided by Scalaz for Spark's RDD (distributed collection). Unfortunately, I cannot make any of the type classes that take higher-kinded types (like Monad, Functor, etc.) work well with RDDs.

RDDs are defined (simplified) as:

abstract class RDD[T: ClassTag]() {
  def map[U: ClassTag](f: T => U): RDD[U] = { ... }
}

Complete code for RDDs can be found here.

Here is one example that works fine:

import scalaz._, Scalaz._
import org.apache.spark.rdd.RDD

implicit def semigroupRDD[A]: Semigroup[RDD[A]] = new Semigroup[RDD[A]] {
  def append(x: RDD[A], y: => RDD[A]) = x.union(y)
}
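
With that instance in scope, Scalaz's |+| syntax works on RDDs. A minimal usage sketch, assuming an existing SparkContext named sc and the instance above in implicit scope:

// assumes `sc` is an existing SparkContext and semigroupRDD is in scope
val xs: RDD[Int] = sc.parallelize(Seq(1, 2, 3))
val ys: RDD[Int] = sc.parallelize(Seq(4, 5, 6))

// |+| delegates to append, i.e. union: the result contains 1 through 6
val combined: RDD[Int] = xs |+| ys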

Here is one example that doesn't work:

implicit def functorRDD = new Functor[RDD] {
  override def map[A, B](fa: RDD[A])(f: A => B): RDD[B] = {
    fa.map(f)
  }
}

This fails with:

error: No ClassTag available for B
       fa.map(f)

The error is pretty clear: the map implemented on RDD expects a ClassTag (see above), while the Scalaz Functor/Monad methods have no way to provide one. Is it even possible to make this work without modifying Scalaz and/or Spark?

Thoth answered 17/4, 2016 at 4:15

Short answer: no

For a type class like Functor, the requirement is that for any A and B, completely unconstrained, a function A => B is lifted to RDD[A] => RDD[B]. In Spark you cannot pick arbitrary A and B, because you need a ClassTag for B, as you saw.
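
What you can do is define your own ClassTag-aware variant of the type class. This is a sketch of mine, not anything Scalaz provides, and it gives you none of Scalaz's generic machinery, but the lifting itself compiles:

import scala.language.higherKinds
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// A homemade Functor-like type class whose map threads the ClassTag Spark needs.
trait CTFunctor[F[_]] {
  def map[A, B: ClassTag](fa: F[A])(f: A => B): F[B]
}

implicit val ctFunctorRDD: CTFunctor[RDD] = new CTFunctor[RDD] {
  // compiles because the ClassTag[B] evidence is passed through to RDD.map
  def map[A, B: ClassTag](fa: RDD[A])(f: A => B): RDD[B] = fa.map(f)
}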

For other type classes like Semigroup, where the element type doesn't change during the operation and therefore no ClassTag is needed, it works.
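
For example, a Monoid instance also compiles, since zero and append never introduce a new element type. A sketch, assuming an implicit SparkContext is available (the names monoidRDD and sc are mine):

import scala.reflect.ClassTag
import scalaz._
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

implicit def monoidRDD[A: ClassTag](implicit sc: SparkContext): Monoid[RDD[A]] =
  new Monoid[RDD[A]] {
    def zero: RDD[A] = sc.emptyRDD[A]  // the empty RDD is the identity for union
    def append(x: RDD[A], y: => RDD[A]): RDD[A] = x.union(y)
  }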

Penumbra answered 17/4, 2016 at 4:56 Comment(1)
This was my conclusion as well. (Thoth)
