Scalaz Type Classes for Apache Spark RDDs

The goal is to implement different type classes (like Semigroup, Monad, Functor, etc.) provided by Scalaz for Spark's RDD (distributed collection). Unfortunately, I cannot make any of the type classes that take higher-kinded types (like Monad, Functor, etc.) work well with RDDs.

RDDs are defined (simplified) as:

abstract class RDD[T: ClassTag]() {
  def map[U: ClassTag](f: T => U): RDD[U] = { ... }
}

Complete code for RDDs can be found here.

Here is one example that works fine:

import scalaz._, Scalaz._
import org.apache.spark.rdd.RDD

implicit def semigroupRDD[A]: Semigroup[RDD[A]] = new Semigroup[RDD[A]] {
  def append(x: RDD[A], y: => RDD[A]) = x.union(y)
}
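
With that instance in scope, Scalaz's |+| syntax works on RDDs. A minimal usage sketch, assuming an existing SparkContext named sc and the instance above in implicit scope:

// assumes `sc` is an existing SparkContext and semigroupRDD is in scope
val xs: RDD[Int] = sc.parallelize(Seq(1, 2, 3))
val ys: RDD[Int] = sc.parallelize(Seq(4, 5, 6))

// |+| delegates to append, i.e. union: the result contains 1 through 6
val combined: RDD[Int] = xs |+| ys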

Here is one example that doesn't work:

implicit def functorRDD = new Functor[RDD] {
  override def map[A, B](fa: RDD[A])(f: A => B): RDD[B] = {
    fa.map(f)
  }
}

This fails with:

error: No ClassTag available for B
       fa.map(f)

The error is pretty clear: the map implemented on RDD expects a ClassTag (see above), while the Scalaz Functor/Monad methods have no way to provide one. Is it even possible to make this work without modifying Scalaz and/or Spark?

Thoth answered 17/4, 2016 at 4:15

Short answer: no

For a type class like Functor, the requirement is that for any A and B, completely unconstrained, a function A => B is lifted to RDD[A] => RDD[B]. In Spark you cannot pick arbitrary A and B, because you need a ClassTag for B, as you saw.
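
What you can do is define your own ClassTag-aware variant of the type class. This is a sketch of mine, not anything Scalaz provides, and it gives you none of Scalaz's generic machinery, but the lifting itself compiles:

import scala.language.higherKinds
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// A homemade Functor-like type class whose map threads the ClassTag Spark needs.
trait CTFunctor[F[_]] {
  def map[A, B: ClassTag](fa: F[A])(f: A => B): F[B]
}

implicit val ctFunctorRDD: CTFunctor[RDD] = new CTFunctor[RDD] {
  // compiles because the ClassTag[B] evidence is passed through to RDD.map
  def map[A, B: ClassTag](fa: RDD[A])(f: A => B): RDD[B] = fa.map(f)
}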

For other type classes like Semigroup, where the element type doesn't change during the operation and therefore no ClassTag is needed, it works.
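
For example, a Monoid instance also compiles, since zero and append never introduce a new element type. A sketch, assuming an implicit SparkContext is available (the names monoidRDD and sc are mine):

import scala.reflect.ClassTag
import scalaz._
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

implicit def monoidRDD[A: ClassTag](implicit sc: SparkContext): Monoid[RDD[A]] =
  new Monoid[RDD[A]] {
    def zero: RDD[A] = sc.emptyRDD[A]  // the empty RDD is the identity for union
    def append(x: RDD[A], y: => RDD[A]): RDD[A] = x.union(y)
  }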

Penumbra answered 17/4, 2016 at 4:56 Comment(1)
This was my conclusion as well. (Thoth)
