What is the performance impact of using the type class pattern in Scala
Asked Answered
D

1

13

I'm currently making extensive use of the type class pattern in to be performance-relevant portions of my code. I made out at least two potential sources of inefficiency.

  1. The implicit parameters get passed along message calls. I don't know whether this really happens. Maybe scalac can simply insert the implicit parameters where they are used and remove them from the method signature. This is probably not possible in cases where you insert the implicit parameters manually, since they might be resolved at runtime only. What optimizations do apply concerning passing implicit parameters?

  2. If the type class instance is provided by a def (contrary to a val), the object has to be recreated on every invocation of a "type classed method". This issue may be adressed by the JVM, which might optimize object creation away. This issue might also be adressed by scalac by reusing these objects. What optimizations do apply concerning the creation of implicit parameter objects?

And of course there might be additional sources of inefficiency when applying the type class pattern. Please tell me about them.

Donner answered 17/2, 2012 at 11:31 Comment(0)
D
8

If you genuinely care about writing ultra-high-performance code (and you may think you do but be very wrong about this) then typeclasses are going to cause some pain for the following reasons:

  • Many extra virtual method calls
  • Likely boxing of primitives (e.g. if using scalaz's typeclasses for monoids etc)
  • Object creations via def which are necessary because functions cannot be parameterized
  • Object creations to access the "pimped" methods

At runtime, the JVM may optimize some of the erroneous creations away (e.g. the creation of an MA simply to call <*>), but scalac does not do much to help. You can see this trivially by compiling some code which uses typeclasses and using -Xprint:icode as an argument.

Here's an example:

import scalaz._; import Scalaz._
object TC {
  def main(args: Array[String]) {
    println((args(0).parseInt.liftFailNel |@| args(1).parseInt.liftFailNel)(_ |+| _))
  }
}

And here's the icode:

final object TC extends java.lang.Object with ScalaObject {
  def main(args: Array[java.lang.String]): Unit = scala.this.Predef.println(scalaz.this.Scalaz.ValidationMA(scalaz.this.Scalaz.StringTo(args.apply(0)).parseInt().liftFailNel()).|@|(scalaz.this.Scalaz.StringTo(args.apply(1)).parseInt().liftFailNel()).apply({
  (new anonymous class TC$$anonfun$main$1(): Function2)
}, scalaz.this.Functor.ValidationFunctor(), scalaz.this.Apply.ValidationApply(scalaz.this.Semigroup.NonEmptyListSemigroup())));
def this(): object TC = {
  TC.super.this();
  ()
}
};
@SerialVersionUID(0) final <synthetic> class TC$$anonfun$main$1$$anonfun$apply$mcIII$sp$2 extends scala.runtime.AbstractFunction0 with Serializable {
  final def apply(): Int = TC$$anonfun$main$1$$anonfun$apply$mcIII$sp$2.this.v1$1;
  final <bridge> def apply(): java.lang.Object = scala.Int.box(TC$$anonfun$main$1$$anonfun$apply$mcIII$sp$2.this.apply());
  <synthetic> <paramaccessor> private[this] val v1$1: Int = _;
  def this($outer: anonymous class TC$$anonfun$main$1, v1$1: Int): anonymous class TC$$anonfun$main$1$$anonfun$apply$mcIII$sp$2 = {
    TC$$anonfun$main$1$$anonfun$apply$mcIII$sp$2.this.v1$1 = v1$1;
    TC$$anonfun$main$1$$anonfun$apply$mcIII$sp$2.super.this();
    ()
  }
};
@SerialVersionUID(0) final <synthetic> class TC$$anonfun$main$1$$anonfun$apply$mcIII$sp$1 extends scala.runtime.AbstractFunction0$mcI$sp with Serializable {
  final def apply(): Int = TC$$anonfun$main$1$$anonfun$apply$mcIII$sp$1.this.apply$mcI$sp();
  <specialized> def apply$mcI$sp(): Int = TC$$anonfun$main$1$$anonfun$apply$mcIII$sp$1.this.v2$1;
  final <bridge> def apply(): java.lang.Object = scala.Int.box(TC$$anonfun$main$1$$anonfun$apply$mcIII$sp$1.this.apply());
  <synthetic> <paramaccessor> private[this] val v2$1: Int = _;
  def this($outer: anonymous class TC$$anonfun$main$1, v2$1: Int): anonymous class TC$$anonfun$main$1$$anonfun$apply$mcIII$sp$1 = {
    TC$$anonfun$main$1$$anonfun$apply$mcIII$sp$1.this.v2$1 = v2$1;
   TC$$anonfun$main$1$$anonfun$apply$mcIII$sp$1.super.this();
  ()
  }
};
@SerialVersionUID(0) final <synthetic> class TC$$anonfun$main$1 extends scala.runtime.AbstractFunction2$mcIII$sp with Serializable {
  final def apply(x$1: Int, x$2: Int): Int = TC$$anonfun$main$1.this.apply$mcIII$sp(x$1, x$2);
  <specialized> def apply$mcIII$sp(v1$1: Int, v2$1: Int): Int = scala.Int.unbox(scalaz.this.Scalaz.mkIdentity({
(new anonymous class TC$$anonfun$main$1$$anonfun$apply$mcIII$sp$2(TC$$anonfun$main$1.this, v1$1): Function0)
}).|+|({
    (new anonymous class TC$$anonfun$main$1$$anonfun$apply$mcIII$sp$1(TC$$anonfun$main$1.this, v2$1): Function0)
}, scalaz.this.Semigroup.IntSemigroup()));
final <bridge> def apply(v1: java.lang.Object, v2: java.lang.Object): java.lang.Object = scala.Int.box(TC$$anonfun$main$1.this.apply(scala.Int.unbox(v1), scala.Int.unbox(v2)));
  def this(): anonymous class TC$$anonfun$main$1 = {
    TC$$anonfun$main$1.super.this();
    ()
   }
 }

}

You can see there's a huge amount of object creation going on here

Drugge answered 17/2, 2012 at 12:14 Comment(6)
So one suggestion I read from your answer might be to replace scalaz.Monoid, which I am indeed using, by an own version with specialization? Though specialization seems to be very buggy... Even Numeric isn't specialized it seems.Donner
I would steer well clear of specialization tbh. If I was in your shoes I would want to be very, very sure that I did need to squeeze every last drop of performance out of the code. What makes you sure that you do? If thing have to be close to the metal, then I would say that you'll have to go back to mutable collections, imperative code and while loops. There really is no other answerDrugge
It doesn't REALLY have to be as fast as possible. But I don't want to take the performance impact of boxing. I'm currently using primitive arrays (without mutating them). Concerning the problem, it is possible to keep the code very generic to apply to a large problem space (I'm even using algebraic rings as an abstraction). Currently I don't want to decide between performance (like boxing taking a 10x hit) and abstraction; its not an easy choice and I wonder how far I could go with taking both.Donner
And as a comment: Do you really consider getting rid of boxing as "squeezing the last drop of performance out of the code"? Boxing is a rather huge drop.Donner
The honest truth is that I've not found boxing to be that big a deal for the workloads I haveDrugge
The generic answer to this sort of thing is to write it the maintainable, intuitive, flexible way first, and then profile your program to see what the real pain points are. Then start optimizing. Is there a bundle of extra object creation going on here? Yes. Is that going to be slower than C? Yes. But it's also way more maintainable, intuitive, and flexible than raw C code.Cytochemistry

© 2022 - 2024 — McMap. All rights reserved.