Why are so few things @specialized in Scala's standard library?

Asked 29/3, 2011 at 19:14 Answered 8/3, 2013 at 17:57

Solved performance scala collections autoboxing specialized-annotation

I've searched for the use of @specialized in the source code of the standard library of Scala 2.8.1. It looks like only a handful of traits and classes use this annotation: Function0, Function1, Function2, Tuple1, Tuple2, Product1, Product2, AbstractFunction0, AbstractFunction1, AbstractFunction2.

None of the collection classes are @specialized. Why not? Would this generate too many classes?

This means that using collection classes with primitive types is very inefficient, because there will be a lot of unnecessary boxing and unboxing going on.

What's the most efficient way to have an immutable list or sequence (with IndexedSeq characteristics) of Ints, avoiding boxing and unboxing?

Stillage answered 29/3, 2011 at 19:14 Comment(2)

Wow, cheez, i assumed that List and IndexedSeq are specialized. Doing some :javap -c with the new scala 2.9 REPL shows they still do boxing all the time. The constructors are optimized though it seems (List(1,2,3) uses a wrapIntArray). – Johannejohannes 29/3, 2011 at 23:48

That's really unwanted overhead when dealing with huge datasets of numbers. – Kennedy 31/3, 2011 at 7:20

Specialization has a high cost on the size of classes, so it must be added with careful consideration. In the particular case of collections, I imagine the impact will be huge.

Still, it is an on-going effort -- Scala library has barely started to be specialized.

Mun answered 29/3, 2011 at 21:39 Comment(4)

This is correct. Video from Scala Days 2010 on @Specialized, where Iulian Dragos explains why it's not everywhere: days2010.scala-lang.org/node/138/151 – Trappist 30/3, 2011 at 7:53

I've also grepped through the source of the Scala 2.9 RC1 library and unfortunately it looks like nothing has changed yet! – Stillage 30/3, 2011 at 17:29

@Daniel Specialization has a high cost on the size of classes, so it must be added with careful consideration As a naive, non-compiler engineer, why does the size of classes matter? I am not making an argument, but looking to learn why. – Waggery 21/4, 2015 at 15:46

@KevinMeredith Well, for one thing, up to Java 7, it was limited to 64 KB. It can also present problems when coding for Android. – Mun 21/4, 2015 at 19:46

Specialized can be expensive ( exponential ) in both size of classes and compile time. Its not just the size like the accepted answer says.

Open your scala REPL and type this.

import scala.{specialized => sp}
trait S1[@sp A, @sp B, @sp C, @sp D] { def f(p1:A): Unit }

Sorry :-). Its like a compiler bomb.

Now, lets take a simple trait

trait Foo[Int]{ }

The above will result in two compiled classes. Foo, the pure interface and Foo$1, the class implementation.

Now,

trait Foo[@specialized A] { }

A specialized template parameter here gets expanded/rewritten for 9 different primitive types ( void, boolean, byte, char, int, long, short, double, float ). So, basically you end up with 20 classes instead of 2.

Going back to the trait with 5 specialized template parameters, the classes get generated for every combination of possible primitive types. i.e its exponential in complexity.

2 * 10 ^ ^{(no of specialized parameters)}

If you are defining a class for a specific primitive type, you should be more explicit about it such as

trait Foo[@specialized(Int) A, @specialized(Int,Double) B] { }

Understandably one has to be frugal using specialized when building general purpose libraries.

Here is Paul Phillips ranting about it.

Oder answered 8/3, 2013 at 17:57 Comment(2)

But you don't always need to specialze for all eight primitive types plus void; you can also just specialize for Int, for example, using @specialized(Int). – Stillage 8/3, 2013 at 19:21

yes, thats right. Added more notes to my answer. But, thats the case when you are doing something specific to a data type. Building general purpose libraries like the the scala language api needs to be more general purpose. – Oder 8/3, 2013 at 20:54

Partial answer to my own question: I can wrap an array in an IndexedSeq like this:

import scala.collection.immutable.IndexedSeq

def arrayToIndexedSeq[@specialized(Int) T](array: Array[T]): IndexedSeq[T] = new IndexedSeq[T] {
  def apply(idx: Int): T = array(idx)
  def length: Int = array.length
}

(Ofcourse you could still modify the contents if you have access to the underlying array, but I would make sure that the array isn't passed to other parts of my program).

Stillage answered 29/3, 2011 at 19:19 Comment(5)

Since you are hiding behind IndexedSeq that still doesn't convince the compiler not to use autoboxing. For instance, if i run the :javap -c on the following: object Test { val a = arrayToIndexedSeq( Array(1,2,3)); val i: Int = a(0)} -- you will see that val i = a(0) does boxing (in SeqLike).... – Johannejohannes 29/3, 2011 at 23:53

@Sciss Ok... grrr! Is there another way to get rid of the unnecessary autoboxing? Is directly using mutable arrays the only way to avoid it? – Stillage 30/3, 2011 at 6:38

I don't know. I guess we need to make a petition to add more specializations to the collections. Martin Odersky kind of promised that here: scala-lang.org/node/7285 ; the problem i guess is due to the heavy factoring out of traits in the new collections that if you want Vector to be specialized, you will end up having to specialize a dozen involved traits, which in turn creates high bloat on the size of the library. Still, I think we should vote for that, it definitely sucks to have weak performance for Vector which is really the number crunching swiss army knife in scala. – Johannejohannes 30/3, 2011 at 13:29

@Sciss I've carefully looked at the output of scalac -print ... and following the sequence of methods being called, it looks like it actually does avoid boxing when I call apply() on it. (Using scala 2.8.1). There are some calls to scala.Int.box() but those are not called for that code path. – Stillage 31/3, 2011 at 21:34

Your collection however does have one advantage: it does not box stored elements, and that's a huge memory saving if the collection is (moderately) large. An integer is 4 bytes, boxing will probably add at least other 16 bytes to that (for object headers and the pointers to objects). When you extract one int, only that is boxed. – Cherri 19/7, 2011 at 10:21

Recommended topics

Hot tags