Do typed arrays help the JIT to optimize better?
Asked Answered
D

2

7

My question is the following:

Its usual for Java code to have generic collections implemented like:

public class GenericCollection<T> {
    private Object[] data;

    public GenericCollection () {
        // Backing array is a plain object array.
        this.data = new Object[10];
    }

    @SuppressWarnings( "unchecked" )
    public T get(int index) {
        // And we just cast to appropriate type when needed.
        return (T) this.data[index];
    }
}

And used like this for example:

for (MyObject obj : genericCollection) {
    obj.myObjectMethod();
}

Since the generic type of genericCollection is erased, the JVM doesn't seems to have a way to know that really inside 'data' array of genericCollection there are only MyObject instances, since the actual type of the array is Object, there could be a String in it, and calling 'myObjectMethod' on it would raise an exception.

So I'm assuming the JVM has to do some runtime checking gymnastics to know what really is inside that GenericCollection instance.

Now look at this implementation:

public class GenericCollection<T> {
    private T[] data;

    @SuppressWarnings( "unchecked" )
    public GenericCollection ( Class<T> type ) {
        // Create a type specific array.
        this.data = (T[]) Array.newInstance( type, 10 );
    }

    public T get ( int index ) {
        // No unsafe casts needed.
        return this.data[index];
    }
}

In this case we create a type specific array through reflection, so the JVM could infer there could be only be T objects inside that array in a given context, making the unsafe casts and possible expensive type checks redundant.

My question would be, given the things HotSpot can do, would it help in any way, performance-wise, to implement generic collections with a "proper" type specific backing array?

For example, does it helps HotSpot in removing unnecessary type checks or casts? Maybe possibly enabling it to more easily inline methods given it knows the backing array is of a specific type?

Desultory answered 27/3, 2016 at 23:25 Comment(1)
hotspot JIT is mostly based on types observed by profiling and not on java-level type information. So creating arrays of many different types might actually create polymorphic code in non-inlined code pathsHeder
C
6

Not in this particular case.

Generic array T[] is erased to Object[] in the bytecode. The array getter for Object[] always returns Object, so it does not need check the actual type of array. Hence there is no benefit in having T[] instead of Object[] for array get operation. In both cases there is aaload instruction followed by checkcast, and it works the same way.

Meanwhile array setter will perform worse for typed array rather than Object[], because aastore must check that the value matches the actual array component type.

That is, your proposed modification works equally for get, but performs worse for set. This can be confirmed by the following JMH benchmark.

package bench;

import org.openjdk.jmh.annotations.*;

import java.lang.reflect.Array;

@State(Scope.Benchmark)
public class Generics {
    private ObjectArray<String> objectArray;
    private GenericArray<String> genericArray;
    private StringArray stringArray;
    private int index;

    @Param("100000")
    private int length;

    @Setup
    public void setup() {
        genericArray = new GenericArray<>(String.class, length);
        objectArray = new ObjectArray<>(length);
        stringArray = new StringArray(length);

        for (int i = 0; i < length; i++) {
            String s = Integer.toString(i);
            objectArray.set(i, s);
            genericArray.set(i, s);
            stringArray.set(i, s);
        }
    }

    @Benchmark
    public String getGenericArray() {
        return genericArray.get(nextIndex());
    }

    @Benchmark
    public String getObjectArray() {
        return objectArray.get(nextIndex());
    }

    @Benchmark
    public String getStringArray() {
        return stringArray.get(nextIndex());
    }

    @Benchmark
    public void setGenericArray() {
        genericArray.set(nextIndex(), "value");
    }

    @Benchmark
    public void setObjectArray() {
        objectArray.set(nextIndex(), "value");
    }

    @Benchmark
    public void setStringArray() {
        stringArray.set(nextIndex(), "value");
    }

    private int nextIndex() {
        if (++index == length) index = 0;
        return index;
    }

    static class GenericArray<T> {
        private T[] data;

        @SuppressWarnings("unchecked")
        public GenericArray(Class<T> type, int length) {
            this.data = (T[]) Array.newInstance(type, length);
        }

        public T get(int index) {
            return data[index];
        }

        public void set(int index, T value) {
            data[index] = value;
        }
    }

    static class ObjectArray<T> {
        private Object[] data;

        public ObjectArray(int length) {
            this.data = new Object[length];
        }

        @SuppressWarnings("unchecked")
        public T get(int index) {
            return (T) data[index];
        }

        public void set(int index, T value) {
            data[index] = value;
        }
    }

    static class StringArray {
        private String[] data;

        public StringArray(int length) {
            this.data = new String[length];
        }

        public String get(int index) {
            return data[index];
        }

        public void set(int index, String value) {
            data[index] = value;
        }
    }
}

And the results:

Benchmark                 (length)  Mode  Cnt  Score   Error  Units
Generics.getGenericArray    100000  avgt   40  5,212 ± 0,038  ns/op  <- equal
Generics.getObjectArray     100000  avgt   40  5,224 ± 0,043  ns/op  <-
Generics.getStringArray     100000  avgt   40  4,557 ± 0,051  ns/op
Generics.setGenericArray    100000  avgt   40  3,299 ± 0,032  ns/op  <- worse
Generics.setObjectArray     100000  avgt   40  2,456 ± 0,007  ns/op  <-
Generics.setStringArray     100000  avgt   40  2,138 ± 0,008  ns/op
Corley answered 28/3, 2016 at 10:36 Comment(5)
Great analysis. It's interesting why get is always slower than set even when no checkcast is necessary. Blackhole-related glitch?Bender
@TagirValeev Exactly. Each get in this benchmark is implicitly followed by Blackhole.consume (which is not free, of course), while set is not.Corley
Sorry for the late reply. Great answer although it doesn't addresses one point of my question. "set" is slower because a type check. "get" methods are the same. The thing is, with a "normal" generic, wouldn't you be paying the type check cost when using the returned object of the collection? "get" alone wont incur a type check because the object isn't being used, but I'm guessing that in the case of my example (obj.myObjectMethod(); while iterating), the type check would have to be made in the "normal" generic case on each access, but would be avoided on the typed array case.Desultory
Thus if it were the case, with the typed array you'd pay a price during initialization, but iteration times would be faster.Desultory
@Desultory OK, I see. You're suggesting an optimization of hoisting type check out of the loop. Well, while being theoretically possible, this optimization is not obvious, and HotSpot does not do so.Corley
M
2

No. The type erasure Java Tutorial explains

Generics were introduced to the Java language to provide tighter type checks at compile time and to support generic programming. To implement generics, the Java compiler applies type erasure to:

  • Replace all type parameters in generic types with their bounds or Object if the type parameters are unbounded. The produced bytecode, therefore, contains only ordinary classes, interfaces, and methods.
  • Insert type casts if necessary to preserve type safety.
  • Generate bridge methods to preserve polymorphism in extended generic types.

Thus, after compilation, the generic types are Object.

Maggi answered 27/3, 2016 at 23:26 Comment(4)
Generics are a compile time type safety mechanism; so correct, at runtime it's treated as an Object[] anyway.Maggi
Yeah, generic type its object. But as it says there, the compiler inserts type casts where is necessary (ie, retrieving an object from the collection). Now, what I asked is what happens at runtime. The array gets created with the proper type via Array.newInstance (you can try it out), so what I am asking is if HotSpot can do something extra with that kind of type information, like erasing unnecessary casts/type checks. The spec only mentions type erasure at compile time, but it doesnt specifies what kind of things HotSpot can do with extra type information at runtime.Desultory
Generics don't help, though using a primitive array instead of an Object array can help.Magma
@Desultory the extra cast is more likely to be slightly slower than speed things up.Magma

© 2022 - 2024 — McMap. All rights reserved.