I assumed that separating objects that implement different interfaces into several lists and iterating those lists afterwards would be faster than dumping all objects into a single list and then switching via instanceof
. E.g. this:
ArrayList<Visible> visibles = new ArrayList<>();
ArrayList<Highlightable> highlightables = new ArrayList<>();
ArrayList<Selectable> selectables = new ArrayList<>();
// populate the lists
// Visible is an interface, Highlightable is also interface (extends Visible),
// Selectable extends Highlightable
// All interfaces have 3 concrete subclasses each,
// to test situations when JVM is not able to optimize too much due to small number of classes
for (Visible e : visibles) {
vsum += e.visibleValue();
}
for (Highlightable e : highlightables) {
vsum += e.visibleValue();
hsum += e.highlightableValue();
}
for (Selectable e : selectables) {
vsum += e.visibleValue();
hsum += e.highlightableValue();
ssum += e.selectableValue();
}
should be faster than
ArrayList<Visible> visibles = new ArrayList<>();
// populate the list
for (Visible e : visibles) {
if (e instanceof Selectable) {
vsum += e.visibleValue();
hsum += ((Selectable) e).highlightableValue();
ssum += ((Selectable) e).selectableValue();
} else if (e instanceof Highlightable) {
vsum += e.visibleValue();
hsum += ((Highlightable) e).highlightableValue();
} else {
vsum += e.visibleValue();
}
}
However it doesn't seem to be the case:
Main.separateLists thrpt 30 1546.898 ± 32.312 ops/s
Main.singleListAndInstanceof thrpt 30 1673.733 ± 29.804 ops/s
I added full source for the benchmark below.
What could be the cause of instanceof
version being faster? Even if we assume that isntanceof
instruction is free, then both versions should at least be equal in performance (because they still add elements to list and iterate them).
package test;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Warmup;
import java.util.ArrayList;
import java.util.Random;
public class Main {
public static void main(String[] args) throws Exception {
org.openjdk.jmh.Main.main(args);
}
@Benchmark
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 15, time = 1)
@Fork(value = 2)
public static long separateLists() {
ArrayList<Visible> visibles = new ArrayList<>(3_500);
ArrayList<Highlightable> highlightables = new ArrayList<>(3_500);
ArrayList<Selectable> selectables = new ArrayList<>(3_500);
Random random = new Random();
for (int i = 0; i < 10_000; i++) {
switch (random.nextInt(9)) {
case 0:
visibles.add(new Visible1(i));
break;
case 1:
highlightables.add(new Highlightable1(i));
break;
case 2:
selectables.add(new Selectable1(i));
break;
case 3:
visibles.add(new Visible2(i));
break;
case 4:
highlightables.add(new Highlightable2(i));
break;
case 5:
selectables.add(new Selectable2(i));
break;
case 6:
visibles.add(new Visible3(i));
break;
case 7:
highlightables.add(new Highlightable3(i));
break;
case 8:
selectables.add(new Selectable3(i));
break;
}
}
long listSize = visibles.size() + highlightables.size() + selectables.size();
long vsum = 0;
long hsum = 0;
long ssum = 0;
for (Visible e : visibles) {
vsum += e.visibleValue();
}
for (Highlightable e : highlightables) {
vsum += e.visibleValue();
hsum += e.highlightableValue();
}
for (Selectable e : selectables) {
vsum += e.visibleValue();
hsum += e.highlightableValue();
ssum += e.selectableValue();
}
return listSize + vsum * hsum * ssum;
}
@Benchmark
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 15, time = 1)
@Fork(value = 2)
public static long singleListAndInstanceof() {
ArrayList<Visible> visibles = new ArrayList<>(10_000);
Random random = new Random();
for (int i = 0; i < 10_000; i++) {
switch (random.nextInt(9)) {
case 0:
visibles.add(new Visible1(i));
break;
case 1:
visibles.add(new Highlightable1(i));
break;
case 2:
visibles.add(new Selectable1(i));
break;
case 3:
visibles.add(new Visible2(i));
break;
case 4:
visibles.add(new Highlightable2(i));
break;
case 5:
visibles.add(new Selectable2(i));
break;
case 6:
visibles.add(new Visible3(i));
break;
case 7:
visibles.add(new Highlightable3(i));
break;
case 8:
visibles.add(new Selectable3(i));
break;
}
}
long listSize = visibles.size();
long vsum = 0;
long hsum = 0;
long ssum = 0;
for (Visible e : visibles) {
if (e instanceof Selectable) {
vsum += e.visibleValue();
hsum += ((Selectable) e).highlightableValue();
ssum += ((Selectable) e).selectableValue();
} else if (e instanceof Highlightable) {
vsum += e.visibleValue();
hsum += ((Highlightable) e).highlightableValue();
} else {
vsum += e.visibleValue();
}
}
return listSize + vsum * hsum * ssum;
}
}
abstract class Visible {
abstract int visibleValue();
}
abstract class Highlightable extends Visible {
abstract int highlightableValue();
}
abstract class Selectable extends Highlightable {
abstract int selectableValue();
}
class Visible1 extends Visible {
private int v;
Visible1(int v) {
this.v = v;
}
@Override int visibleValue() { return v; }
}
class Highlightable1 extends Highlightable {
private int v;
Highlightable1(int v) {
this.v = v;
}
@Override int visibleValue() { return v*2; }
@Override int highlightableValue() { return v*3; }
}
class Selectable1 extends Selectable {
private int v;
Selectable1(int v) {
this.v = v;
}
@Override int visibleValue() { return v*4; }
@Override int highlightableValue() { return v*5; }
@Override int selectableValue() { return v*6; }
}
class Visible2 extends Visible {
private int v;
Visible2(int v) {
this.v = v;
}
@Override int visibleValue() { return v*7; }
}
class Highlightable2 extends Highlightable {
private int v;
Highlightable2(int v) {
this.v = v;
}
@Override int visibleValue() { return v*8; }
@Override int highlightableValue() { return v*9; }
}
class Selectable2 extends Selectable {
private int v;
Selectable2(int v) {
this.v = v;
}
@Override int visibleValue() { return v*10; }
@Override int highlightableValue() { return v*11; }
@Override int selectableValue() { return v*12; }
}
class Visible3 extends Visible {
private int v;
Visible3(int v) {
this.v = v;
}
@Override int visibleValue() { return v*13; }
}
class Highlightable3 extends Highlightable {
private int v;
Highlightable3(int v) {
this.v = v;
}
@Override int visibleValue() { return v*14; }
@Override int highlightableValue() { return v*15; }
}
class Selectable3 extends Selectable {
private int v;
Selectable3(int v) {
this.v = v;
}
@Override int visibleValue() { return v*16; }
@Override int highlightableValue() { return v*17; }
@Override int selectableValue() { return v*18; }
}
Benchmark output:
Main.separateLists thrpt 600 1690.522 ± 6.570 ops/s
Main.singleListAndInstanceof thrpt 600 1751.375 ± 4.368 ops/s
Main.separateLists:L1-dcache-load-misses:u thrpt 2 2298.258 #/op
Main.singleListAndInstanceof:L1-dcache-load-misses:u thrpt 2 627.451 #/op
Main.separateLists:L1-dcache-loads:u thrpt 2 1217756.290 #/op
Main.singleListAndInstanceof:L1-dcache-loads:u thrpt 2 1135982.650 #/op
Main.separateLists:L1-icache-load-misses:u thrpt 2 113.599 #/op
Main.singleListAndInstanceof:L1-icache-load-misses:u thrpt 2 99.896 #/op
Main.separateLists:L1-icache-loads:u thrpt 2 656048.382 #/op
Main.singleListAndInstanceof:L1-icache-loads:u thrpt 2 694074.004 #/op
Main.separateLists:LLC-load-misses:u thrpt 2 872.681 #/op
Main.singleListAndInstanceof:LLC-load-misses:u thrpt 2 355.666 #/op
Main.separateLists:LLC-loads:u thrpt 2 12036.496 #/op
Main.singleListAndInstanceof:LLC-loads:u thrpt 2 7445.434 #/op
Main.separateLists:LLC-stores:u thrpt 2 15277.223 #/op
Main.singleListAndInstanceof:LLC-stores:u thrpt 2 10649.517 #/op
Main.separateLists:branch-misses:u thrpt 2 22463.763 #/op
Main.singleListAndInstanceof:branch-misses:u thrpt 2 29940.958 #/op
Main.separateLists:branches:u thrpt 2 254018.586 #/op
Main.singleListAndInstanceof:branches:u thrpt 2 275450.951 #/op
Main.separateLists:cycles:u thrpt 2 1988517.839 #/op
Main.singleListAndInstanceof:cycles:u thrpt 2 1921584.057 #/op
Main.separateLists:dTLB-load-misses:u thrpt 2 66.212 #/op
Main.singleListAndInstanceof:dTLB-load-misses:u thrpt 2 64.442 #/op
Main.separateLists:dTLB-loads:u thrpt 2 1217929.340 #/op
Main.singleListAndInstanceof:dTLB-loads:u thrpt 2 1135799.981 #/op
Main.separateLists:iTLB-load-misses:u thrpt 2 4.179 #/op
Main.singleListAndInstanceof:iTLB-load-misses:u thrpt 2 3.876 #/op
Main.separateLists:iTLB-loads:u thrpt 2 656595.175 #/op
Main.singleListAndInstanceof:iTLB-loads:u thrpt 2 693913.010 #/op
Main.separateLists:instructions:u thrpt 2 2273646.245 #/op
Main.singleListAndInstanceof:instructions:u thrpt 2 2045332.939 #/op
Main.separateLists:stalled-cycles-backend:u thrpt 2 773671.154 #/op
Main.singleListAndInstanceof:stalled-cycles-backend:u thrpt 2 619477.824 #/op
Main.separateLists:stalled-cycles-frontend:u thrpt 2 184604.485 #/op
Main.singleListAndInstanceof:stalled-cycles-frontend:u thrpt 2 271938.450 #/op
Main.separateLists:·gc.alloc.rate thrpt 600 217.266 ± 0.846 MB/sec
Main.singleListAndInstanceof:·gc.alloc.rate thrpt 600 222.747 ± 0.556 MB/sec
Main.separateLists:·gc.alloc.rate.norm thrpt 600 202181.035 ± 2.986 B/op
Main.singleListAndInstanceof:·gc.alloc.rate.norm thrpt 600 200083.395 ± 4.720 B/op
Main.separateLists:·gc.churn.PS_Eden_Space thrpt 600 217.792 ± 3.841 MB/sec
Main.singleListAndInstanceof:·gc.churn.PS_Eden_Space thrpt 600 223.528 ± 4.973 MB/sec
Main.separateLists:·gc.churn.PS_Eden_Space.norm thrpt 600 202704.197 ± 3508.997 B/op
Main.singleListAndInstanceof:·gc.churn.PS_Eden_Space.norm thrpt 600 200804.794 ± 4414.457 B/op
Main.separateLists:·gc.churn.PS_Survivor_Space thrpt 600 0.095 ± 0.008 MB/sec
Main.singleListAndInstanceof:·gc.churn.PS_Survivor_Space thrpt 600 0.091 ± 0.008 MB/sec
Main.separateLists:·gc.churn.PS_Survivor_Space.norm thrpt 600 88.896 ± 7.778 B/op
Main.singleListAndInstanceof:·gc.churn.PS_Survivor_Space.norm thrpt 600 81.693 ± 7.269 B/op
Main.separateLists:·gc.count thrpt 600 2440.000 counts
Main.singleListAndInstanceof:·gc.count thrpt 600 2289.000 counts
Main.separateLists:·gc.time thrpt 600 4501.000 ms
Main.singleListAndInstanceof:·gc.time thrpt 600 4236.000 ms
UPDATE: Below is the benchmark code and results with array setup code extracted into separate methods and removed from the measurement. instanceof
is slower for that case, as expected - above differences are probably related to branch prediction issues in the list setup. (while those are interesting too, they probably should go into separate question)
package test;
import org.openjdk.jmh.annotations.*;
import java.util.ArrayList;
import java.util.Random;
public class Main {
public static void main(String[] args) throws Exception {
org.openjdk.jmh.Main.main(args);
}
@State(Scope.Thread)
public static class SeparateListsState {
public ArrayList<Visible> visibles;
public ArrayList<Highlightable> highlightables;
public ArrayList<Selectable> selectables;
@Setup(Level.Invocation)
public void doSetup() {
visibles = new ArrayList<>();
highlightables = new ArrayList<>();
selectables = new ArrayList<>();
Random random = new Random(9698426994L + 8879);
for (int i = 0; i < 10_000; i++) {
switch (random.nextInt(9)) {
case 0:
visibles.add(new Visible1(i));
break;
case 1:
highlightables.add(new Highlightable1(i));
break;
case 2:
selectables.add(new Selectable1(i));
break;
case 3:
visibles.add(new Visible2(i));
break;
case 4:
highlightables.add(new Highlightable2(i));
break;
case 5:
selectables.add(new Selectable2(i));
break;
case 6:
visibles.add(new Visible3(i));
break;
case 7:
highlightables.add(new Highlightable3(i));
break;
case 8:
selectables.add(new Selectable3(i));
break;
}
}
}
}
@Benchmark
@BenchmarkMode(Mode.Throughput)
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 150, time = 1)
@Fork(value = 2)
public static long separateLists(SeparateListsState state) {
long vsum = 0;
long hsum = 0;
long ssum = 0;
for (Visible e : state.visibles) {
vsum += e.visibleValue();
}
for (Highlightable e : state.highlightables) {
vsum += e.visibleValue();
hsum += e.highlightableValue();
}
for (Selectable e : state.selectables) {
vsum += e.visibleValue();
hsum += e.highlightableValue();
ssum += e.selectableValue();
}
return vsum * hsum * ssum;
}
@State(Scope.Thread)
public static class SingleListAndInstanceofState {
public ArrayList<Visible> visibles;
@Setup(Level.Invocation)
public void doSetup() {
visibles = new ArrayList<>();
Random random = new Random(9698426994L + 8879);
for (int i = 0; i < 10_000; i++) {
switch (random.nextInt(9)) {
case 0:
visibles.add(new Visible1(i));
break;
case 1:
visibles.add(new Highlightable1(i));
break;
case 2:
visibles.add(new Selectable1(i));
break;
case 3:
visibles.add(new Visible2(i));
break;
case 4:
visibles.add(new Highlightable2(i));
break;
case 5:
visibles.add(new Selectable2(i));
break;
case 6:
visibles.add(new Visible3(i));
break;
case 7:
visibles.add(new Highlightable3(i));
break;
case 8:
visibles.add(new Selectable3(i));
break;
}
}
}
}
@Benchmark
@BenchmarkMode(Mode.Throughput)
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 150, time = 1)
@Fork(value = 2)
public static long singleListAndInstanceof(SingleListAndInstanceofState state) {
long vsum = 0;
long hsum = 0;
long ssum = 0;
for (Visible e : state.visibles) {
if (e instanceof Selectable) {
vsum += e.visibleValue();
hsum += ((Selectable) e).highlightableValue();
ssum += ((Selectable) e).selectableValue();
} else if (e instanceof Highlightable) {
vsum += e.visibleValue();
hsum += ((Highlightable) e).highlightableValue();
} else {
vsum += e.visibleValue();
}
}
return vsum * hsum * ssum;
}
}
abstract class Visible {
abstract int visibleValue();
}
abstract class Highlightable extends Visible {
abstract int highlightableValue();
}
abstract class Selectable extends Highlightable {
abstract int selectableValue();
}
class Visible1 extends Visible {
private int v;
Visible1(int v) {
this.v = v;
}
@Override int visibleValue() { return v; }
}
class Highlightable1 extends Highlightable {
private int v;
Highlightable1(int v) {
this.v = v;
}
@Override int visibleValue() { return v*2; }
@Override int highlightableValue() { return v*3; }
}
class Selectable1 extends Selectable {
private int v;
Selectable1(int v) {
this.v = v;
}
@Override int visibleValue() { return v*4; }
@Override int highlightableValue() { return v*5; }
@Override int selectableValue() { return v*6; }
}
class Visible2 extends Visible {
private int v;
Visible2(int v) {
this.v = v;
}
@Override int visibleValue() { return v*7; }
}
class Highlightable2 extends Highlightable {
private int v;
Highlightable2(int v) {
this.v = v;
}
@Override int visibleValue() { return v*8; }
@Override int highlightableValue() { return v*9; }
}
class Selectable2 extends Selectable {
private int v;
Selectable2(int v) {
this.v = v;
}
@Override int visibleValue() { return v*10; }
@Override int highlightableValue() { return v*11; }
@Override int selectableValue() { return v*12; }
}
class Visible3 extends Visible {
private int v;
Visible3(int v) {
this.v = v;
}
@Override int visibleValue() { return v*13; }
}
class Highlightable3 extends Highlightable {
private int v;
Highlightable3(int v) {
this.v = v;
}
@Override int visibleValue() { return v*14; }
@Override int highlightableValue() { return v*15; }
}
class Selectable3 extends Selectable {
private int v;
Selectable3(int v) {
this.v = v;
}
@Override int visibleValue() { return v*16; }
@Override int highlightableValue() { return v*17; }
@Override int selectableValue() { return v*18; }
}
And the results:
Main.separateLists thrpt 300 4211.552 ± 23.791 ops/s
Main.singleListAndInstanceof thrpt 300 3920.251 ± 15.478 ops/s
Main.separateLists:L1-dcache-load-misses:u thrpt 2 3046.033 #/op
Main.singleListAndInstanceof:L1-dcache-load-misses:u thrpt 2 1089.122 #/op
Main.separateLists:L1-dcache-loads:u thrpt 2 1090745.006 #/op
Main.singleListAndInstanceof:L1-dcache-loads:u thrpt 2 1125243.609 #/op
Main.separateLists:L1-icache-load-misses:u thrpt 2 150.542 #/op
Main.singleListAndInstanceof:L1-icache-load-misses:u thrpt 2 143.304 #/op
Main.separateLists:L1-icache-loads:u thrpt 2 600852.620 #/op
Main.singleListAndInstanceof:L1-icache-loads:u thrpt 2 700771.042 #/op
Main.separateLists:LLC-load-misses:u thrpt 2 1299.520 #/op
Main.singleListAndInstanceof:LLC-load-misses:u thrpt 2 636.764 #/op
Main.separateLists:LLC-loads:u thrpt 2 14408.815 #/op
Main.singleListAndInstanceof:LLC-loads:u thrpt 2 10429.768 #/op
Main.separateLists:LLC-stores:u thrpt 2 18999.178 #/op
Main.singleListAndInstanceof:LLC-stores:u thrpt 2 15370.582 #/op
Main.separateLists:branch-misses:u thrpt 2 22578.062 #/op
Main.singleListAndInstanceof:branch-misses:u thrpt 2 29257.959 #/op
Main.separateLists:branches:u thrpt 2 258026.890 #/op
Main.singleListAndInstanceof:branches:u thrpt 2 284911.889 #/op
Main.separateLists:cycles:u thrpt 2 1915774.770 #/op
Main.singleListAndInstanceof:cycles:u thrpt 2 1974841.023 #/op
Main.separateLists:dTLB-load-misses:u thrpt 2 101.573 #/op
Main.singleListAndInstanceof:dTLB-load-misses:u thrpt 2 99.982 #/op
Main.separateLists:dTLB-loads:u thrpt 2 1090174.103 #/op
Main.singleListAndInstanceof:dTLB-loads:u thrpt 2 1129185.929 #/op
Main.separateLists:iTLB-load-misses:u thrpt 2 4.432 #/op
Main.singleListAndInstanceof:iTLB-load-misses:u thrpt 2 3.955 #/op
Main.separateLists:iTLB-loads:u thrpt 2 600301.665 #/op
Main.singleListAndInstanceof:iTLB-loads:u thrpt 2 703339.482 #/op
Main.separateLists:instructions:u thrpt 2 1974603.052 #/op
Main.singleListAndInstanceof:instructions:u thrpt 2 2040460.093 #/op
Main.separateLists:stalled-cycles-backend:u thrpt 2 808914.974 #/op
Main.singleListAndInstanceof:stalled-cycles-backend:u thrpt 2 685615.056 #/op
Main.separateLists:stalled-cycles-frontend:u thrpt 2 186013.216 #/op
Main.singleListAndInstanceof:stalled-cycles-frontend:u thrpt 2 272207.204 #/op
Main.separateLists:·gc.alloc.rate thrpt 300 346.891 ± 1.166 MB/sec
Main.singleListAndInstanceof:·gc.alloc.rate thrpt 300 358.297 ± 0.614 MB/sec
Main.separateLists:·gc.alloc.rate.norm thrpt 300 310744.294 ± 0.107 B/op
Main.singleListAndInstanceof:·gc.alloc.rate.norm thrpt 300 328992.302 ± 0.110 B/op
Main.separateLists:·gc.churn.PS_Eden_Space thrpt 300 349.387 ± 14.305 MB/sec
Main.singleListAndInstanceof:·gc.churn.PS_Eden_Space thrpt 300 360.039 ± 9.075 MB/sec
Main.separateLists:·gc.churn.PS_Eden_Space.norm thrpt 300 313154.953 ± 13018.012 B/op
Main.singleListAndInstanceof:·gc.churn.PS_Eden_Space.norm thrpt 300 330629.833 ± 8345.712 B/op
Main.separateLists:·gc.churn.PS_Survivor_Space thrpt 300 0.092 ± 0.012 MB/sec
Main.singleListAndInstanceof:·gc.churn.PS_Survivor_Space thrpt 300 0.094 ± 0.011 MB/sec
Main.separateLists:·gc.churn.PS_Survivor_Space.norm thrpt 300 82.348 ± 10.661 B/op
Main.singleListAndInstanceof:·gc.churn.PS_Survivor_Space.norm thrpt 300 86.465 ± 10.417 B/op
Main.separateLists:·gc.count thrpt 300 1196.000 counts
Main.singleListAndInstanceof:·gc.count thrpt 300 1235.000 counts
Main.separateLists:·gc.time thrpt 300 2178.000 ms
Main.singleListAndInstanceof:·gc.time thrpt 300 2355.000 ms
L1-dcache-load-misses
andLLC-load-misses
are much more frequent with the separate list case. Butbranch-misses
are more frequent for instanceof case. – Gastrulation@Setup
because you are also measuring it. I would avoid usingRandom
(for example, replacingRandom.nextInt(9)
byi % 9
) because you don't know if the two case are really similar: sure, you have 10k items, but you don't know if there are more Selectable, Highlightable or other kind, although I don't know if it explain the differences. You could also enable other measurement such as average (... I admit that throughput led me astray :o) – Analemmai % 9
- I think that will be much worse than Random, because the branch predictor may quickly pick up on the fact that branches are selected in a specific order (see Two-level predictors, en.wikipedia.org/wiki/Branch_predictor#Two-level_predictor). Regarding replacing throughput with average - I tried that, but my jmh version formats the results as10^-4
answer, same for all iterations and benchmarks, and that is not very useful. – Gastrulation