Why instanceof and iterating single list is faster than several specialized lists?
Asked Answered
G

2

7

I assumed that separating objects that implement different interfaces into several lists and iterating those lists afterwards would be faster than dumping all objects into a single list and then switching via instanceof. E.g. this:

ArrayList<Visible> visibles = new ArrayList<>();
ArrayList<Highlightable> highlightables = new ArrayList<>();
ArrayList<Selectable> selectables = new ArrayList<>();

// populate the lists
// Visible is an interface, Highlightable is also interface (extends Visible),
// Selectable extends Highlightable
// All interfaces have 3 concrete subclasses each,
// to test situations when JVM is not able to optimize too much due to small number of classes

for (Visible e : visibles) {
    vsum += e.visibleValue();
}
for (Highlightable e : highlightables) {
    vsum += e.visibleValue();
    hsum += e.highlightableValue();
}
for (Selectable e : selectables) {
    vsum += e.visibleValue();
    hsum += e.highlightableValue();
    ssum += e.selectableValue();
}

should be faster than

ArrayList<Visible> visibles = new ArrayList<>();

// populate the list

for (Visible e : visibles) {
    if (e instanceof Selectable) {
        vsum += e.visibleValue();
        hsum += ((Selectable) e).highlightableValue();
        ssum += ((Selectable) e).selectableValue();
    } else if (e instanceof Highlightable) {
        vsum += e.visibleValue();
        hsum += ((Highlightable) e).highlightableValue();
    } else {
        vsum += e.visibleValue();
    }
}

However it doesn't seem to be the case:

Main.separateLists             thrpt   30     1546.898 ±     32.312   ops/s
Main.singleListAndInstanceof   thrpt   30     1673.733 ±     29.804   ops/s

I added full source for the benchmark below.

What could be the cause of instanceof version being faster? Even if we assume that isntanceof instruction is free, then both versions should at least be equal in performance (because they still add elements to list and iterate them).

package test;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Warmup;

import java.util.ArrayList;
import java.util.Random;

public class Main {

    public static void main(String[] args) throws Exception {
        org.openjdk.jmh.Main.main(args);
    }

    @Benchmark
    @Warmup(iterations = 5, time = 1)
    @Measurement(iterations = 15, time = 1)
    @Fork(value = 2)
    public static long separateLists() {
        ArrayList<Visible> visibles = new ArrayList<>(3_500);
        ArrayList<Highlightable> highlightables = new ArrayList<>(3_500);
        ArrayList<Selectable> selectables = new ArrayList<>(3_500);

        Random random = new Random();
        for (int i = 0; i < 10_000; i++) {
            switch (random.nextInt(9)) {
                case 0:
                    visibles.add(new Visible1(i));
                    break;
                case 1:
                    highlightables.add(new Highlightable1(i));
                    break;
                case 2:
                    selectables.add(new Selectable1(i));
                    break;

                case 3:
                    visibles.add(new Visible2(i));
                    break;
                case 4:
                    highlightables.add(new Highlightable2(i));
                    break;
                case 5:
                    selectables.add(new Selectable2(i));
                    break;

                case 6:
                    visibles.add(new Visible3(i));
                    break;
                case 7:
                    highlightables.add(new Highlightable3(i));
                    break;
                case 8:
                    selectables.add(new Selectable3(i));
                    break;
            }
        }

        long listSize = visibles.size() + highlightables.size() + selectables.size();

        long vsum = 0;
        long hsum = 0;
        long ssum = 0;
        for (Visible e : visibles) {
            vsum += e.visibleValue();
        }
        for (Highlightable e : highlightables) {
            vsum += e.visibleValue();
            hsum += e.highlightableValue();
        }
        for (Selectable e : selectables) {
            vsum += e.visibleValue();
            hsum += e.highlightableValue();
            ssum += e.selectableValue();
        }

        return listSize + vsum * hsum * ssum;
    }

    @Benchmark
    @Warmup(iterations = 5, time = 1)
    @Measurement(iterations = 15, time = 1)
    @Fork(value = 2)
    public static long singleListAndInstanceof() {
        ArrayList<Visible> visibles = new ArrayList<>(10_000);

        Random random = new Random();
        for (int i = 0; i < 10_000; i++) {
            switch (random.nextInt(9)) {
                case 0:
                    visibles.add(new Visible1(i));
                    break;
                case 1:
                    visibles.add(new Highlightable1(i));
                    break;
                case 2:
                    visibles.add(new Selectable1(i));
                    break;

                case 3:
                    visibles.add(new Visible2(i));
                    break;
                case 4:
                    visibles.add(new Highlightable2(i));
                    break;
                case 5:
                    visibles.add(new Selectable2(i));
                    break;

                case 6:
                    visibles.add(new Visible3(i));
                    break;
                case 7:
                    visibles.add(new Highlightable3(i));
                    break;
                case 8:
                    visibles.add(new Selectable3(i));
                    break;
            }
        }

        long listSize = visibles.size();

        long vsum = 0;
        long hsum = 0;
        long ssum = 0;
        for (Visible e : visibles) {
            if (e instanceof Selectable) {
                vsum += e.visibleValue();
                hsum += ((Selectable) e).highlightableValue();
                ssum += ((Selectable) e).selectableValue();
            } else if (e instanceof Highlightable) {
                vsum += e.visibleValue();
                hsum += ((Highlightable) e).highlightableValue();
            } else {
                vsum += e.visibleValue();
            }
        }

        return listSize + vsum * hsum * ssum;
    }
}

abstract class Visible {
    abstract int visibleValue();
}
abstract class Highlightable extends Visible {
    abstract int highlightableValue();
}
abstract class Selectable extends Highlightable {
    abstract int selectableValue();
}

class Visible1 extends Visible {
    private int v;
    Visible1(int v) {
        this.v = v;
    }
    @Override int visibleValue() { return v; }
}
class Highlightable1 extends Highlightable {
    private int v;
    Highlightable1(int v) {
        this.v = v;
    }
    @Override int visibleValue() { return v*2; }
    @Override int highlightableValue() { return v*3; }
}
class Selectable1 extends Selectable {
    private int v;
    Selectable1(int v) {
        this.v = v;
    }
    @Override int visibleValue() { return v*4; }
    @Override int highlightableValue() { return v*5; }
    @Override int selectableValue() { return v*6; }
}

class Visible2 extends Visible {
    private int v;
    Visible2(int v) {
        this.v = v;
    }
    @Override int visibleValue() { return v*7; }
}
class Highlightable2 extends Highlightable {
    private int v;
    Highlightable2(int v) {
        this.v = v;
    }
    @Override int visibleValue() { return v*8; }
    @Override int highlightableValue() { return v*9; }
}
class Selectable2 extends Selectable {
    private int v;
    Selectable2(int v) {
        this.v = v;
    }
    @Override int visibleValue() { return v*10; }
    @Override int highlightableValue() { return v*11; }
    @Override int selectableValue() { return v*12; }
}


class Visible3 extends Visible {
    private int v;
    Visible3(int v) {
        this.v = v;
    }
    @Override int visibleValue() { return v*13; }
}
class Highlightable3 extends Highlightable {
    private int v;
    Highlightable3(int v) {
        this.v = v;
    }
    @Override int visibleValue() { return v*14; }
    @Override int highlightableValue() { return v*15; }
}
class Selectable3 extends Selectable {
    private int v;
    Selectable3(int v) {
        this.v = v;
    }
    @Override int visibleValue() { return v*16; }
    @Override int highlightableValue() { return v*17; }
    @Override int selectableValue() { return v*18; }
}

Benchmark output:

Main.separateLists                                             thrpt  600     1690.522 ±    6.570   ops/s
Main.singleListAndInstanceof                                   thrpt  600     1751.375 ±    4.368   ops/s
Main.separateLists:L1-dcache-load-misses:u                     thrpt    2     2298.258               #/op
Main.singleListAndInstanceof:L1-dcache-load-misses:u           thrpt    2      627.451               #/op
Main.separateLists:L1-dcache-loads:u                           thrpt    2  1217756.290               #/op
Main.singleListAndInstanceof:L1-dcache-loads:u                 thrpt    2  1135982.650               #/op
Main.separateLists:L1-icache-load-misses:u                     thrpt    2      113.599               #/op
Main.singleListAndInstanceof:L1-icache-load-misses:u           thrpt    2       99.896               #/op
Main.separateLists:L1-icache-loads:u                           thrpt    2   656048.382               #/op
Main.singleListAndInstanceof:L1-icache-loads:u                 thrpt    2   694074.004               #/op
Main.separateLists:LLC-load-misses:u                           thrpt    2      872.681               #/op
Main.singleListAndInstanceof:LLC-load-misses:u                 thrpt    2      355.666               #/op
Main.separateLists:LLC-loads:u                                 thrpt    2    12036.496               #/op
Main.singleListAndInstanceof:LLC-loads:u                       thrpt    2     7445.434               #/op
Main.separateLists:LLC-stores:u                                thrpt    2    15277.223               #/op
Main.singleListAndInstanceof:LLC-stores:u                      thrpt    2    10649.517               #/op
Main.separateLists:branch-misses:u                             thrpt    2    22463.763               #/op
Main.singleListAndInstanceof:branch-misses:u                   thrpt    2    29940.958               #/op
Main.separateLists:branches:u                                  thrpt    2   254018.586               #/op
Main.singleListAndInstanceof:branches:u                        thrpt    2   275450.951               #/op
Main.separateLists:cycles:u                                    thrpt    2  1988517.839               #/op
Main.singleListAndInstanceof:cycles:u                          thrpt    2  1921584.057               #/op
Main.separateLists:dTLB-load-misses:u                          thrpt    2       66.212               #/op
Main.singleListAndInstanceof:dTLB-load-misses:u                thrpt    2       64.442               #/op
Main.separateLists:dTLB-loads:u                                thrpt    2  1217929.340               #/op
Main.singleListAndInstanceof:dTLB-loads:u                      thrpt    2  1135799.981               #/op
Main.separateLists:iTLB-load-misses:u                          thrpt    2        4.179               #/op
Main.singleListAndInstanceof:iTLB-load-misses:u                thrpt    2        3.876               #/op
Main.separateLists:iTLB-loads:u                                thrpt    2   656595.175               #/op
Main.singleListAndInstanceof:iTLB-loads:u                      thrpt    2   693913.010               #/op
Main.separateLists:instructions:u                              thrpt    2  2273646.245               #/op
Main.singleListAndInstanceof:instructions:u                    thrpt    2  2045332.939               #/op
Main.separateLists:stalled-cycles-backend:u                    thrpt    2   773671.154               #/op
Main.singleListAndInstanceof:stalled-cycles-backend:u          thrpt    2   619477.824               #/op
Main.separateLists:stalled-cycles-frontend:u                   thrpt    2   184604.485               #/op
Main.singleListAndInstanceof:stalled-cycles-frontend:u         thrpt    2   271938.450               #/op
Main.separateLists:·gc.alloc.rate                              thrpt  600      217.266 ±    0.846  MB/sec
Main.singleListAndInstanceof:·gc.alloc.rate                    thrpt  600      222.747 ±    0.556  MB/sec
Main.separateLists:·gc.alloc.rate.norm                         thrpt  600   202181.035 ±    2.986    B/op
Main.singleListAndInstanceof:·gc.alloc.rate.norm               thrpt  600   200083.395 ±    4.720    B/op
Main.separateLists:·gc.churn.PS_Eden_Space                     thrpt  600      217.792 ±    3.841  MB/sec
Main.singleListAndInstanceof:·gc.churn.PS_Eden_Space           thrpt  600      223.528 ±    4.973  MB/sec
Main.separateLists:·gc.churn.PS_Eden_Space.norm                thrpt  600   202704.197 ± 3508.997    B/op
Main.singleListAndInstanceof:·gc.churn.PS_Eden_Space.norm      thrpt  600   200804.794 ± 4414.457    B/op
Main.separateLists:·gc.churn.PS_Survivor_Space                 thrpt  600        0.095 ±    0.008  MB/sec
Main.singleListAndInstanceof:·gc.churn.PS_Survivor_Space       thrpt  600        0.091 ±    0.008  MB/sec
Main.separateLists:·gc.churn.PS_Survivor_Space.norm            thrpt  600       88.896 ±    7.778    B/op
Main.singleListAndInstanceof:·gc.churn.PS_Survivor_Space.norm  thrpt  600       81.693 ±    7.269    B/op
Main.separateLists:·gc.count                                   thrpt  600     2440.000             counts
Main.singleListAndInstanceof:·gc.count                         thrpt  600     2289.000             counts
Main.separateLists:·gc.time                                    thrpt  600     4501.000                 ms
Main.singleListAndInstanceof:·gc.time                          thrpt  600     4236.000                 ms

UPDATE: Below is the benchmark code and results with array setup code extracted into separate methods and removed from the measurement. instanceof is slower for that case, as expected - above differences are probably related to branch prediction issues in the list setup. (while those are interesting too, they probably should go into separate question)

package test;

import org.openjdk.jmh.annotations.*;

import java.util.ArrayList;
import java.util.Random;

public class Main {

    public static void main(String[] args) throws Exception {
        org.openjdk.jmh.Main.main(args);
    }

    @State(Scope.Thread)
    public static class SeparateListsState {
        public ArrayList<Visible> visibles;
        public ArrayList<Highlightable> highlightables;
        public ArrayList<Selectable> selectables;

        @Setup(Level.Invocation)
        public void doSetup() {
            visibles = new ArrayList<>();
            highlightables = new ArrayList<>();
            selectables = new ArrayList<>();

            Random random = new Random(9698426994L + 8879);
            for (int i = 0; i < 10_000; i++) {
                switch (random.nextInt(9)) {
                    case 0:
                        visibles.add(new Visible1(i));
                        break;
                    case 1:
                        highlightables.add(new Highlightable1(i));
                        break;
                    case 2:
                        selectables.add(new Selectable1(i));
                        break;

                    case 3:
                        visibles.add(new Visible2(i));
                        break;
                    case 4:
                        highlightables.add(new Highlightable2(i));
                        break;
                    case 5:
                        selectables.add(new Selectable2(i));
                        break;

                    case 6:
                        visibles.add(new Visible3(i));
                        break;
                    case 7:
                        highlightables.add(new Highlightable3(i));
                        break;
                    case 8:
                        selectables.add(new Selectable3(i));
                        break;
                }
            }
        }
    }

    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    @Warmup(iterations = 5, time = 1)
    @Measurement(iterations = 150, time = 1)
    @Fork(value = 2)
    public static long separateLists(SeparateListsState state) {
        long vsum = 0;
        long hsum = 0;
        long ssum = 0;
        for (Visible e : state.visibles) {
            vsum += e.visibleValue();
        }
        for (Highlightable e : state.highlightables) {
            vsum += e.visibleValue();
            hsum += e.highlightableValue();
        }
        for (Selectable e : state.selectables) {
            vsum += e.visibleValue();
            hsum += e.highlightableValue();
            ssum += e.selectableValue();
        }

        return vsum * hsum * ssum;
    }

    @State(Scope.Thread)
    public static class SingleListAndInstanceofState {
        public ArrayList<Visible> visibles;

        @Setup(Level.Invocation)
        public void doSetup() {
            visibles = new ArrayList<>();

            Random random = new Random(9698426994L + 8879);
            for (int i = 0; i < 10_000; i++) {
                switch (random.nextInt(9)) {
                    case 0:
                        visibles.add(new Visible1(i));
                        break;
                    case 1:
                        visibles.add(new Highlightable1(i));
                        break;
                    case 2:
                        visibles.add(new Selectable1(i));
                        break;

                    case 3:
                        visibles.add(new Visible2(i));
                        break;
                    case 4:
                        visibles.add(new Highlightable2(i));
                        break;
                    case 5:
                        visibles.add(new Selectable2(i));
                        break;

                    case 6:
                        visibles.add(new Visible3(i));
                        break;
                    case 7:
                        visibles.add(new Highlightable3(i));
                        break;
                    case 8:
                        visibles.add(new Selectable3(i));
                        break;
                }
            }
        }
    }

    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    @Warmup(iterations = 5, time = 1)
    @Measurement(iterations = 150, time = 1)
    @Fork(value = 2)
    public static long singleListAndInstanceof(SingleListAndInstanceofState state) {
        long vsum = 0;
        long hsum = 0;
        long ssum = 0;
        for (Visible e : state.visibles) {
            if (e instanceof Selectable) {
                vsum += e.visibleValue();
                hsum += ((Selectable) e).highlightableValue();
                ssum += ((Selectable) e).selectableValue();
            } else if (e instanceof Highlightable) {
                vsum += e.visibleValue();
                hsum += ((Highlightable) e).highlightableValue();
            } else {
                vsum += e.visibleValue();
            }
        }

        return vsum * hsum * ssum;
    }
}

abstract class Visible {
    abstract int visibleValue();
}
abstract class Highlightable extends Visible {
    abstract int highlightableValue();
}
abstract class Selectable extends Highlightable {
    abstract int selectableValue();
}

class Visible1 extends Visible {
    private int v;
    Visible1(int v) {
        this.v = v;
    }
    @Override int visibleValue() { return v; }
}
class Highlightable1 extends Highlightable {
    private int v;
    Highlightable1(int v) {
        this.v = v;
    }
    @Override int visibleValue() { return v*2; }
    @Override int highlightableValue() { return v*3; }
}
class Selectable1 extends Selectable {
    private int v;
    Selectable1(int v) {
        this.v = v;
    }
    @Override int visibleValue() { return v*4; }
    @Override int highlightableValue() { return v*5; }
    @Override int selectableValue() { return v*6; }
}

class Visible2 extends Visible {
    private int v;
    Visible2(int v) {
        this.v = v;
    }
    @Override int visibleValue() { return v*7; }
}
class Highlightable2 extends Highlightable {
    private int v;
    Highlightable2(int v) {
        this.v = v;
    }
    @Override int visibleValue() { return v*8; }
    @Override int highlightableValue() { return v*9; }
}
class Selectable2 extends Selectable {
    private int v;
    Selectable2(int v) {
        this.v = v;
    }
    @Override int visibleValue() { return v*10; }
    @Override int highlightableValue() { return v*11; }
    @Override int selectableValue() { return v*12; }
}


class Visible3 extends Visible {
    private int v;
    Visible3(int v) {
        this.v = v;
    }
    @Override int visibleValue() { return v*13; }
}
class Highlightable3 extends Highlightable {
    private int v;
    Highlightable3(int v) {
        this.v = v;
    }
    @Override int visibleValue() { return v*14; }
    @Override int highlightableValue() { return v*15; }
}
class Selectable3 extends Selectable {
    private int v;
    Selectable3(int v) {
        this.v = v;
    }
    @Override int visibleValue() { return v*16; }
    @Override int highlightableValue() { return v*17; }
    @Override int selectableValue() { return v*18; }
}

And the results:

Main.separateLists                                             thrpt  300     4211.552 ±    23.791   ops/s
Main.singleListAndInstanceof                                   thrpt  300     3920.251 ±    15.478   ops/s
Main.separateLists:L1-dcache-load-misses:u                     thrpt    2     3046.033                #/op
Main.singleListAndInstanceof:L1-dcache-load-misses:u           thrpt    2     1089.122                #/op
Main.separateLists:L1-dcache-loads:u                           thrpt    2  1090745.006                #/op
Main.singleListAndInstanceof:L1-dcache-loads:u                 thrpt    2  1125243.609                #/op
Main.separateLists:L1-icache-load-misses:u                     thrpt    2      150.542                #/op
Main.singleListAndInstanceof:L1-icache-load-misses:u           thrpt    2      143.304                #/op
Main.separateLists:L1-icache-loads:u                           thrpt    2   600852.620                #/op
Main.singleListAndInstanceof:L1-icache-loads:u                 thrpt    2   700771.042                #/op
Main.separateLists:LLC-load-misses:u                           thrpt    2     1299.520                #/op
Main.singleListAndInstanceof:LLC-load-misses:u                 thrpt    2      636.764                #/op
Main.separateLists:LLC-loads:u                                 thrpt    2    14408.815                #/op
Main.singleListAndInstanceof:LLC-loads:u                       thrpt    2    10429.768                #/op
Main.separateLists:LLC-stores:u                                thrpt    2    18999.178                #/op
Main.singleListAndInstanceof:LLC-stores:u                      thrpt    2    15370.582                #/op
Main.separateLists:branch-misses:u                             thrpt    2    22578.062                #/op
Main.singleListAndInstanceof:branch-misses:u                   thrpt    2    29257.959                #/op
Main.separateLists:branches:u                                  thrpt    2   258026.890                #/op
Main.singleListAndInstanceof:branches:u                        thrpt    2   284911.889                #/op
Main.separateLists:cycles:u                                    thrpt    2  1915774.770                #/op
Main.singleListAndInstanceof:cycles:u                          thrpt    2  1974841.023                #/op
Main.separateLists:dTLB-load-misses:u                          thrpt    2      101.573                #/op
Main.singleListAndInstanceof:dTLB-load-misses:u                thrpt    2       99.982                #/op
Main.separateLists:dTLB-loads:u                                thrpt    2  1090174.103                #/op
Main.singleListAndInstanceof:dTLB-loads:u                      thrpt    2  1129185.929                #/op
Main.separateLists:iTLB-load-misses:u                          thrpt    2        4.432                #/op
Main.singleListAndInstanceof:iTLB-load-misses:u                thrpt    2        3.955                #/op
Main.separateLists:iTLB-loads:u                                thrpt    2   600301.665                #/op
Main.singleListAndInstanceof:iTLB-loads:u                      thrpt    2   703339.482                #/op
Main.separateLists:instructions:u                              thrpt    2  1974603.052                #/op
Main.singleListAndInstanceof:instructions:u                    thrpt    2  2040460.093                #/op
Main.separateLists:stalled-cycles-backend:u                    thrpt    2   808914.974                #/op
Main.singleListAndInstanceof:stalled-cycles-backend:u          thrpt    2   685615.056                #/op
Main.separateLists:stalled-cycles-frontend:u                   thrpt    2   186013.216                #/op
Main.singleListAndInstanceof:stalled-cycles-frontend:u         thrpt    2   272207.204                #/op
Main.separateLists:·gc.alloc.rate                              thrpt  300      346.891 ±     1.166  MB/sec
Main.singleListAndInstanceof:·gc.alloc.rate                    thrpt  300      358.297 ±     0.614  MB/sec
Main.separateLists:·gc.alloc.rate.norm                         thrpt  300   310744.294 ±     0.107    B/op
Main.singleListAndInstanceof:·gc.alloc.rate.norm               thrpt  300   328992.302 ±     0.110    B/op
Main.separateLists:·gc.churn.PS_Eden_Space                     thrpt  300      349.387 ±    14.305  MB/sec
Main.singleListAndInstanceof:·gc.churn.PS_Eden_Space           thrpt  300      360.039 ±     9.075  MB/sec
Main.separateLists:·gc.churn.PS_Eden_Space.norm                thrpt  300   313154.953 ± 13018.012    B/op
Main.singleListAndInstanceof:·gc.churn.PS_Eden_Space.norm      thrpt  300   330629.833 ±  8345.712    B/op
Main.separateLists:·gc.churn.PS_Survivor_Space                 thrpt  300        0.092 ±     0.012  MB/sec
Main.singleListAndInstanceof:·gc.churn.PS_Survivor_Space       thrpt  300        0.094 ±     0.011  MB/sec
Main.separateLists:·gc.churn.PS_Survivor_Space.norm            thrpt  300       82.348 ±    10.661    B/op
Main.singleListAndInstanceof:·gc.churn.PS_Survivor_Space.norm  thrpt  300       86.465 ±    10.417    B/op
Main.separateLists:·gc.count                                   thrpt  300     1196.000              counts
Main.singleListAndInstanceof:·gc.count                         thrpt  300     1235.000              counts
Main.separateLists:·gc.time                                    thrpt  300     2178.000                  ms
Main.singleListAndInstanceof:·gc.time                          thrpt  300     2355.000                  ms
Gastrulation answered 26/6, 2019 at 14:24 Comment(12)
Why did you assume that iterating 3 lists should be faster than iterating just one?Dissipate
@MarcoMarchetti - Clarification: the total size of these 3 lists is equal to the size of that one list. So I assumed that time for iterating those 3 lists should be also equal to the time taken for the single unified list.Gastrulation
@MarcoMarchetti - Furthermore, each of 3 lists contains elements with types known ahead of time, so for each loop JVM should be able to remove all the checks and generate specialized code. While in the case of 1 list all objects are clumped into one basket and JVM has to check the type of the each element while iterating.Gastrulation
I'd go for "caching and branch prediction" as the usual suspects for such (negligible) differences...Overmaster
Number of iterations seems to be too small for JIT to start due any smart optimizations. I would suggest to run benchmark for couple of minutes.Roughcast
@AlexeyRagozin - no, that didn't change the results (I ran the benchmark for 30 minutes). I added the benchmark output to the question.Gastrulation
@Overmaster - Yes, but I can't understand why and how exactly caching and branch prediction should affect the runtimes here. Array accesses are all strictly linear, no jumping around, relevant class metadata should be always in the hot cache, and branches should be nicely randomized (defeating the prediction completely).Gastrulation
@Overmaster - I attached benchmark output with prof data. It seems that L1-dcache-load-misses and LLC-load-misses are much more frequent with the separate list case. But branch-misses are more frequent for instanceof case.Gastrulation
That cache+branch comment was just a wild guess (and I hope that someone who knows more than me can draw profound conclusions from the information that you added). As an unrelated side note: It'd mildly irritating (to say the least) to see what crappy and boring questions occasionally receive upvotes, while this one flatlined for 6 hours straight. The question is clear, interesting, shows considerable research and preparation effort, and so it now at least has one upvote...Overmaster
You should separate the list generation into a specific method with @Setup because you are also measuring it. I would avoid using Random (for example, replacing Random.nextInt(9) by i % 9) because you don't know if the two case are really similar: sure, you have 10k items, but you don't know if there are more Selectable, Highlightable or other kind, although I don't know if it explain the differences. You could also enable other measurement such as average (... I admit that throughput led me astray :o)Analemma
@Analemma - Yes, you are right, extracting the list generation does give different results (see the question update). Regarding i % 9 - I think that will be much worse than Random, because the branch predictor may quickly pick up on the fact that branches are selected in a specific order (see Two-level predictors, en.wikipedia.org/wiki/Branch_predictor#Two-level_predictor). Regarding replacing throughput with average - I tried that, but my jmh version formats the results as 10^-4 answer, same for all iterations and benchmarks, and that is not very useful.Gastrulation
Both are O(n). ¯_(ツ)_/¯Harr
A
8

As NoDataFound suggests in the comments, you're not just comparing the performance of iterating through the list, you're also comparing the list population methods. You need to pull this part of your code into a setup method - otherwise you're potentially going to be impacted by resize operations on your three ArrayList instances (amongst other things).

You should also either scrap the use of Random to populate the list, or at least instantiate it with the same seed across both implementations - otherwise you're not creating a repeatable order of elements across both implementations.

Adjudication answered 26/6, 2019 at 23:28 Comment(1)
Yes, that seems to be the correct guess - when I extracted the list population methods the instanceof case became slower, as expected (see the question update).Gastrulation
V
6

The core point is: your measuring is flawed. Not only did that first version of the your question measure different "things", but even the updated question has one big problem, the (way too low) sample size of 10_000!

That is simply not a reasonable sample size. At least, not alone. You should rather check out what you see for 10K, 50K, 100K, 1 million, ... loop iterations.

You see, you make a mistake that many people make with Java: they assume that this or that construct on the source code side of things determines performance.

And that is only partially true. You see, the real performance kicks come out of the myriad optimisations that the JIT compiler within the JVM will do at runtime, based on the profiling information it collected.

I think, the default threshold for the JIT to kick in and translate bytecode into highly optimized machine code is like 90K(?) method invocations/loop iterations.

Please let that sink in: unless your code does something on the scale of 100K or more, the JIT doesn't even consider your code "worth optimising". But then it will kick in, and that can have dramatic effects on overall performance. And then, what you put into your source code ... might not matter much any more.

Thus there isn't a single measurement that tells you what the "best" solution is. That rather depends on the number of iterations you go through.

Thus, the real answer is: java performance benchmarking is hard, and you have to be extremely diligent about what you do, and the conclusions you draw from your results.

Beyond that, the real real answer is: performance is a luxury problem. Of course, one should avoid stupid mistakes that burn CPU cycles for nothing. But besides that, your primary goal is always to write simple code that is easy to read and understand. And then, when you notice "this feels sluggish" or "this doesn't meet our SLAs", then you carefully define experiments to measure what is going on, to identify that piece of code that causes the performance problem. And just for the record: you know which code the JIT can optimise the best ... surprise: simple straight forward code, that looks like code 90% of good java coders tend to write.

Vidicon answered 27/6, 2019 at 7:24 Comment(3)
Yes, I agree. One nitpick though - that 10_000 was indeed being run 1M times (1500 times per second, for 15 minutes), that gives more than 10^10 iterations in total - that should be enough for optimizer?Gastrulation
@Gastrulation Definitely ;-)Vidicon
…and the biggest mistake is to assume that the element types are known in one variant. This is generic code, subject to type erasure, so all variants contain type casts. A sequence of instanceof followed by a type cast usually ends up in one operation, but the fundamental difference between the two variants is that the elements are processed in different order. The multi-list variant basically processes them sorted by type, following the same type specific code path repeatedly, only changing two times, whereas the other variant runs its different code paths in pseudo random order.Need

© 2022 - 2024 — McMap. All rights reserved.