Benchmarking Java HashMap Get (JMH vs Looping)
Asked Answered
D

1

10

My ultimate goal is to create a comprehensive set of benchmarks for several Java primitive collection libraries using the standard Java collections as a baseline. In the past I have used the looping method of writing these kinds of micro-benchmarks. I put the function I am benchmarking in a loop and iterate 1 million+ times so the jit has a chance to warmup. I take the total time of the loop and then divide by the number of iterations to get an estimate for the amount of time a single call to the function I am benchmarking would take. After recently reading about the JMH project and specifically this example: JMHSample_11_Loops I see the issue with this approach.

My Machine:

Windows 7 64-bit
Core i7-2760QM @ 2.40 GHz
8.00 GB Ram
jdk1.7.0_45 64-bit

Here is a stripped down simple example of the looping method code described above:

    public static void main(String[] args) {
    HashMap<Long, Long> hmap = new HashMap<Long, Long>();
    long val = 0;

    //populating the hashmap
    for (long idx = 0; idx < 10000000; idx++) {
        hmap.put(idx, idx);
    }


    Stopwatch s = Stopwatch.createStarted();
    long x = 0;
    for (long idx = 0; idx < 10000000; idx++) {
       x =  hmap.get(idx);
    }
    s.stop();
    System.out.println(s); //5.522 s
    System.out.println(x); //9999999

    //5.522 seconds / 10000000 = 552.2 nanoseconds
}

Here is my attempt at rewriting this benchmark using JMH:

package com.test.benchmarks;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.HashMap;
import java.util.concurrent.TimeUnit;


@State(Scope.Thread)
public class MyBenchmark {


    private HashMap<Long, Long> hmap = new HashMap<Long, Long>();
    private long key;

    @Setup(Level.Iteration)
    public void setup(){

        key = 0;

        for(long i = 0; i < 10000000; i++) {
            hmap.put(i, i);
        }
    }


    @Benchmark
    @BenchmarkMode(Mode.SampleTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public long testGetExistingKey() throws InterruptedException{

        if(key >= 10000000) key=0;
        return hmap.get(key++);
    }


    public static void main(String[] args) throws RunnerException {

        Options opt = new OptionsBuilder()
                .include(".*" + MyBenchmark.class.getSimpleName() + ".*")
                .warmupIterations(5)
                .measurementIterations(25)
                .forks(1)
                .build();

        new Runner(opt).run();

    }

}

Here are the results:

 Result: 31.163 ±(99.9%) 11.732 ns/op [Average]
   Statistics: (min, avg, max) = (0.000, 31.163, 939008.000), stdev = 1831.428
   Confidence interval (99.9%): [19.431, 42.895]
  Samples, N = 263849
        mean =     31.163 ±(99.9%) 11.732 ns/op
         min =      0.000 ns/op
  p( 0.0000) =      0.000 ns/op
  p(50.0000) =      0.000 ns/op
  p(90.0000) =      0.000 ns/op
  p(95.0000) =    427.000 ns/op
  p(99.0000) =    428.000 ns/op
  p(99.9000) =    428.000 ns/op
  p(99.9900) =    856.000 ns/op
  p(99.9990) =   9198.716 ns/op
  p(99.9999) = 939008.000 ns/op
         max = 939008.000 ns/op


# Run complete. Total time: 00:02:07

Benchmark                                Mode   Samples        Score  Score error    Units
c.t.b.MyBenchmark.testGetExistingKey   sample    263849       31.163       11.732    ns/op

As far as I can tell, the same benchmark in JMH has hashmap gets at 31 nanoseconds vs 552 nanoseconds for the looping test. 31 nanoseconds seems a little too fast for me. Looking at Latency Numbers Every Programmer Should Know a main memory reference is around 100 nanoseconds. L2 cache reference is roughly 7 nanoseconds, but the HashMap with 10 million Long keys and values well exceeds L2. Also the JMH results look strange to me. 90% of the get calls take 0.0 nanoseconds?

I am assuming this is user error. Any help/pointers would be appreciated. Thanks.

UPDATE

Here are the results from doing AverageTime run. This is much more inline with my expectations. Thanks @oleg-estekhin! In the comments below I mentioned that I had done the AverageTime test previously and had similar results as SampleTime. I believe doing that run I had used a HashMap with far fewer entries and that the faster lookups did make sense.

Result: 266.306 ±(99.9%) 139.359 ns/op [Average]
  Statistics: (min, avg, max) = (27.266, 266.306, 1917.271), stdev = 410.904
  Confidence interval (99.9%): [126.947, 405.665]


# Run complete. Total time: 00:07:17

Benchmark                                Mode   Samples        Score  Score error    Units
c.t.b.MyBenchmark.testGetExistingKey     avgt       100      266.306      139.359    ns/op
Ditch answered 30/6, 2014 at 18:52 Comment(0)
B
11

First, looping test measures average time, while your JMH code is configured for sample time. From the Mode.SampleTime javadoc:

Sample time: samples the time for each operation.

Individual executions of Map.get() are pretty fast to the point when the underlying time measurement system will report 0 for some of the executions due to time measurement granularity (read Nanotrusting the Nanotime blog post by the JMH author for more information).

In the sample mode the benchmarks collects individual sample times into an array and then calculates averages and percentiles using that array. When more then half of the array values are zero (in your particular setup more than 90% of array values are zero, as indicated by the p(90.0000) = 0.000 ns/op) the average is bound to be pretty low but when you see p(50) = 0 (and especially p(90) = 0) in your output the only conclusion you can reliable make is that these results are garbage and you need to find another way to measure that code.

  • You should use Mode.AverageTime (or Mode.Throughput) benchmark mode. Leave Mode.SampleTime for situations when individual invocation takes substantial time.

  • You could add a "baseline" benchmark which executes the if () and key++ in order to isolate the time required for key bookkeeping and the actual Map.get() time, but you will need to explain the results (the blog post linked above describes pitfalls with subtracting "baselines" from "real" measurements).

  • You could try to use Blackhole.consumeCPU() to increase the execution time of individual invocation (see the previous point about "baselines" and associated pitfalls).

Bouncer answered 1/7, 2014 at 6:45 Comment(5)
Thanks for the response! I read the article you mentioned. Really well written. I am no going to have to learn Russian just so I can read more of his (Aleksey Shipilёv) work. From the article, talking about windows machines: "It seems like we only have 370 ns precision". Could this be a possible explanation as to to why it seems timings only show up on my machine in that are greater than 400ns? I actually started running the tests in AverageTime mode and had similar results. The SampleTime mode appeared to be providing more insight into what was going on. I will rerun the tests and reply.Ditch
On my work computer the nanoTime precision is even greater at about 450 ns. And of course this directly affects the possible timing values you can get.Bouncer
Should I be using the benchmarks batchSize feature to turn the granularity into noise? Is that a valid use case for that feature? In the jmh samples it is used for: "Sometimes you need to evaluate operation which doesn't have the steady state. The cost of a benchmarked operation may significantly vary from invocation to invocation."Ditch
Benchmarking is an art =) where you are trying all available options and then trying to interpret results in some meaningful way. What I mean is you can try adding batchSize and compare runs with and without it. If there will be no difference - good, your benchmark is stable with relation to that parameter, but if there will be difference you will probably want to find an explanation about why it is so and that will not be easy.Bouncer
@Oleg Estekhin SampleTime mode makes JMH back off to amortize nanoTime issues. You don't have to implement it yourself in this mode. That nanotime article above just shows the side effects. Average time is mostly helpless, you should treat it with care. However, I don't understand why you get zeroes up to p(90.0000).Trunkfish

© 2022 - 2024 — McMap. All rights reserved.