JMH - weird benchmarking results

I'm using JMH to benchmark a DOM parser and got really weird results: the first iteration actually ran faster than the later iterations.

[screenshots of the JMH output omitted]

Can anyone explain why this happens? Also, what do the percentiles and all the other figures mean, and why do the results only become stable after the third iteration? Does one iteration mean one run of the entire benchmark method? Below is the method I'm running:

@Benchmark 
@BenchmarkMode(Mode.SingleShotTime) 
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 13, time = 1, timeUnit = TimeUnit.MILLISECONDS)
public void testMethod_no_attr() {
    try {
        File fXmlFile = new File("500000-6.xml");
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(fXmlFile);
    } catch (Exception  e) {
        e.printStackTrace();  
    }
}
Soursop answered 22/2, 2018 at 20:56 Comment(1)
These are not iterations but percentiles: p(50.0000) = 2608.183 ms/op means that 50 percent of the measured operations finished in 2608.183 ms or less, and p(99.9999) = 2978.871 ms/op means that 99.9999 percent of operations finished within 2978.871 ms (which includes all the smaller percentiles, of course). – Lebel

Your screenshot is missing the histogram output. What you see are just the percentiles computed from that histogram.

Check this example: we have 100 iterations, and the result for the method BoolArrayVsBitSetBenchmark#primitive is:

Result "de.jawb.jmh.benchmark.example.bool.BoolArrayVsBitSetBenchmark.primitive":
  N = 100
  mean =      0,493 ±(99.9%) 0,003 s/op

  Histogram, s/op:
    [0,490, 0,495) = 93
    [0,495, 0,500) = 3
    [0,500, 0,505) = 1
    [0,505, 0,510) = 1
    [0,510, 0,515) = 0
    [0,515, 0,520) = 0
    [0,520, 0,525) = 1
    [0,525, 0,530) = 0
    [0,530, 0,535) = 0
    [0,535, 0,540) = 0
    [0,540, 0,545) = 0
    [0,545, 0,550) = 0
    [0,550, 0,555) = 0
    [0,555, 0,560) = 1

  Percentiles, s/op:
      p(0,0000) =      0,490 s/op 
     p(50,0000) =      0,491 s/op
     p(90,0000) =      0,494 s/op
     p(95,0000) =      0,496 s/op
     p(99,0000) =      0,558 s/op
     p(99,9000) =      0,559 s/op
     p(99,9900) =      0,559 s/op
     p(99,9990) =      0,559 s/op
     p(99,9999) =      0,559 s/op
    p(100,0000) =      0,559 s/op

The first histogram line, [0,490, 0,495) = 93, says that 93 calls took at least 0,490 s and less than 0,495 s. Note the closing bracket ')': the upper bound is excluded.

In summary, this means that no call was faster than 0,490 s/op, because

p(0,0000) = 0,490 s/op

and the maximum execution time is 0,559 s/op, because

p(100,0000) = 0,559 s/op
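
To make the half-open bin notation concrete, here is a small sketch (not JMH output; the timings are made up) that groups samples into 0,005 s bins the same way the histogram above is built:

import java.util.Map;
import java.util.TreeMap;

public class HistogramSketch {
    public static void main(String[] args) {
        // Made-up timings in seconds, purely for illustration.
        double[] samples = {0.491, 0.492, 0.493, 0.497, 0.502, 0.523, 0.558};
        double binWidth = 0.005;

        // Each sample falls into the half-open interval [lower, lower + binWidth).
        Map<Double, Integer> histogram = new TreeMap<>();
        for (double s : samples) {
            double lower = Math.floor(s / binWidth) * binWidth;
            histogram.merge(lower, 1, Integer::sum);
        }

        // Prints lines like "[0,490, 0,495) = 3" (decimal separator depends on locale).
        histogram.forEach((lower, count) ->
                System.out.printf("[%.3f, %.3f) = %d%n", lower, lower + binWidth, count));
    }
}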

Trish answered 22/2, 2018 at 21:30 Comment(4)
Thanks. Updated the histogram. Anyway, what do those values like [0,005, 0,005) = 14 mean? – Soursop
I have updated my example; the execution is now a bit slower. As you can see, [0,005, 0,005) = 14 means that 14 calls took at most 0,005 s. – Trish
Thanks. What do the warm-up iterations mean? Why are they different? – Soursop
@TriNguyen Warmup is required to trigger JIT optimization, load/initialize caches, create threads, etc. – Trish
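
For illustration, a minimal sketch (not from this thread; the iteration counts and the toy workload are arbitrary) of how warmup and measurement iterations are typically declared in JMH:

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Warmup;

public class WarmupSketch {

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    // Warmup iterations run first and are discarded; they give the JIT,
    // caches and thread pools time to settle before measurement starts.
    @Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
    // Only these iterations contribute to the reported statistics.
    @Measurement(iterations = 20, time = 1, timeUnit = TimeUnit.SECONDS)
    @Fork(1)
    public double work() {
        double acc = 0;
        for (int i = 0; i < 10_000; i++) {
            acc += Math.sqrt(i);
        }
        return acc; // returning the value prevents dead-code elimination
    }
}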

These percentiles are horribly misleading. Don't use them.

If you have N = 10, it is no wonder that all the values at the bottom of the table are the same: you have far too few samples to say anything about a quantile like 99.999%. The only values I would actually consider here are the median (the 50% quantile) and possibly the IQR (interquartile range).

If you are optimistic, you can treat your slowest sample as the 95% quantile; if you are less optimistic, treat the slowest of N = 10 samples as the 90% quantile. If you want to be more careful about these estimates, you may treat the spacing as being on the order of 1/sqrt(N-1), i.e. assume that up to 33% of runs could be even slower than the slowest of your 10 samples. You need many more samples to narrow down these estimates. Anything beyond 95% is pure speculation, and at 99% you simply cannot say what is happening: you do not have enough data.

As presented, these values are outright nonsense. Based on this data you cannot estimate p(99,9999) at all; that number corresponds to one run out of a million being slower than the given value, but you only did 10 runs. Don't use such a small N to predict what would happen over a million runs. JMH should not print these extreme quantiles for small N.
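
To see why the bottom rows of such a table collapse for small N, here is a small sketch (not JMH code; the ten timings are made up) that applies a simple nearest-rank percentile rule; every percentile above p(90) lands on the single slowest sample:

import java.util.Arrays;

public class SmallSamplePercentiles {

    // Simple nearest-rank percentile; JMH may interpolate differently,
    // but the effect for small N is the same.
    static double percentile(double[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Ten made-up timings in ms/op.
        double[] samples = {2500, 2550, 2580, 2600, 2610, 2620, 2650, 2700, 2800, 2979};
        Arrays.sort(samples);

        for (double p : new double[] {50, 90, 95, 99, 99.9, 99.99, 99.999, 99.9999, 100}) {
            // With N = 10, everything above p(90) returns the same maximum sample.
            System.out.printf("p(%.4f) = %.3f ms/op%n", p, percentile(samples, p));
        }
    }
}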

Phallicism answered 22/2, 2018 at 22:50 Comment(0)