Java Increment benchmark [closed]

I do investigation about best performance of multithreading increment. I checked implementation based on synchronize, AtomicInteger and custom implementation like in AtomicInteger, but with parkNanos(1), on failed CAS.

private int customAtomic() {
        int ret;
        for (;;) {
            ret = intValue;
            if (unsafe.compareAndSwapInt(this, offsetIntValue, ret, ++ret)) {
                break;
            }
            LockSupport.parkNanos(1);
        }
        return ret;
    }

I made benchmark based on JMH: clear execution each method, each method with consume CPU (1,2,4,8,16 times) and only consume CPU. Each benchmark method performed on Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz, 8 Core + 8 HT 64Gb RAM, on 1-17 threads. The results surprised me:

CAS most effective in 1 thread. 2 thread - similar result with monitor. 3 and more - worse than monitor, ~ 2 times.
The custom implementation in most cases is in 2-3 times better, than monitor.
But in the custom implementation, randomly sometimes happens bad execution. Good case - 50 op/microsec., bad case - 0.5 op/microsec.

Questions:

Why AtomicInteger is not based on synchronize, it is more productive, then current impl?
Why AtomicInteger does not use LockSupport.parkNanos(1), on CAS fail?
Why does happens this spikes in custom implementation?

I tried to perform this test few times, and spike always happens in different number threads. Also I tried this test in another machines, the result is the same. Maybe it's a problems in test. In "bad case" of custom impl in StackProfiler, I see:

....[Thread state distributions]....................................................................
 50.0%         RUNNABLE
 49.9%         TIMED_WAITING

....[Thread state: RUNNABLE]........................................................................
 43.3%  86.6% sun.misc.Unsafe.park
  5.8%  11.6% com.jad.generated.IncrementBench_incrementCustomAtomicWithWork_jmhTest.incrementCustomAtomicWithWork_thrpt_jmhStub
  0.8%   1.7% org.openjdk.jmh.infra.Blackhole.consumeCPU
  0.1%   0.1% com.jad.IncrementBench$Worker.work
  0.0%   0.0% java.lang.Thread.currentThread
  0.0%   0.0% com.jad.generated.IncrementBench_incrementCustomAtomicWithWork_jmhTest._jmh_tryInit_f_benchmarkparams1_0
  0.0%   0.0% org.openjdk.jmh.infra.generated.BenchmarkParams_jmhType_B1.<init>

....[Thread state: TIMED_WAITING]...................................................................
 49.9% 100.0% sun.misc.Unsafe.park

In "good case":

....[Thread state distributions]....................................................................
 88.2%         TIMED_WAITING
 11.8%         RUNNABLE

....[Thread state: TIMED_WAITING]...................................................................
 88.2% 100.0% sun.misc.Unsafe.park

....[Thread state: RUNNABLE]........................................................................
  5.6%  47.9% sun.misc.Unsafe.park
  3.1%  26.3% org.openjdk.jmh.infra.Blackhole.consumeCPU
  2.4%  20.3% com.jad.generated.IncrementBench_incrementCustomAtomicWithWork_jmhTest.incrementCustomAtomicWithWork_thrpt_jmhStub
  0.6%   5.5% com.jad.IncrementBench$Worker.work
  0.0%   0.0% com.jad.generated.IncrementBench_incrementCustomAtomicWithWork_jmhTest.incrementCustomAtomicWithWork_Throughput
  0.0%   0.0% java.lang.Thread.currentThread
  0.0%   0.0% org.openjdk.jmh.infra.generated.BenchmarkParams_jmhType_B1.<init>
  0.0%   0.0% sun.misc.Unsafe.putObject
  0.0%   0.0% org.openjdk.jmh.runner.InfraControlL2.announceWarmdownReady
  0.0%   0.0% sun.misc.Unsafe.compareAndSwapInt

Link to benchmark code

Link to result graphs. X - threads count, Y - thpt, op/microsec

Link to RAW log

UPD

Okay, I know, I understand that, when I use parkNanos, one thread also can hold the lock (CAS) for long periods of time. Threads, with CAS-fail, goes sleep, and only one thread doing work and incrementing value. I see, that for big concurrency level, when work is so small - AtomicInteger is not better approach. But if we increase workSize, for example to level = CASThrpt/threadNum, it should works fine: For local machine I have set workSize=300, result of my test:

Benchmark                                     (workSize)   Mode  Cnt  Score   Error   Units
IncrementBench.incrementAtomicWithWork               300  thrpt    3  4.133 ± 0.516  ops/us
IncrementBench.incrementCustomAtomicWithWork         300  thrpt    3  1.883 ± 0.234  ops/us
IncrementBench.lockIntWithWork                       300  thrpt    3  3.831 ± 0.501  ops/us
IncrementBench.onlyWithWork                          300  thrpt    3  4.339 ± 0.243  ops/us

AtomicInteger - win, lock - second place, custom - third. But problem with spikes, still not clear. And I forgot about java version: Java(TM) SE Runtime Environment (build 1.7.0_79-b15) Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)

In the case of synchronized, it tends to be sticky with locks, which means one thread can hold the lock for long periods of time and not let another thread grab it fairly. This is very bad for multi-threading but excellent if you have a benchmark which would perform better if only one thread is running for relatively long periods of time.

You need to change the test so it would run better when using multiple threads than using just one thread or you will in fact be testing which locking strategy has the poorest fairness policies.

The locking strategy attempts to adjust how the locking is performed which is why it can change the behaviour, but it can't do a good job as the code should never have been multi-threaded in the first place.

Recommended topics

Hot tags