ConcurrentHashMap computeIfAbsent
Asked Answered
B

3

21

There is a new computeIfAbsent API introduced in Java 8. The javadocs for ConcurrentHashMap's impelementation of it state:

If the specified key is not already associated with a value, attempts to compute its value using the given mapping function and enters it into this map unless null. The entire method invocation is performed atomically, so the function is applied at most once per key. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map.

So, what does it say about locking of this implementation in case when the the key already exists and the computation is unneeded? Is the whole method computeIfAbsent synchronized as stated in docs even if no calculation is needed or just the mapping function call is syncronized to prevent calling the function twice?

Bungalow answered 21/10, 2014 at 8:9 Comment(6)
The docs don't say it is synchronized, they say it is atomic. I would expect the cost to be similar to a get if the key already exists.Dacoit
@MarkoTopolnik You are right indeed.Dacoit
@Dacoit I deleted the comment because I'm not so sure if I'm right :) The presence check can be done before locking and if it passes, the method returns right away. But the updating function application must happen within the same synchronized block as the updating of the map. And the get operation must be repeated once the lock is acquired.Herzog
So this is the question which raised this question? Then we can safely say that the current implementation will only synchronize on the bucket during the computation and update but not during lookups.Bursitis
See bugs.openjdk.java.net/browse/JDK-8161372; it will be fixed in JDK 9.Predispose
@RolandIllig, thanksBungalow
E
28

The implementation of ConcurrentHashMap is quite complex, as it is specifically designed to allow concurrent readability while minimizing update contention. At a very high level of abstraction, it is organized as a bucketed hash table. All read operations do not require locking, and (quoting the javadoc) "there is not any support for locking the entire table in a way that prevents all access". To accomplish this, the internal design is highly sophisticated (but still elegant), with key-value mappings held in nodes which can be arranged in various ways (such as lists or balanced trees) in order to take advantage of fine grained locks. If you're interested in implementation details you can also have a look at the source code.

Trying to answer your questions:

So, what does it say about locking of this implementation in case when the the key already exists and the computation is unneeded?

It is reasonable to think that, as with any read operation, no locking is required to check if the key already exists and the mapping function does not need to be executed.

Is the whole method computeIfAbsent synchronized as stated in docs even if no calculation is needed or just the mapping function call is synchronized to prevent calling the function twice?

No, the method is not synchronized in terms of locking, but from the point of view of the caller it is executed atomically (i.e. the mapping function is applied at most once). If the key is not found, an update operation must be performed using the value computed by the mapping function and some kind of locking is involved while that function is invoked. It is reasonable to think that such locking is very fine-grained and only involves a very small portion of the table (well, the specific data structure where the key has to be stored) and this is why (quoting the javadoc, emphasis mine) "some attempted update operations by other threads may be blocked while computation is in progress".

Engineman answered 21/10, 2014 at 9:36 Comment(2)
bugs.openjdk.java.net/browse/JDK-8161372 this implies that in Java 8, computeIfAbsent is blocking even if the key exists. It does seem to be fixed in Java 9Modifier
As a clarification, you say "atomically (i.e. the mapping function is applied at most once)" but atomic actually means "all at once" or without interleaving. The 'computeIfAbsent(..)' function essentially has 3 steps: (1) see that the value is missing, (2) compute the Function, and (3) store the value in the map. Since this is atomic, there cannot be an interleaving of two threads where A does step 1, then B does 1, then both do steps 2 and then 3. Rather the first to do step 1 must also finish 2 and 3 before the other does step 1. Thus implying that the Function is only executed once!Sloan
F
9

You can get contention when the value already exists.

If you look at the source code for computeIfAbsent(), it's pretty complex but you see that the check whether the value is already present is inside the synchronized block. Consider this alternate version (which doesn't operate atomically):

/**
 * Alternate implementation that doesn't block when map already
 * contains the value
 */
public V computeIfAbsent2(K key, Function<? super K, ? extends V> mappingFunction) {
    V value = get(key);
    if (value == null) {
        value = mappingFunction.apply(key);
        put(key, value);
    }
    return value;
}

I ran a JMH test comparing this alternate implementation with the original. I ran 20 threads, and used a ConcurrentHashMap containing 20 values that already existed. Each thread would use all 20 values. The test exercised only the case that the value already exists. It ran on OS X. The result (after a 2-minute warmup) was

Benchmark                                     Mode  Cnt       Score   Error   Units
ComputIfAbsentTest.benchComputeAbsent        thrpt    2   77966.354          ops/ms
ComputIfAbsentTest.benchComputeAbsent2       thrpt    2  463096.033          ops/ms

I also tried running this with Flight Recording enabled, and the contention was clearly visible. Here's an example stack trace:

"local.ComputIfAbsentTest.benchComputeAbsent-jmh-worker-11" #25 daemon prio=5 os_prio=31 tid=0x00007f89da10b000 nid=0x7203 waiting for monitor entry [0x00007000021f8000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1674)
    - waiting to lock <0x0000000795f80540> (a java.util.concurrent.ConcurrentHashMap$Node)
    at local.ComputIfAbsentTest.benchComputeAbsent(ComputIfAbsentTest.java:87)
    at local.generated.ComputIfAbsentTest_benchComputeAbsent_jmhTest.benchComputeAbsent_thrpt_jmhStub(ComputIfAbsentTest_benchComputeAbsent_jmhTest.java:116)
    at local.generated.ComputIfAbsentTest_benchComputeAbsent_jmhTest.benchComputeAbsent_Throughput(ComputIfAbsentTest_benchComputeAbsent_jmhTest.java:76)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:430)
    at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:412)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Fishbowl answered 13/7, 2016 at 22:17 Comment(3)
How about the following strategy: call get; if value does not exist, call computeIfAbsent - IMHO this should give a single (atomic) call to the mappingFunction while performing efficiently most of the time.Ereshkigal
This is what Doug Lea had to say about it: concurrency-interest. Basically, he ran a benchmark showing that calling get before locking is not the fastest strategy for a Zipf distribution of keys.Abreu
I ended up using Remigius' suggestion to get the atomic behavior.Fishbowl
A
1

The bugfix @RolandIllig mentioned states that contention can still occur if the key is not the first in the bin. I tested this using JMH with Java 10.

Throughput of luckyKey:

Result: 324172.798 ±(99.9%) 15244.448 ops/ms [Average]

Throughput of unluckyKey:

Result: 15386.202 ±(99.9%) 526.877 ops/ms [Average]

Benchmark code

@Threads(8)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class ComputeIfAbsentBenchmark {

  @State(Scope.Benchmark)
  public static class MyState {
    private final Map<String, Integer> map = new ConcurrentHashMap<>();

    public MyState() {
      for (int i = 0; i < 100; i++)
        map.put(Integer.toString(i), i);
    }
  }

  @Benchmark
  public void luckyKey(final MyState state) {
    state.map.computeIfAbsent("1", key -> 100);
  }

  @Benchmark
  public void unluckyKey(final MyState state) {
    state.map.computeIfAbsent("98", key -> 100);
  }

}
Abreu answered 5/5, 2018 at 19:11 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.