ConcurrentHashMap: avoid extra object creation with "putIfAbsent"?
Asked Answered
S

7

43

I am aggregating multiple values for keys in a multi-threaded environment. The keys are not known in advance. I thought I would do something like this:

class Aggregator {
    protected ConcurrentHashMap<String, List<String>> entries =
                            new ConcurrentHashMap<String, List<String>>();
    public Aggregator() {}

    public void record(String key, String value) {
        List<String> newList =
                    Collections.synchronizedList(new ArrayList<String>());
        List<String> existingList = entries.putIfAbsent(key, newList);
        List<String> values = existingList == null ? newList : existingList;
        values.add(value);
    }
}

The problem I see is that every time this method runs, I need to create a new instance of an ArrayList, which I then throw away (in most cases). This seems like unjustified abuse of the garbage collector. Is there a better, thread-safe way of initializing this kind of a structure without having to synchronize the record method? I am somewhat surprised by the decision to have the putIfAbsent method not return the newly-created element, and by the lack of a way to defer instantiation unless it is called for (so to speak).

Stickup answered 24/5, 2012 at 18:54 Comment(3)
Don't worry about extra objects unless there is a benchmark. Short-lived object allocation/GC is cheap cheap cheap in modern JVMs. (I guess "some" is more than "none", but the modern JVM has no problem with this in general.) In any case, still an interesting question insofar as it will hopefully yield some interesting approaches. "Deferring" allocation and such is a bit awkward in Java, and thus not common, due to lack of closures or "pass by name" semantics. (Anonymous classes aren't that sexy and would generally need to have a corresponding interface as well as an overload for putIf..).Bulla
More than two years later and still a great question. I really wish Java 1.8 had added something like putIfAbsent (K key, Supplier<V> value) for lazy instantiation of the default object. It certainly has retrofitted other support for the streams API onto the ConcurrentMap interface.Dagostino
@Dagostino Java 1.8 added a computeIfAbsent(K key, Function<K, V> mappingFunction) method to Map and ConcurrentMap that should do what you want for lazy instantiation of a default object.Dorman
F
49

Java 8 introduced an API to cater for this exact problem, making a 1-line solution:

public void record(String key, String value) {
    entries.computeIfAbsent(key, k -> Collections.synchronizedList(new ArrayList<String>())).add(value);
}

For Java 7:

public void record(String key, String value) {
    List<String> values = entries.get(key);
    if (values == null) {
        entries.putIfAbsent(key, Collections.synchronizedList(new ArrayList<String>()));
        // At this point, there will definitely be a list for the key.
        // We don't know or care which thread's new object is in there, so:
        values = entries.get(key);
    }
    values.add(value);
}

This is the standard code pattern when populating a ConcurrentHashMap.

The special method putIfAbsent(K, V)) will either put your value object in, or if another thread got before you, then it will ignore your value object. Either way, after the call to putIfAbsent(K, V)), get(key) is guaranteed to be consistent between threads and therefore the above code is threadsafe.

The only wasted overhead is if some other thread adds a new entry at the same time for the same key: You may end up throwing away the newly created value, but that only happens if there is not already an entry and there's a race that your thread loses, which would typically be rare.

Feints answered 24/5, 2012 at 19:0 Comment(15)
I miss having a Java equivalent of Python's defaultdict... does anyone else?Muzhik
@PlatinumAzure See docs.guava-libraries.googlecode.com/git/javadoc/com/google/…Chaste
So the putIfAbsent call replaces the values variable with a null; this code needs a test to see if putIfAbsent returned null.Stickup
@GeneGolovchinsky You're right! I've fixed the code. Thanks for noticing that (I really should have checked...)Feints
@Erlend that possibility isn't part of this scenario.Feints
@Feints What do you mean it's not part of the scenario? Do you mean that you must guarantee that this scenario never happens because if it did, your code would not safely handle it?Mcmillian
@didibus I mean if all code goes uses this method, there won't be a problem. Of course if another thread deletes the entry in a race condition there will be a problem.Feints
I know that this is an old answer, but @Erlend is correct. It's possible that another thread removes the entry between the call to putIfAbsent and get. The answer by Gene is the correct implementation that guarantees you will not throw an NPE on the call to values.add().Chancemedley
@SeanBright You are right, the GeneGolovchinsky answer is better, because in this answer it is unneccessery to do additional lookup in the map, which could be replaced by faster operation: just using if to check if we get null or not.Agenda
What if what I need is a combination of both? i.e. putIfAbsent semantics in terms of the return value, and a lambda to supply the value (to avoid superfluous object creation)? This would be useful if we need to know if a new entry was created or not.Shuttle
@DanielNitzan there is no "both"; the lambda of computeIfAbsent() is only called if there is no entry, so there is only the key. Can you elaborate on what you want to do and/or the context?Feints
@Feints i need to know when an entry is created by computeIfAbsent, such that a certain computation is performed only if the key already existed.Shuttle
@DanielNitzan Sounds like you might want to try computeIfPresent()Feints
@DanielNitzan Firstly, please state your requirement in plain English (I'm not clear on what you want). Secondly, I recommend asking a question (same advice about using English, but add "given, when, then" examples),Feints
@Feints I've posted a question here #62673309Shuttle
D
15

As of Java-8 you can create Multi Maps using the following pattern:

public void record(String key, String value) { entries.computeIfAbsent(key, k -> Collections.synchronizedList(new ArrayList<String>())) .add(value); }

The ConcurrentHashMap documentation (not the general contract) specifies that the ArrayList will only be created once for each key, at the slight initial cost of delaying updates while the ArrayList is being created for a new key:

http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html#computeIfAbsent-K-java.util.function.Function-

Dorman answered 1/9, 2014 at 5:31 Comment(4)
I ended up doing this so often I wrote a module for it: github.com/ansell/JDefaultDictDorman
Also if you can get away with a Collection rather than a List it's good to use ConcurrentLinkedQueue for a truly concurrent solution.Meli
Yes, but it is way slower than the putIfAbsent approach presented in other answers here. Depending on setup it is slower from 2 with moderate contention to 50 times with high contention.Agenda
I've done benchmarks: #44970043Agenda
S
11

In the end, I implemented a slight modification of @Bohemian's answer. His proposed solution overwrites the values variable with the putIfAbsent call, which creates the same problem I had before. The code that seems to work looks like this:

    public void record(String key, String value) {
        List<String> values = entries.get(key);
        if (values == null) {
            values = Collections.synchronizedList(new ArrayList<String>());
            List<String> values2 = entries.putIfAbsent(key, values);
            if (values2 != null)
                values = values2;
        }
        values.add(value);
    }

It's not as elegant as I'd like, but it's better than the original that creates a new ArrayList instance at every call.

Stickup answered 24/5, 2012 at 21:28 Comment(6)
As of Java-8 you can replace this with: entries.computeIfAbsent(key, k -> Collections.synchronizedList(new ArrayList<String>())).add(value)Dorman
@Dorman Please post your comment as an answer because with lambadas is the most elegant and clear way.Optometrist
The original answer only creates an ArrayList instance if the key is not already there so that doesn't seem too bad to me.Sacramental
@Dorman This is the approach that has the fastest execution time, it is from 2 to 50 times faster than the "lambda" approach in evironment with high contention.Agenda
@KrzysztofCichocki I would be interested to see JMH or similar benchmarking on the two to get clearer stats, given I reuse the computeIfAbsent pattern very oftenDorman
@Dorman benchmarks: #44970043Agenda
C
3

Created two versions based on Gene's answer

public  static <K,V> void putIfAbsetMultiValue(ConcurrentHashMap<K,List<V>> entries, K key, V value) {
    List<V> values = entries.get(key);
    if (values == null) {
        values = Collections.synchronizedList(new ArrayList<V>());
        List<V> values2 = entries.putIfAbsent(key, values);
        if (values2 != null)
            values = values2;
    }
    values.add(value);
}

public  static <K,V> void putIfAbsetMultiValueSet(ConcurrentMap<K,Set<V>> entries, K key, V value) {
    Set<V> values = entries.get(key);
    if (values == null) {
        values = Collections.synchronizedSet(new HashSet<V>());
        Set<V> values2 = entries.putIfAbsent(key, values);
        if (values2 != null)
            values = values2;
    }
    values.add(value);
}

It works well

Cooee answered 5/7, 2013 at 18:21 Comment(0)
S
3

This is a problem I also looked for an answer. The method putIfAbsent does not actually solve the extra object creation problem, it just makes sure that one of those objects doesn't replace another. But the race conditions among threads can cause multiple object instantiation. I could find 3 solutions for this problem (And I would follow this order of preference):

1- If you are on Java 8, the best way to achieve this is probably the new computeIfAbsent method of ConcurrentMap. You just need to give it a computation function which will be executed synchronously (at least for the ConcurrentHashMap implementation). Example:

private final ConcurrentMap<String, List<String>> entries =
        new ConcurrentHashMap<String, List<String>>();

public void method1(String key, String value) {
    entries.computeIfAbsent(key, s -> new ArrayList<String>())
            .add(value);
}

This is from the javadoc of ConcurrentHashMap.computeIfAbsent:

If the specified key is not already associated with a value, attempts to compute its value using the given mapping function and enters it into this map unless null. The entire method invocation is performed atomically, so the function is applied at most once per key. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map.

2- If you cannot use Java 8, you can use Guava's LoadingCache, which is thread-safe. You define a load function to it (just like the compute function above), and you can be sure that it'll be called synchronously. Example:

private final LoadingCache<String, List<String>> entries = CacheBuilder.newBuilder()
        .build(new CacheLoader<String, List<String>>() {
            @Override
            public List<String> load(String s) throws Exception {
                return new ArrayList<String>();
            }
        });

public void method2(String key, String value) {
    entries.getUnchecked(key).add(value);
}

3- If you cannot use Guava either, you can always synchronise manually and do a double-checked locking. Example:

private final ConcurrentMap<String, List<String>> entries =
        new ConcurrentHashMap<String, List<String>>();

public void method3(String key, String value) {
    List<String> existing = entries.get(key);
    if (existing != null) {
        existing.add(value);
    } else {
        synchronized (entries) {
            List<String> existingSynchronized = entries.get(key);
            if (existingSynchronized != null) {
                existingSynchronized.add(value);
            } else {
                List<String> newList = new ArrayList<>();
                newList.add(value);
                entries.put(key, newList);
            }
        }
    }
}

I made an example implementation of all those 3 methods and additionally, the non-synchronized method, which causes extra object creation: http://pastebin.com/qZ4DUjTr

Spelaean answered 22/7, 2016 at 14:58 Comment(0)
B
1

Waste of memory (also GC etc.) that Empty Array list creation problem is handled with Java 1.7.40. Don't worry about creating empty arraylist. Reference : http://javarevisited.blogspot.com.tr/2014/07/java-optimization-empty-arraylist-and-Hashmap-cost-less-memory-jdk-17040-update.html

Ballou answered 1/9, 2014 at 5:42 Comment(0)
A
1

The approach with putIfAbsent has the fastest execution time, it is from 2 to 50 times faster than the "lambda" approach in evironments with high contention. The Lambda isn't the reason behind this "powerloss", the issue is the compulsory synchronisation inside of computeIfAbsent prior to the Java-9 optimisations.

the benchmark:

import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class ConcurrentHashMapTest {
    private final static int numberOfRuns = 1000000;
    private final static int numberOfThreads = Runtime.getRuntime().availableProcessors();
    private final static int keysSize = 10;
    private final static String[] strings = new String[keysSize];
    static {
        for (int n = 0; n < keysSize; n++) {
            strings[n] = "" + (char) ('A' + n);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        for (int n = 0; n < 20; n++) {
            testPutIfAbsent();
            testComputeIfAbsentLamda();
        }
    }

    private static void testPutIfAbsent() throws InterruptedException {
        final AtomicLong totalTime = new AtomicLong();
        final ConcurrentHashMap<String, AtomicInteger> map = new ConcurrentHashMap<String, AtomicInteger>();
        final Random random = new Random();
        ExecutorService executorService = Executors.newFixedThreadPool(numberOfThreads);

        for (int i = 0; i < numberOfThreads; i++) {
            executorService.execute(new Runnable() {
                @Override
                public void run() {
                    long start, end;
                    for (int n = 0; n < numberOfRuns; n++) {
                        String s = strings[random.nextInt(strings.length)];
                        start = System.nanoTime();

                        AtomicInteger count = map.get(s);
                        if (count == null) {
                            count = new AtomicInteger(0);
                            AtomicInteger prevCount = map.putIfAbsent(s, count);
                            if (prevCount != null) {
                                count = prevCount;
                            }
                        }
                        count.incrementAndGet();
                        end = System.nanoTime();
                        totalTime.addAndGet(end - start);
                    }
                }
            });
        }
        executorService.shutdown();
        executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
        System.out.println("Test " + Thread.currentThread().getStackTrace()[1].getMethodName()
                + " average time per run: " + (double) totalTime.get() / numberOfThreads / numberOfRuns + " ns");
    }

    private static void testComputeIfAbsentLamda() throws InterruptedException {
        final AtomicLong totalTime = new AtomicLong();
        final ConcurrentHashMap<String, AtomicInteger> map = new ConcurrentHashMap<String, AtomicInteger>();
        final Random random = new Random();
        ExecutorService executorService = Executors.newFixedThreadPool(numberOfThreads);
        for (int i = 0; i < numberOfThreads; i++) {
            executorService.execute(new Runnable() {
                @Override
                public void run() {
                    long start, end;
                    for (int n = 0; n < numberOfRuns; n++) {
                        String s = strings[random.nextInt(strings.length)];
                        start = System.nanoTime();

                        AtomicInteger count = map.computeIfAbsent(s, (k) -> new AtomicInteger(0));
                        count.incrementAndGet();

                        end = System.nanoTime();
                        totalTime.addAndGet(end - start);
                    }
                }
            });
        }
        executorService.shutdown();
        executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
        System.out.println("Test " + Thread.currentThread().getStackTrace()[1].getMethodName()
                + " average time per run: " + (double) totalTime.get() / numberOfThreads / numberOfRuns + " ns");
    }

}

The results:

Test testPutIfAbsent average time per run: 115.756501 ns
Test testComputeIfAbsentLamda average time per run: 276.9667055 ns
Test testPutIfAbsent average time per run: 134.2332435 ns
Test testComputeIfAbsentLamda average time per run: 223.222063625 ns
Test testPutIfAbsent average time per run: 119.968893625 ns
Test testComputeIfAbsentLamda average time per run: 216.707419875 ns
Test testPutIfAbsent average time per run: 116.173902375 ns
Test testComputeIfAbsentLamda average time per run: 215.632467375 ns
Test testPutIfAbsent average time per run: 112.21422775 ns
Test testComputeIfAbsentLamda average time per run: 210.29563725 ns
Test testPutIfAbsent average time per run: 120.50643475 ns
Test testComputeIfAbsentLamda average time per run: 200.79536475 ns
Agenda answered 7/7, 2017 at 11:5 Comment(1)
Just to clarify this great answer, the lambda itself isn't the performance issue, the issue is the compulsory synchronisation inside of computeIfAbsent prior to the Java-9 optimisations: bugs.java.com/bugdatabase/view_bug.do?bug_id=8161372Dorman

© 2022 - 2024 — McMap. All rights reserved.