Lazily initialize a Java map in a thread safe manner
Asked Answered
B

3

16

I need to lazily initialize a map and its contents. I have the below code till now:

class SomeClass {
    private Map<String, String> someMap = null;

    public String getValue(String key) {
        if (someMap == null) {
            synchronized(someMap) {
                someMap = new HashMap<String, String>();
                // initialize the map contents by loading some data from the database.
                // possible for the map to be empty after this.
            }
        }
        return someMap.get(key);  // the key might not exist even after initialization
    }
}

This is obviously not thread-safe as if one thread comes when someMap is null, goes on to initialize the field to new HashMap and while its still loading the data in the map, another thread does a getValue and doesn't get the data when one might have existed.

How can I make sure that the data is loaded in the map only once when the first getValue call happens.

Please note that it's possible that the the key won't exist in the map after all the initialization. Also, it's possible that the map is simply empty after all the initialization.

Bootlace answered 22/5, 2014 at 13:1 Comment(3)
moreover you will receive NullPointerException on the line synchronized(someMap) since someMap is nullDiscriminant
In this thread, people who don't realize how difficult concurrent is.Carmelcarmela
@JohnVint pun intended?Lothar
R
27

Double Check Locking

Double check locking requires several steps all to be completed in order to work properly, you are missing two of them.

First you will need to make someMap into a volatile variable. This is so that other threads will see changes made to it when they are made but after the changes are complete.

private volatile Map<String, String> someMap = null;

You also need a second check for null inside the synchronized block to make sure that another thread hasn't initialized it for you while you were waiting to enter the synchronized area.

    if (someMap == null) {
        synchronized(this) {
            if (someMap == null) {

Do not assign until ready for use

In your generation of the map construct it in a temp variable then assign it at the end.

                Map<String, String> tmpMap = new HashMap<String, String>();
                // initialize the map contents by loading some data from the database.
                // possible for the map to be empty after this.
                someMap = tmpMap;
            }
        }
    }
    return someMap.get(key); 

To explain why the temporary map is required. As soon as you complete the line someMap = new HashMap... then someMap is no longer null. That means other calls to get will see it and never try to enter the synchronized block. They will then try to get from the map without waiting for the database calls to complete.

By making sure the assignment to someMap is the last step within the synchronized block that prevents this from happening.

unmodifiableMap

As discussed in the comments, for safety it would also be best to save the results in an unmodifiableMap as future modifications would not be thread safe. This is not strictly required for a private variable that is never exposed, but it's still safer for the future as it stops people coming in later and changing the code without realizing.

            someMap = Collections.unmodifiableMap(tmpMap);

Why not use ConcurrentMap?

ConcurrentMap makes individual actions (i.e. putIfAbsent) thread-safe but it does not meet the fundamental requirement here of waiting until the map is fully populated with data before allowing reads from it.

Additionally in this case the Map after the lazy initialization is not being modified again. The ConcurrentMap would add synchronization overhead to operations that in this specific use case do not need to be synchronized.

Why synchronize on this?

There is no reason. :) It was just the simplest way to present a valid answer to this question.

It would certainly be better practice to synchronize on a private internal object. You have improved encapsulation traded off for marginally increased memory usage and object creation times. The main risk with synchronizing on this is that it allows other programmers to access your lock object and potentially try synchronizing on it themselves. This then causes un-needed contention between their updates and yours, so an internal lock object is safer.

In reality though a separate lock object is overkill in many cases. It's a judgement call based on the complexity of your class and how widely is is used against the simplicity of just locking on this. If in doubt you should probably use an internal lock object and take the safest route.

In the class:

private final Object lock = new Object();

In the method:

synchronized(lock) {

As for java.util.concurrent.locks objects, they don't add anything useful in this case (although in other cases they are very useful). We always want to wait until the data is available so the standard synchronized block gives us exactly the behavior we need.

Roybn answered 22/5, 2014 at 13:6 Comment(5)
That's the most common idiom. Recommended. I would use Guava's immutable map if the data is not subject to change.Seppuku
@GiovanniBotta If guava is not available Collections.unmodifiableMap is the most common I have seen.Carmelcarmela
@JohnVint Actually in this case if the map is private and the values never change it doesn't matter.Seppuku
Huh? I am saying, if Guava is not available on the classpath, the best alternative is Collections.unmodifiableMap. It serves the same purpose as the ImmutableMap in terms of not allowing the contents to change.Carmelcarmela
I would hasten to add that the double check only works because the second one happens inside the synchronized block.Ultimogeniture
G
2

I think TimB explained most of the options very well, but I think the quickest and most obvious answer would be to create it when the class instance is instantiated.

class SomeClass {
    private final Map<String, String> someMap = new HashMap<String, String>();

    public String getValue(String key) {
        return someMap.get(key);  // the key might not exist even after initialization
    }
}
Granicus answered 22/5, 2014 at 15:34 Comment(0)
R
1

The reason you want to lazy initialize your map is because the generation of values is resource intensive. Generally you can differentiate between two usecases

  1. Generation/storage of each value is equally expensive
  2. Generation of values is expensive but if you generate one, generating the rest is not that expensive anymore (for example you need to query a database)

The Guava library has a solution for both. Use a Cache to generate values on the fly or a CacheLoader + loadAll to bulk generate values. Since the initialization of an empty Cache is virtually free there is no need to use the double check idiom: simply assign the Cache instance to a private final field.

Rowlandson answered 22/5, 2014 at 13:4 Comment(8)
As I understand a cache from the first glance, this helps with adding items to the cache, but not creating the cache lazily in the first place.Reichard
@Silly Freak The cache is created lazy as you invoke getRowlandson
I'm talking about Cache c = null; c.get(key); throwing a NPE. The OP was asking about safely creating the Cache/Map, not inserting into it.Reichard
He asked about lazy initialization of the map. Since creating an empty cache/map is essentially free there is no reason to not always initialize the field. This way c.get(key) will never throw a NPE but the contents will be initialized lazy.Rowlandson
It may well be that it's cheap, but you don't answer the question the OP asked. The OP checks whether the data is loaded by checking for null, and the initialized map may well be empty. By creating an empty map in advance, pre- and post-initialization can't be distinguished. Moreover, the OP is talking about atomically initializing the whole map, not loading individual entries on demand. Likewise, he/she doesn't seem to need additional features like capacity or expiry of entries.Reichard
I actually like the idea but it might not fit the use case. If the data for the given key can be lazily loaded, then a cache is a great fit. However, if the whole map must be lazily loaded, then I am not sure this will work in the same easy way as other solutions.Seppuku
@GiovanniBotta you are right and that's what CacheLoader.loadAll is for. Since in most usecases generating values scales linearly i restrictet my answer to the common case :)Rowlandson
@Rowlandson Your answer would benefit from some of this comment information being included in it. Currently the answer itself is a little terse.Deceptive

© 2022 - 2024 — McMap. All rights reserved.