Synchronization in a HashMap cache
Asked Answered
M

4

7

I've got a web application where people ask for resources. This resources are cached using a synchronized hash map for efficiency. The problem here is when two different requests come for the same uncached resource at the same time: the operation retrieving the resources takes up a lot of memory, so I want to avoid calling it more than once for the same resource.

Can somebody please tell me if there is any potential problem with the following snippet? Thanks in advance.

private Map<String, Resource> resources = Collections.synchronizedMap(new HashMap<String, Resource>());

public void request(String name) {

  Resource resource = resources.get(name);

  if (resource == null) {
    synchronized(this) {
      if (resources.get(name) == null) {
        resource = veryCostlyOperation(name); // This should only be invoked once per resource...
        resources.put(resource);
      } else {
        resource = resources.get(name);
      }
    }
  }

  ...

}
Maritsa answered 30/3, 2011 at 14:57 Comment(0)
T
6

One possible problem is that you create unnecessary contention by executing veryCostlyOperation() inside a synchronized block, so that many threads cannot retrieve their (independent) resources at the same time. This can be solved by using Future<Resource> as values of the map:

Map<String, Future<Resource>> map = new ConcurrentHashMap<String, Future<Resource>>();    
...
Future<Resource> r = map.get(name);
if (r == null) {
    FutureTask task = null;
    synchronized (lock) {
        r = map.get(name);
        if (r == null) {
            task = new FutureTask(new Callable<Resource>() {
                public Resource call() {
                    return veryCostlyOperation(name);
                }
            });
            r = task;
            map.put(name, r);
        }
    }
    if (task != null) task.run(); // Retrieve the resource
}

return r.get(); // Wait while other thread is retrieving the resource if necessary
Taggart answered 30/3, 2011 at 15:24 Comment(2)
Thanks for using FutureTask, I've seen the class before in the internals, but never knew it has this characteristics.Amadeo
The problem with this is that you could be calling veryCostlyOperation on multiple resources simultaneously. The OP mentioned that he didn't want to call it on the same resource twice, but that could have been a curve-fit comment. If twelve difference resources are requested simultaneously in your code, twelve veryCostlyOperation calls will all be made in parallel. If they are indeed very memory intensive, you could run out of memory.Truthfunction
T
2

The only potential problem I see is that you synchronize to this. If any other code in the same class also synchronizes to this, only one of those blocks will run at once. Maybe there's nothing else that does this, and that's fine. I always worry about what the next programmer is going to do, though. (or myself in three months when I've forgotten about this code)

I would recommend creating a generic synch object and then synch'ing to that.

private final Object resourceCreationSynchObject = new Object();

then

synchronized(this.resourceCreationSynchObject) {
  ...
}

Otherwise, this does exactly what you're asking for. It ensures that veryCostlyOperation cannot be called in parallel.

Also, it's great thinking to re-get the resource a second time within the synchronized block. This is necessary, and the first call outside makes sure that you don't synchronize when the resource is already available. But there's no reason to call it a third time. First thing inside the synchronized block, set resource again to resources.get(name) and then check that variable for null. That will prevent you from having to call get again inside the else clause.

Truthfunction answered 30/3, 2011 at 15:6 Comment(2)
Why not use resources as synchronization object?Wormeaten
You could, it just seemed confusing to me to use an object that's already synchronized as the synchronization object. No reason you couldn't, though.Truthfunction
G
1

Your code looks ok, except that you are synchronizing more than actually required:

  • Using a ConcurrentHashMap instead of a synchronized HashMap would allow multiple invocations of the get method without locking.

  • Synchronizing on this instead of resources is probably not necessary, but it depends on the rest of your code.

Grant answered 30/3, 2011 at 15:7 Comment(2)
If he uses a ConcurrentHashMap, the get method is not synchronized, so that would not be an issue. The original code already prevents veryCostlyOperation to be invoked multiple times for the same name, and replacing the synchronized HashMap with a ConcurrentHashMap does not change that.Grant
Ok, if you keep the synchronized block around veryCostlyOperation, your right. But that was what I wanted to stress, because you did mention that syncing "is probably not necessary". It is, for this reason.Amadeo
C
0

Your code will potentially call veryCostlyOperation(name) multiple times. The problem is that there is an unsynchronized step after looking up the map:

public void request(String name) {
    Resource resource = resources.get(name);
    if (resource == null) {
        synchronized(this) {
            //...
        }
    }
    //...
}

The get() from the map is synchronized by the map, but checking the result for null is not protected by anything. If multiple threads enter this requesting the same "name", all of them will see a null result from resources.get(), until one actually finishes costlyOperation and puts the resource into the resources map.

A simpler and working, but less scalable approach would be to go with a normal map and make the entire request method synchronized. Unless it actually turns out a problem in practice I would choose the simple approach.

For higher scalability you can fix your code by checking the map again after synchronized(this) to catch the case outlined above. It would still not give the best scalability, since the synchronized(this) only allows one thread to execute costlyOperation, whereas in many practical cases, you only want to prevent multiple executions for the same resource while allowing for concurrent requests to different resources. In that case you need some facility to synchronize on the resource being requested. A very basic example:

private static class ResourceEntry {
     public Resource resource;
}

private Map<String, ResourceEntry> resources = new HashMap<String, ResourceEntry>();

public Resource request(String name) {
    ResourceEntry entry;
    synchronized (resources) {
        entry = resources.get(name);
        if (entry == null) {
            // if no entry exists, allocate one and add it to map
            entry = new ResourceEntry();
            resources.put(name, entry);
        }
    }
    // at this point we have a ResourceEntry, but it *may* be no loaded yet
    synchronized (entry) {
        Resource resource = entry.resource;
        if (resource == null) {
            // must create the resource
            resource = costlyOperation(name);
            entry.resource = resource;
        }
        return resource;
    }
}

This is only a rough sketch. Basically, it makes a sychronized lookup for a ResourceEntry, and then synchronizes on the ResourceEntry to ensure the specific resource is only built once.

Cartie answered 30/3, 2011 at 21:56 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.