HashMap concurrency issue
I have a HashMap that, for speed reasons, I would like to not require locking on. Will updating it and accessing it at the same time cause any issues, assuming I don't mind stale data?

My accesses are gets, not iterating through it, and deletes are part of the updates.

Cruciform answered 16/6, 2009 at 18:2
Do your updates include deletes? Do your accesses include iterating through it? - Endplay
Never mind, I guess those questions are irrelevant. - Endplay
Or are they? [Need more sleep] - Endplay
Yes, it will cause major problems. One example is what could happen when adding a value to the hash map: this can cause a rehash of the table, and if that occurs while another thread is iterating over a collision list (a hash table "bucket"), that thread could erroneously fail to find a key that exists in the map. HashMap is explicitly unsafe for concurrent use.

Use ConcurrentHashMap instead.
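A minimal sketch of that replacement, assuming Java 8 or later (for lambdas) and made-up keys, thread counts and iteration counts: several threads get, put and remove on one ConcurrentHashMap with no external locking. A reader may see a stale value, which the question allows, but never a table in the middle of a resize. The class name `ConcurrentCacheDemo` is hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentCacheDemo {

    // Runs 4 threads that mix puts, gets and removes on a shared map,
    // then returns the number of entries left (keys are k0..k99).
    public static int run() throws InterruptedException {
        ConcurrentHashMap<String, Integer> cache = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int t = 0; t < 4; t++) {
            final int id = t;
            pool.submit(() -> {
                for (int i = 0; i < 10_000; i++) {
                    String key = "k" + (i % 100);
                    cache.put(key, id);      // update
                    cache.get(key);          // read: possibly stale, never corrupt
                    if (i % 7 == 0) {
                        cache.remove(key);   // deletes are part of the updates
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return cache.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("entries left: " + run());
    }
}
```

No lock appears in the calling code; the map handles the concurrency internally, and reads generally do not block.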

Garrett answered 16/6, 2009 at 18:5
The importance of synchronising or using ConcurrentHashMap cannot be overstated.

Until a couple of years ago I was under the misguided impression that I could get away with synchronising only the put and remove operations on a HashMap. This is of course very dangerous, and on some JDKs (early 1.5, I think) it actually results in an infinite loop in HashMap.get().

What I did a couple of years ago (and really shouldn't be done):

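import java.util.HashMap;
import java.util.Map;

public class MyCache {
    private final Map<String, Object> map = new HashMap<String, Object>();

    // Writers are serialized with each other...
    public synchronized void put(String key, Object value) {
        map.put(key, value);
    }

    // ...but readers take no lock, so get() can run while put() is resizing
    // the table. This can cause an infinite loop in some JDKs!
    public Object get(String key) {
        return map.get(key);
    }
}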

EDIT: thought I'd add an example of what not to do (see above)
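For contrast, a minimal sketch of a corrected version (the class name `SafeCache` is hypothetical): every method that touches the map, including `get`, synchronizes on the same object, so a reader can never run concurrently with a resize. This trades some read throughput for correctness; ConcurrentHashMap avoids the single lock.

```java
import java.util.HashMap;
import java.util.Map;

public class SafeCache {
    private final Map<String, Object> map = new HashMap<>();

    // All three methods lock the same monitor (this SafeCache instance),
    // so no get() can observe the HashMap while put()/remove() mutates it.
    public synchronized void put(String key, Object value) {
        map.put(key, value);
    }

    public synchronized Object get(String key) {
        return map.get(key);
    }

    public synchronized Object remove(String key) {
        return map.remove(key);
    }
}
```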

Stapleton answered 16/6, 2009 at 18:40
I think the infinite loop is a feature of every Sun JDK. I can't remember which, but a well-known piece of open source software was using a HashMap for logging and left it unsynchronised for speed, as complete accuracy was not required. In production it would very occasionally get into an infinite loop, which is much worse than throwing an unchecked exception. - Marge
I have seen this myself. If you update a HashMap from multiple threads with an "I don't care about corruption" approach, it will hang your JVM. An unsynchronized (non-thread-safe) HashMap is only safe if there are no updates/deletes or only one thread accesses it. - Reyesreykjavik
When in doubt, check the class's Javadocs:

Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:

Map m = Collections.synchronizedMap(new HashMap(...));

(emphasis not mine)

So, given that your threads will be deleting mappings from the Map, the answer is yes: it will definitely cause issues, and it is definitely unsafe.
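A minimal sketch of the Javadoc's recommendation (class and method names are hypothetical): the map is wrapped at creation time, so no unsynchronized reference escapes. The wrapper synchronizes each individual call, but the Collections.synchronizedMap Javadoc also requires callers to hold the map's lock while iterating over any of its views, which `sumValues` does.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SynchronizedMapDemo {

    // Iteration is many calls in a row, so the caller must lock the
    // wrapper itself for the whole traversal.
    public static int sumValues(Map<String, Integer> m) {
        synchronized (m) {
            int sum = 0;
            for (int v : m.values()) {
                sum += v;
            }
            return sum;
        }
    }

    public static int demo() {
        // Wrap at creation time: only the synchronized view is ever visible.
        Map<String, Integer> m = Collections.synchronizedMap(new HashMap<String, Integer>());
        m.put("a", 1);      // single calls are synchronized by the wrapper
        m.put("b", 2);
        m.remove("a");
        return sumValues(m);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 2
    }
}
```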

Haigh answered 16/6, 2009 at 18:33
Yes. Very Bad Things will happen. For example, your thread might get stuck in an infinite loop.

Either use ConcurrentHashMap or NonBlockingHashMap.

Dorree answered 16/6, 2009 at 21:50
I would not recommend using NonBlockingHashMap. Just use ConcurrentHashMap. - Azriel
HashMap will not satisfy the conditions you describe. Because updating a map is not atomic, a reader may encounter the map in an invalid state, and multiple concurrent writes might leave it corrupted. ConcurrentHashMap (Java 1.5 or later) does what you want.
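Beyond making single gets and puts safe, ConcurrentHashMap implements ConcurrentMap, whose conditional operations (`putIfAbsent`, `replace(key, oldValue, newValue)`, `remove(key, value)`) each execute atomically, so a check-then-act update or delete does not need an external lock. A small single-threaded sketch of their semantics; the class and method names are hypothetical, and it assumes Java 7+ for the diamond operator.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AtomicUpdateDemo {

    public static String demo() {
        ConcurrentMap<String, String> m = new ConcurrentHashMap<>();
        m.putIfAbsent("k", "v1");        // inserts: no mapping existed
        m.putIfAbsent("k", "ignored");   // no effect: "k" is already mapped
        m.replace("k", "v1", "v2");      // replaces only if current value is "v1"
        boolean removed = m.remove("k", "stale"); // false: value does not match
        return m.get("k") + "," + removed;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Each of these calls is a single atomic step on the map, unlike a `get` followed by a `put`, where another thread can act in between.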

Overpraise answered 16/6, 2009 at 18:9
If by 'at the same time' you mean from multiple threads, then yes, you need to lock access to it (or use ConcurrentHashMap or a similar class that does the locking for you).
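One way to lock access while still letting gets run in parallel is a read-write lock: any number of readers may hold the read lock together, while a writer gets exclusive access. A minimal sketch, assuming a `ReentrantReadWriteLock` (java.util.concurrent.locks, Java 5+) guarding a plain HashMap; the class name `RwLockedMap` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RwLockedMap<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Shared lock: many gets can proceed concurrently.
    public V get(K key) {
        lock.readLock().lock();
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Exclusive lock: no reader can see the table while it is being resized.
    public V put(K key, V value) {
        lock.writeLock().lock();
        try {
            return map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public V remove(K key) {
        lock.writeLock().lock();
        try {
            return map.remove(key);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Whether this beats ConcurrentHashMap depends on the workload; it mainly helps when reads vastly outnumber writes.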

Artemas answered 16/6, 2009 at 18:11
No, there will be no issues if you do the following:

  1. Place your data into the HashMap from a single thread on first load, before any multithreading occurs. This matters because adding a new key alters the modCount (and put returns null), whereas replacing the value of an existing key returns the old value and does not alter the modCount. The modCount is what makes iterators fail-fast. If you only use get, though, nothing is iterated over, so it's fine.

  2. Use the same set of keys throughout your application. Once the application starts and loads its data, no new keys may be added to this map. This way a get will return either stale data or freshly inserted data; there will be no issues.
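The modCount point in step 1 can be checked in a single thread. A minimal sketch (class and method names are hypothetical), assuming OpenJDK's HashMap, where overwriting an existing key's value leaves modCount unchanged while adding a new key increments it. Fail-fast detection is best-effort per the Javadoc, and this demo says nothing about whether other threads will see the new values.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class StructuralModDemo {

    // Overwriting existing keys is not a structural modification,
    // so the iterator keeps going.
    public static boolean replaceDuringIteration() {
        Map<String, Integer> m = new HashMap<>();
        m.put("a", 1);
        m.put("b", 2);
        try {
            for (String k : m.keySet()) {
                m.put(k, 42);           // existing key: modCount unchanged
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    // Adding a new key is structural; the iterator detects it on its next step.
    public static boolean addDuringIteration() {
        Map<String, Integer> m = new HashMap<>();
        m.put("a", 1);
        m.put("b", 2);
        try {
            for (String k : m.keySet()) {
                m.put(k + "-new", 0);   // new key: modCount incremented
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(replaceDuringIteration() + " " + addDuringIteration());
    }
}
```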

Parrish answered 16/6, 2009 at 18:17
As others mentioned, use a ConcurrentHashMap or synchronize the map when updating it.

Millenarian answered 16/6, 2009 at 22:27
I read here and elsewhere that you shouldn't access a HashMap from multiple threads, but no one says what really happens.

So, here is what I saw today (which is why I'm on this old question) in an application that has been running in production since March: two concurrent puts on the same HashSet (and therefore on its underlying HashMap) caused a CPU overload (near 100%) and memory growth of 3 GB, which then came back down after GC. We had to restart the app.

Dogwood answered 7/11, 2014 at 17:54
