In Java, is it safe to change a reference to a HashMap read concurrently
Asked Answered
C

4

17

I hope this isn't too silly a question...

I have code similar to the following in my project:

public class ConfigStore {

    public static class Config {

        public final String setting1;
        public final String setting2;
        public final String setting3;

        public Config(String setting1, String setting2, String setting3) {
            this.setting1 = setting1;
            this.setting2 = setting2;
            this.setting3 = setting3;
        }

    }

    private volatile HashMap<String, Config> store = new HashMap<String, Config>();

    public void swapConfigs(HashMap<String, Config> newConfigs) {
        this.store = newConfigs;
    }

    public Config getConfig(String name) {
        return this.store.get(name);
    }

}

As requests are processed, each thread will request a config to use from the store using the getConfig() function. However, periodically (every few days most likely), the configs are updated and swapped out using the swapConfigs() function. The code that calls swapConfigs() does not keep a reference to the Map it passes in as it is simply the result of parsing a configuration file.

  • In this case, is the volatile keyword still needed on the store instance variable?
  • Will the volatile keyword introduce any potential performance bottlenecks that I should be aware of or can avoid given that the rate of reads greatly exceeds the rate of writes?

Thanks very much,

Crowboot answered 25/8, 2011 at 7:51 Comment(0)
B
12

Since changing references is an atomic operation, you won't end up with one thread modifying the reference, and the other seeing a garbage reference, even if you drop volatile. However, the new map may not get instantly visible for some threads, which may consequently keep reading configuration from the old map for an indefinite time (or forever). So keep volatile.

Update

As @BeeOnRope pointed out in a comment below, there is an even stronger reason to use volatile:

"non-volatile writes [...] don't establish a happens-before relationship between the write and subsequent reads that see the written value. This means that a thread can see a new map published through the instance variable, but this new map hasn't been fully constructed yet. This is not intuitive, but it's a consequence of the memory model, and it happens in the real word. For an object to be safely published, it must be written to a volatile, or use a handful of other techniques.

Since you change the value very rarely, I don't think volatile would cause any noticeable performance difference. But at any rate, correct behaviour trumps performance.

Bindweed answered 25/8, 2011 at 7:58 Comment(13)
Peter just beat Tomasz here, but both answer are basically to be the same and correct. Thanks very much. :)Crowboot
Is changing of 64 bit reference really guaranteed to be atomic?Jillion
@hgrey, yes, it is explicitly guaranteed by the JLS (ch. 17.7.): "Writes to and reads of references are always atomic, regardless of whether they are implemented as 32-bit or 64-bit values."Cable
This is incorrect. You need the volatile. Although non-volatile writes are atomic, they don't establish a happens before relationship between the write and subsequent reads that see the written value. This means that a thread can see a new map published through the instance variable, but this new map hasn't been fully constructed yet. This is not intuitive, but it's a consequence of the memory model, and it happens in the real word. For an object to be safely published, it must be written to a volatile, or use a handful of other techniques.Tapestry
@BeeOnRope, my consequence above is also "keep volatile", so you are arguing for exactly the same cause as myself. I hope you remove the undeserved downvote.Cable
Your post is wrong regardless because you can see a garbage map without volatile. You correctly mention the staleness issue, but miss the more important correctness/safety issue. Even a program that could accept stale values indefinitely would be unsafe without volatile.Tapestry
@BeeOnRope, you are perfectly right - and welcome - to point out an important aspect missed by other answers. However, 1) it does not affect the conclusion of the answer(s), and 2) it was not the specific subject of the OP. I don't care about losing a few rep, but I do strongly care about fairness and justice - not just for myself, but for anyone using this forum. Please think more carefully before you state an answer to be "incorrect" in the future.Cable
@PéterTörök "you won't end up with one thread modifying the reference, and the other seeing a garbage reference, even if you drop volatile." is misleading. You will always see a valid reference, but it could point to an object that is in an inconsistent state because it has not been safely published.Charentemaritime
@assylias, you are restating what BeeOnRope already explained. I got it by now, thanks :-)Cable
@PéterTörök - I don't think you need to invoke fairness and justice here. You seem to believe than an answer which happens to be correct in its conclusion, but incorrect in its reasoning, should be allowed to stand. I don't agree. If it is obvious that the answer only incidentally arrives at the correct inclusion, but via an incorrect reasoning, I don't think it is wrong to augment it with another answer. Furthermore, if you agree that your answer is incomplete, I don't understand why you don't fix it.Tapestry
For what it's worth, you are not alone - I've seem the exact same misunderstanding of thread safety post safe-publishing, versus initial safe-publishing in many code reviews. I've also seen it blow up in production two times, total (that I'm aware of), against many more similar issues in the code, caught later or not at all - so if your argument is that it won't matter most (> 50%) of the time, you are correct based on my experience.Tapestry
@BeeOnRope, there is a fine but definite line between incomplete and incorrect. To give you an analogy, if your grandma asks you a computer-related question, your answer will most likely be correct, but not complete :-) Anyway, you are right in that I shall fix my answer. And even though I disagree with your usage of specific terms, nevertheless I thank you for pointing out the gap in my knowledge.Cable
Downvote removed. If there is an dispute on the terms, I'm happy to keep up the dispute.Tapestry
T
12

No, this is not thread safe without volatile, even apart from the issues of seeing stale values. Even though there are no writes to the map itself, and reference assignment is atomic, the new Map<> has not been safely published.

For an object to be safely published, it must be communicated to other threads using some mechanism that either establishes a happens-before relationship between the object construction, the reference publication and the reference read, or it must use a handful of narrower methods which are guaranteed to be safe for publishing:

  • Initializing an object reference from a static initializer.
  • Storing a reference to it into a final field.

Neither of those two publication specific ways applies to you, so you'll need volatile to establish happens-before.

Here is a longer version of this reasoning, including links to the JLS and some examples of real-world things that can happen if you don't publish safely.

More details on safe publication can be found in JCIP (highly recommended), or here.

Tapestry answered 20/6, 2013 at 18:23 Comment(2)
Well you can always check out this question - I ended up here as it was linked there. The same misunderstanding about safe publishing is going on there.Tapestry
@jtahlborn - this question just seems to keep coming up, and it turns out there was an 8-year-old question on the same topic with no decent answer. I added a longer answer than this one which should hopefully settle it.Tapestry
F
9

Your code is fine. You need volatile, otherwise your code would be 100% thread-safe (updating a reference is atomic), however the change might not be visible to all the threads. It means some threads will still see the old value of store.

That being said volatile is obligatory in your example. You might consider AtomicReference, but it won't give you anything more in your case.

You cannot trade correctness for performance so your second question is not really valid. It will have some performance impact, but probably only during update, which happens very rarely as you said. Basically JVM will ensure the change is visible to all the threads by "flushing" it, but after that it will be accessible as any other local variable (up until next update).

BTW I like Config class being immutable, please also consider immutable Map implementation just in case.

Flowers answered 25/8, 2011 at 7:59 Comment(2)
+1: Whether the Map is made immutable explicitly at runtime with Collections.unmodifiableMap() or not, you should be changing the Map's contents once it has been set. If you need to be able to do this use ConcurrentHashMap and make it final i.e. use putAll and keySet().retainAll() to update the ConcurrentHashMapClamant
I agree with @BeeOnRope: the code is not thread safe without volatile - not only could one thread see a stale map reference, but it could see an up-to-date reference pointing to an inconsistent map (i.e. not properly constructed).Charentemaritime
E
1

Would it work for you to use a ConcurrentHashMap and instead of swapping the entire config update the affected values in the hash map?

Excipient answered 25/8, 2011 at 7:58 Comment(2)
The reason I didn't bother with ConccurentHashMap is that I want to be able to do a complete swap of one good configuration for another good configuration in a single operation. Using a ConcurrentHashMap would probably imply that I would want to merge the configurations in some way.Crowboot
@Midpoint, I suspected that something like that was the reason.Excipient

© 2022 - 2024 — McMap. All rights reserved.