Java: how does volatile guarantee visibility of "data" in this piece of code?

class Future
{
    private volatile boolean ready;
    private Object data;

    public Object get()
    {
        if (!ready) return null;
        return data;
    }

    public synchronized void setOnce(Object o)
    {
        if (ready) throw new IllegalStateException("already set");
        data = o;
        ready = true;
    }
}

It said, "if a thread reads data, there is a happens-before edge from the write to the read of ready that guarantees visibility of data".

From what I have learned so far:

  1. volatile ensures that every read/write goes to main memory instead of staying only in a cache or a register;
  2. volatile prevents reordering: that is, in setOnce(), data = o can only be scheduled after if (ready) throw... and before ready = true; this guarantees that if get() sees ready == true, data must be o.

My questions are:

  1. Is it possible that thread 1 is in setOnce(), having executed data = o; but not yet ready = true;, while at the same time thread 2 enters get(), reads ready as false, and returns null? Thread 1 then continues with ready = true. In this scenario, thread 2 didn't see the new data even though data had already been assigned its new value in thread 1.

  2. get() isn't synchronized, which means the lock cannot protect setOnce(): thread 1 can call get() and access ready and data without acquiring the lock. So threads are not guaranteed to see the latest value of data. By this I mean that a lock only guarantees visibility between synchronized blocks. Even while one thread is running the synchronized setOnce(), another thread can still enter get() and access ready and data without blocking, and may see stale values of these variables.

  3. In get(), if ready is true, must data be o? That is, is this thread guaranteed to see data? data is not volatile, and get() is not synchronized, so might this thread see an old value in the cache?

Thanks!

Burse asked 11/11/2015 at 0:00. Comments (14):
What language is this? Java? – Iorgo
Also, your 1 is mostly false. The volatile keyword has to do with memory visibility, not caches. Caches are handled by cache-coherency hardware. And that would be an obviously awful design that nobody would use -- memory is way too slow to use that way. – Iorgo
@DavidSchwartz In Java a variable can be stored in cache memory. The L1 and L2 caches are invisible to other threads; using volatile, the value is stored in main memory or the L3 cache (main memory and the L3 cache are shared between threads). More info – Graft
@VelkoGeorgiev That's totally and completely false. That's not how caches work. It's a common myth, but it's just that: a myth. The volatile keyword has nothing whatsoever to do with these caches. Access to a volatile can remain entirely in an L1 cache with no issues. (Sadly, the article that you linked to repeats the myth.) – Iorgo
@VelkoGeorgiev I made some comments on the article. It's infuriating when someone who so thoroughly misunderstands an important issue tries to teach it to other people. – Iorgo
@DavidSchwartz I disagree; check this link: StackOverflow. "Volatile variable: If two threads (suppose t1 and t2) are accessing the same object and updating a variable which is declared as volatile, then t1 and t2 can make their own local cache of the object, except for the variable which is declared as volatile." – Graft
@VelkoGeorgiev Unfortunately, that's subtly incorrect as well. I made a comment to that answer as well. This is a distressingly common misunderstanding. – Iorgo
So if that's wrong, why is it that when you make a while loop with a flag, and the flag is NOT volatile, the loop keeps running even after you update the flag from another thread? – Graft
@VelkoGeorgiev It can fail for any reason. That is, there is nothing that guarantees it will work, so it can fail for any reason at all. The most common reason it will fail on typical, modern computers is that the JVM optimizes out the CPU instructions that would fetch the variable from cache, instead keeping it in a register. That is, it fails because of JVM optimizations (which volatile disables), nothing to do with CPU caches. (This really is something that very, very few people actually understand.) – Iorgo
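To make the loop from the comment above concrete, here is a minimal sketch (the FlagDemo class and the one-second delay are illustrative assumptions, not code from this thread). Without volatile on the flag, the JIT is free to hoist the read out of the loop, so the reader may spin forever:

public class FlagDemo
{
    // Try adding 'volatile' here: without it, the reader thread below is
    // not guaranteed to ever observe the write made by the main thread.
    private static boolean running = true;

    public static void main(String[] args) throws InterruptedException
    {
        Thread reader = new Thread(() -> {
            while (running) { } // the JIT may compile this into an infinite loop
            System.out.println("reader saw running == false");
        });
        reader.start();
        Thread.sleep(1000);   // give the JIT time to compile the hot loop
        running = false;      // without volatile, the reader may never see this
    }
}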
Ok, check this quote from the book Java Concurrency in Practice (it's a photo): LINK HERE – Graft
That quote is entirely true. But it's slightly misleading, because the L1/L2/L3 caches on modern CPUs have hardware cache coherency and don't hide things from other processors, and so have nothing to do with volatile. Some CPUs do have prefetch buffers or write-posting buffers that do hide things from other processors, and volatile does have to work around those. – Iorgo
Let us continue this discussion in chat. – Graft
@VelkoGeorgiev Thanks for your answer. But could you help me with my three questions? I am really confused by these things and want to know what is right. – Burse
@DavidSchwartz Thanks. The same request for help with my three questions. – Burse

volatile ensures that every read/write goes to main memory instead of staying only in a cache or a register;

Nope. It just ensures the write is visible to other threads. On modern hardware, that doesn't require accessing main memory. (Which is a good thing; main memory is slow.)

volatile prevents reordering: that is, in setOnce(), data = o can only be scheduled after if (ready) throw... and before ready = true; this guarantees that if get() sees ready == true, data must be o.

That's correct.

Is it possible that thread 1 is in setOnce(), having executed data = o; but not yet ready = true;, while at the same time thread 2 enters get(), reads ready as false, and returns null? Thread 1 then continues with ready = true. In this scenario, thread 2 didn't see the new data even though data had already been assigned its new value in thread 1.

Yes, but if that's a problem, then you shouldn't be using code like this. Presumably, the API contract for this code is that get is guaranteed to see the result if called after setOnce returns. Obviously, you can't guarantee that get will see the result before the result has finished being produced.
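For example, the intended usage might look like the sketch below (the spin-waiting consumer is an assumption about how callers would use the class, not part of the question): a null from get() simply means "not published yet", and the caller retries.

public class FutureDemo
{
    public static void main(String[] args)
    {
        Future f = new Future(); // the Future class from the question
        new Thread(() -> f.setOnce("result")).start();

        Object value;
        while ((value = f.get()) == null)
        {
            Thread.yield(); // not ready yet; retry until the producer publishes
        }
        System.out.println(value); // prints "result"
    }
}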

get() isn't synchronized, which means the lock cannot protect setOnce(): thread 1 can call get() and access ready and data without acquiring the lock. So threads are not guaranteed to see the latest value of data. By this I mean that a lock only guarantees visibility between synchronized blocks. Even while one thread is running the synchronized setOnce(), another thread can still enter get() and access ready and data without blocking, and may see stale values of these variables.

No. And if this were true, synchronization would be almost impossible to use. For example, a common pattern is to create an object, then acquire the lock on a collection and add the object to the collection. This wouldn't work if acquiring the lock on the collection didn't guarantee that the writes involved in the creation of the object were visible.
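A minimal sketch of that pattern (the Publisher and Widget classes are hypothetical): the object's field is written with no lock held, yet a reader that later takes the collection's lock is guaranteed to see it.

import java.util.ArrayList;
import java.util.List;

public class Publisher
{
    static class Widget
    {
        int id;
        Widget(int id) { this.id = id; } // plain, unsynchronized write
    }

    private final List<Widget> widgets = new ArrayList<>();

    public void publish()
    {
        Widget w = new Widget(42); // created outside the lock
        synchronized (widgets)
        {
            widgets.add(w);        // releasing this lock...
        }
    }

    public Widget first()
    {
        synchronized (widgets)     // ...happens-before this acquisition,
        {                          // so w.id is guaranteed to be visible here
            return widgets.isEmpty() ? null : widgets.get(0);
        }
    }
}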

In get(), if ready is true, must data be o? That is, is this thread guaranteed to see data? data is not volatile, and get() is not synchronized, so might this thread see an old value in the cache?

Java's volatile is defined such that a thread that sees a write to a volatile variable is also guaranteed to see every memory write that the writing thread performed before that volatile write. This is not true in other languages (such as C or C++). This may make Java's volatile more expensive on some platforms, but fortunately not on typical ones.
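In other words (a hypothetical sketch, not code from the question): one volatile write publishes every plain write that preceded it in the writing thread.

class Snapshot
{
    private int a, b, c;             // plain, non-volatile fields
    private volatile boolean published;

    void write()
    {
        a = 1; b = 2; c = 3;         // plain writes
        published = true;            // volatile write publishes all of them
    }

    int[] read()
    {
        if (published)               // a volatile read that sees the write above...
        {
            return new int[] { a, b, c }; // ...guarantees this sees {1, 2, 3}
        }
        return null;                 // not published yet
    }
}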

Also, please don't talk about "in the cache". This has nothing to do with caches. This is a common misunderstanding. It has to do with visibility, not caching. Most caches provide full visibility into the cache (punch "MESI protocol" into your favorite search engine to learn more) and don't require anything special to ensure visibility.

Iorgo answered 11/11/2015 at 1:33. Comments (8):
First, I really appreciate you taking the time to help me with this detailed answer! – Burse
Question 1 is solved now. But for the 2nd question, I read this in the documentation: "When a thread releases an intrinsic lock, a happens-before relationship is established between that action and any subsequent acquisition of the same lock." This is what confuses me. I am thinking about your example with the collection. What happens if t1 makes some change to the collection through an unsynchronized method change() while your synchronized add() is running in t2? t1 won't block, right, since only a synchronized method does that? Will t1 see the added element, or not, or something else? – Burse
@ChristyLin That happens-before relationship means that anything the thread did before it releases the lock will be visible to any thread that later acquires that same lock. You would need all accesses to the collection to be synchronized for this to work. In Java, volatile establishes this same relationship: if a thread modifies any volatile variable, a thread that sees that modification will also see all modifications made before it. Again, this is specific to Java. – Iorgo
Now I understand the effects volatile brings. But let's say the collection class doesn't involve any volatile variable; all it has is a synchronized add() and an unsynchronized change(). Then add() is not guaranteed to be visible to change(), right? If I need these two methods to be visible to each other, I must make both of them synchronized, right? – Burse
@ChristyLin Yes, that's correct. You have to do something in each operation such that the two together establish a "before/after" relationship. – Iorgo
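A sketch of the fix agreed on above (the class is hypothetical; the method names add() and change() follow this comment thread): both methods synchronize on the same lock, so each establishes the needed happens-before edge with the other.

import java.util.ArrayList;
import java.util.List;

class SafeCollection
{
    private final List<Object> items = new ArrayList<>();

    public synchronized void add(Object o)
    {
        items.add(o);
    }

    public synchronized void change(int i, Object o) // previously unsynchronized
    {
        items.set(i, o); // now guaranteed to see elements added by add()
    }
}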
Thank you so much! Now all my confusion is gone. So happy! Have a lovely day! – Burse
@DavidSchwartz I'm trying to understand your "It has to do with visibility, not caching. Most caches provide full visibility into the cache." I posted a question here. My confusion is about the visibility of a SINGLE variable, not about reordering across multiple variables. My question is: since the cache-coherence protocol (MESI) can guarantee the visibility of a single variable, why do we need volatile to ensure visibility to other threads? – Archiepiscopal
@Archiepiscopal Because the relevant standards say so. The CPU is not required to have a cache-coherence protocol, and since the developers of your compiler know that, they are permitted to make optimizations that are broken by your assumption that the CPU has one. It's upside all around: the CPU having a cache-coherency protocol allows the compiler to make very effective optimizations, and so does the fact that you are not allowed to rely on the cache-coherency protocol (only the compiler is). – Iorgo
