Understanding Java volatile visibility
I'm reading about the Java volatile keyword and I'm confused about its 'visibility' guarantee.

A typical usage of volatile keyword is:

volatile boolean ready = false;
int value = 0;

void publisher() {
    value = 5;
    ready = true;
}

void subscriber() {
    while (!ready) {}
    System.out.println(value);
}

As explained by most tutorials, using volatile for ready makes sure that:

  • a change to ready on the publisher thread is immediately visible to other threads (the subscriber);
  • once the change to ready is visible to another thread, every variable update that precedes the write to ready (here, the write to value) is also visible to that thread;

I understand the 2nd guarantee: a volatile variable prevents memory reordering by using memory barriers, so writes before a volatile write cannot be reordered after it, and reads after a volatile read cannot be reordered before it. This is how ready prevents printing value = 0 in the demo above.

But I'm confused about the 1st guarantee, the visibility of the volatile variable itself. That sounds like a very vague definition to me.

In other words, my confusion is only about a SINGLE variable's visibility, not about reordering across multiple variables. Let's simplify the above example:

volatile boolean ready = false;

void publisher() {
    ready = true;
}

void subscriber() {
    while (!ready) {}
}

If ready is not declared volatile, is it possible that the subscriber gets stuck in the while loop forever? Why?

A few questions I want to ask:

  • What does 'immediately visible' mean? A write operation takes some time, so how long after the write can other threads see the change? Can a read in another thread that starts shortly after the write begins, but before the write finishes, see the change?
  • Visibility on modern CPUs is guaranteed by the cache coherence protocol (e.g. MESI) anyway, so why do we need volatile here?
  • Some articles say a volatile variable bypasses the CPU cache and uses main memory directly, which guarantees visibility between threads. That doesn't sound like a correct explanation to me.

   Time : ---------------------------------------------------------->

 writer : --------- | write | -----------------------
reader1 : ------------- | read | -------------------- can I see the change?
reader2 : --------------------| read | -------------- can I see the change?

Hope I explained my question clearly.

Carranza answered 18/7, 2021 at 8:42 Comment(6)
All writes to volatile variables are atomic, so I don't think reader1 in your example can do what it does. You either read the value before the write, or after the write.Simply
Thanks. It's quite reasonable that reader1 cannot see the change. But I'm still trying to figure out reader2's view. Can reader2 definitely see the change? What if the variable is not volatile, can reader2 still see what it wants? What difference does volatile actually make here?Carranza
Umm I didn't say reader1 cannot see the change. I said reader1 can't be reading after the write starts but before the write ends, because writes are atomic. Reads either occur before, or after the write. If the variable is not volatile, reader2 might not see the change.Simply
Can reader2 see the change after a sufficiently long time? Or do you mean reader2 may never see it? As I understand it, CPU cache coherence guarantees reader2 will eventually see the change, and not after a very long time. So what is the difference between using volatile or not?Carranza
One question: you said when ready's change is visible to other thread, any variable update preceding to ready (here is value's change) is also visible to other threads;... But where do you get this notion from? Never heard of it. I believe volatile and non-volatiles can be handled independently by the compiler, removing any such guarantee.Nigeria
@Nigeria docs.oracle.com/javase/specs/jls/se8/html/…. Volatile limits compiler and runtime reordering, which makes changes made before the volatile write visible along with it. More specifically, it not only guarantees visibility of the volatile change itself, but also of the side effects of the code that led up to the change. And you may want to read the JSR-133 CookbookCarranza
Visibility, for modern CPUs is guaranteed by cache coherence protocol (e.g. MESI) anyway, so what can volatile help here?

That doesn't help you. You aren't writing code for a modern CPU; you are writing code for a Java virtual machine, and that virtual machine is allowed to have a virtual CPU whose caches are not coherent.

Some articles say a volatile variable uses memory directly instead of the CPU cache, which guarantees visibility between threads. That doesn't sound like a correct explanation.

That is correct. But understand, that's with respect to the virtual machine that you are coding for. Its memory may well be implemented in your physical CPU's caches. That may allow your machine to use the caches and still have the memory visibility required by the Java specification.

Using volatile may ensure that writes go directly to the virtual machine's memory instead of the virtual machine's virtual CPU cache. The virtual machine's CPU cache does not need to provide visibility between threads because the Java specification doesn't require it to.

You cannot assume that characteristics of your particular physical hardware necessarily provide benefits that Java code can use directly. Instead, the JVM trades off those benefits to improve performance. But that means your Java code doesn't get those benefits.

Again, you are not writing code for your physical CPU; you are writing code for the virtual CPU that your JVM provides. That your CPU has coherent caches allows the JVM to do all kinds of optimizations that boost your code's performance, but the JVM is not required to pass those coherent caches through to your code, and real JVMs do not. Doing so would mean eliminating a significant number of extremely valuable optimizations.

Barrick answered 19/7, 2021 at 5:32 Comment(19)
Caches are always coherent :) And the JMM isn't expressed in terms of coherence. So I would not mention it at all if you want to leave hardware out.Inactivate
@Inactivate No. The virtual caches in virtual machines are typically not coherent because they are generally implemented using the physical CPU's registers without cross-thread awareness. Using volatile forces the virtual CPU's virtual caches to be coherent by avoiding using the physical CPU's registers to create a virtual memory cache. This is allowed because the virtual CPU does not have to provide cache coherence, and it speeds up Java code significantly.Barrick
I have never heard of such a thing. Coherence is taken care of at the hardware level. So if you write to a cache line in one cache, it will lead to an invalidation of that cache line in all other caches. This is completely independent of 'virtual caches in virtual machines'. So either we leave coherence out of the JMM discussion, or we dig into the actual implementation of cache coherence on modern processors.Inactivate
@Inactivate You've never heard of a JVM using the physical CPU's registers as a cache to avoid memory accesses?Barrick
Cache has a very specific meaning on a processor. And I have not seen the term being used for 'registers'. The JMM stays clear from that whole discussion.Inactivate
@Inactivate When you write Java code, you are writing it for a virtual machine that is implemented on a physical machine. Typical JVMs implement some of the virtual machine's caches using the physical machine's registers. The OP is getting confused because he is reading information about the virtual machine and thinking it applies to the physical machine. You seem to be doing the same thing. If you wrote software to implement a virtual CPU, you would implement the components of that virtual CPU (including the caches) using the features of the physical CPU (including the registers).Barrick
The JMM doesn't discuss caches at all. So I don't see the point of adding them to the discussion of the JMM. If you want to talk hardware, that is fine. Then registers != cache.Inactivate
I mostly agree with you, but unfortunately not everybody does. As the OP says, "Some articles say volatile variable uses memory directly instead of CPU cache, which guarantees visibility between threads." This is talking about how the JVM implements the caches of the virtual CPU that you are writing Java code for. I'm helping the OP to understand what they're talking about. It is helpful to understand what these people are saying and the distinction between the physical CPU in your computer and the virtual CPU you are targeting with your code.Barrick
I find the concept of 'virtual core' hard to swallow in terms of the JMM, since the JMM doesn't discuss hardware at all. It is a different way of looking at it and perhaps I'll add it to my toolbox. So far I never had the need for such an abstraction. Understanding the JMM was sufficient for correctness, and understanding the hardware helps me understand the performance implications; it also helps a lot with understanding abstract memory models like the JMM and the memory model of C/C++.Inactivate
@Inactivate I'm not sure why you see no value in understanding the characteristics of the virtual CPU that you are writing code for when you write Java code. For one thing, understanding that it doesn't have to provide coherent caches helps to understand why you need volatile regardless of what attributes your physical CPU has.Barrick
Also adding a component like a virtual core to the mixture and saying that is caches don't need to be coherent, helps to perpetuate the belief that cpu caches aren't coherent and that writing or reading to a volatile variable is very expensive (due to main memory communication). So that is why I'm either in favor of the abstract model or in terms of physical hardware. But not some in between model.Inactivate
It is true though. The virtual CPU's caches aren't coherent. If they were, it would not be possible to use the physical CPU's registers to cache variables during operations. Reading and writing to a volatile variable is more expensive precisely because it requires writing to the virtual CPU's main memory. These are actual facts.Barrick
Caching data in registers can easily be explained by existing JMM concepts like reordering or visibility. So currently I don't see the need to introduce a new concept like a virtual core. But it doesn't mean that the model is useless; perhaps I'll add it to my toolbox.Inactivate
Let us continue this discussion in chat.Inactivate
I wrote a blogpost about the coherence memory model and the exact requirements. It also explains why the SC implies cache-coherence. pveentjer.blogspot.com/2021/07/what-is-coherence.htmlInactivate
And it explains why coherence doesn't rule out compiler optimizations. Coherence doesn't need to respect the real-time order. So reads and writes can be skewed (e.g. placing them in registers). As long as loads/stores to a single address are not reordered.Inactivate
@Inactivate the article is gone?Kearse
I'm in the process of updating it.Inactivate
There is a key insight which I was missing. Coherence, in any form of literature you will find, is defined as a total order. Opaque provides coherence, it doesn't provide a total order. So they named it coherence, but actually it isn't.Inactivate
Relevant bits of the language spec:

volatile keyword: https://docs.oracle.com/javase/specs/jls/se16/html/jls-8.html#jls-8.3.1.4

memory model: https://docs.oracle.com/javase/specs/jls/se16/html/jls-17.html#jls-17.4

The CPU cache is not a factor here, as you correctly said.

This is more about optimizations. If ready is not volatile, the compiler is free to interpret

// this
while (!ready) {}

// as this
if (!ready) while(true) {}

That's certainly an optimization: the condition has to be evaluated fewer times, and since the value is not changed in the loop, it can be "reused". In terms of single-thread semantics the two are equivalent, but the code won't do what you wanted.

That's not to say this would always happen. Compilers are free to do that, they don't have to.
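To see the difference in practice, here is a minimal, self-contained sketch of my own (not from this answer): with ready declared volatile, the read in the loop cannot be hoisted, so the subscriber is guaranteed to exit the loop, and the happens-before edge from the volatile write to the volatile read guarantees it sees value = 5. Drop the volatile keyword and the JIT is allowed to apply exactly the rewrite described above, in which case the program may never terminate.

```java
// Sketch: with volatile, the loop is guaranteed to exit and value is visible.
public class VolatileLoop {
    static volatile boolean ready = false;
    static int value = 0;

    static int publishAndObserve() throws InterruptedException {
        final int[] observed = new int[1];
        Thread subscriber = new Thread(() -> {
            while (!ready) { }       // volatile read each iteration: cannot be hoisted
            observed[0] = value;     // happens-before guarantees this sees 5
        });
        subscriber.start();
        value = 5;                   // ordinary write, ordered before the volatile write
        ready = true;                // volatile write publishes value along with it
        subscriber.join();
        return observed[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(publishAndObserve()); // prints 5
    }
}
```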

Handcar answered 18/7, 2021 at 8:58 Comment(4)
Thanks. I forgot to mention the compiler's magic. So assuming the compiler doesn't do any optimization, can we say that ready's change will definitely be seen by the subscriber and the code will jump out of the while loop correctly?Carranza
@Carranza The meaning of "doesn't do any optimization" may be a bit complex in java. We can assume the read to memory will be in the byte code (that part can easily be checked using javac and javap...). And then you would either have to run a byte code interpreter without a JIT or the JIT issuing memory reads without reordering the code too much.Handcar
@dratenik Uh, that makes the question complex. Sorry I did not state it clearly. I mean: suppose the compiler does not do the optimization you described, and does not reorder code. What I'm trying to figure out is still about the CPU cache and memory, and what volatile can actually bring us. My biggest question is: if the compiler does not do the magic you mentioned, and CPU cache coherence can guarantee a single variable's visibility between threads, why do we need volatile for that?Carranza
@Carranza I understand the volatile keyword as a mark to the compiler "do not reorder, actually issue reads/writes". If we assume that the compiler was going to do that anyway, then yes, the keyword becomes meaningless.Handcar
If ready is not declared volatile, is it possible that the subscriber gets stuck in the while loop forever?

Yes.

Why?

Because the subscriber may not ever see the results of the publisher's write.

Because ... the JLS does not require the value of a variable to be written to memory ... except to satisfy the specified visibility constraints.

What does 'immediately visible' mean? Write operation takes some time, so after how long can other threads see volatile's change? Can a read in another thread that happens very shortly after the write starts but before the write finishes see the change?

(I think) the JMM specifies or assumes that it is physically impossible to read and write the same conceptual memory cell at the same time, so operations on a memory cell are time-ordered. 'Immediately visible' means visible at the next possible opportunity to read following the write.

Visibility, for modern CPUs is guaranteed by cache coherence protocol (e.g. MESI) anyway, so what can volatile help here?

  1. Compilers typically generate code that holds variables in registers, and only writes the values to memory when necessary. Declaring a variable as volatile means that the value must be written to memory. If you take this into consideration, you cannot rely on just the (hypothetical or actual) behavior of cache implementations to specify what volatile means.

  2. While current generation modern CPU / cache architectures behave that way, there is no guarantee that all future computers will behave that way.

Some articles say volatile variable uses memory directly instead of CPU cache, which guarantees visibility between threads.

Some people say that is incorrect ... for CPUs that implement a cache coherency protocol. However, that is beside the point because, as I described above, the current value of a variable may not have been written to the cache yet. Indeed, it may never be written to the cache.

   Time : ---------------------------------------------------------->

 writer : --------- | write | -----------------------
reader1 : ------------- | read | -------------------- can I see the change?
reader2 : --------------------| read | -------------- can I see the change?

So let's assume that your diagram shows physical time and represents threads running on different physical cores, reading and writing a cache-coherent memory cell via their respective caches.

What would happen at the physical level would depend on how the cache-coherency is implemented.

I would expect Reader 1 to see the previous state of the cell (if it was available from its cache) or the new state if it wasn't. Reader 2 would see the new state. But it also depends on how long it takes for the writer thread's cache invalidation to propagate to the others' caches. And all sorts of other stuff that is hard to explain.

In short, we don't really know what would happen at the physical level.

But on the other hand, the writer and readers in the above picture can't actually observe the physical time like that anyway. And neither can the programmer.

What the program / programmer sees is that the reads and writes DO NOT OVERLAP. When the necessary happens-before relations are present, there will be guarantees about the visibility of memory writes by one thread to subsequent1 reads by another. This applies to volatile variables, and to various other things.

How that guarantee is implemented is not your problem. And it really doesn't help if you do understand what is going on at the hardware level, because you don't actually know what code the JIT compiler is going to emit (today!) anyway.


1 - That is, subsequent according to the synchronization order ... which you could view as a logical time. The JLS Memory model doesn't actually talk about time at all.
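As a side note to the happens-before relations mentioned above: volatile is only one source of them. Here is a small sketch of my own (not part of this answer) using the edges created by Thread.start() and Thread.join(), which make even a non-volatile write visible:

```java
// Sketch: start()/join() create happens-before edges, so no volatile is needed here.
public class JoinVisibility {
    static int data = 0; // deliberately NOT volatile

    static int writeInChildAndRead() throws InterruptedException {
        Thread t = new Thread(() -> data = 42); // write in the child thread
        t.start(); // start() happens-before every action of t
        t.join();  // every action of t happens-before join() returning
        return data; // guaranteed to read 42
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(writeInChildAndRead()); // prints 42
    }
}
```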

Asexual answered 19/7, 2021 at 6:51 Comment(4)
A volatile read is guaranteed to see the most recent write before it in the memory order, but isn't guaranteed to see the most recent write before it in real time. SC (and coherence, since coherence is a suborder of SC) does not provide real-time guarantees. For more information see: pveentjer.blogspot.com/2021/07/what-is-coherence.htmlInactivate
"A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.". That is a guarantee.Asexual
That is correct. But the tricky part is that happens-before is not based on real time. Please check the link I posted for some insights. A store at wallclock time X doesn't need to be seen by a different thread at wallclock time X+1. That is perfectly fine under sequential consistency. Operations can be skewed.Inactivate
Here is another link to confirm what I said: jepsen.io/consistency/models/sequential. "When you need real-time constraints (e.g. you want to tell some other process about an event via a side channel, and have that process observe that event), try linearizability."Inactivate
Answers to your 3 questions:

  1. A volatile write doesn't need to be 'immediately' visible to a volatile read. A correctly synchronized Java program will behave as if it were sequentially consistent, and for sequential consistency the real-time order of loads/stores isn't relevant. So reads and writes can be skewed as long as the program order isn't violated (or as long as nobody can observe the violation). Linearizability = sequential consistency + respecting the real-time order. For more info see this answer.

  2. I still need to dig into the exact meaning of 'visible', but AFAIK it is mostly a compiler-level concern, because the hardware will not buffer loads/stores indefinitely.

  3. You are completely right about the articles being wrong. A lot of nonsense is written, and 'flushing volatile writes to main memory instead of using the cache' is the most common misunderstanding I see. I think 50% of all my SO comments are about informing people that caches are always coherent. A great book on the topic is 'A Primer on Memory Consistency and Cache Coherence, 2e', which is available for free.

The informal semantics of the Java Memory model contains 3 parts:

  • atomicity
  • visibility
  • ordering

Atomicity is about making sure that a read/write/read-modify-write (rmw) happens atomically in the global memory order, so nobody can observe some in-between state. This covers access atomicity (torn reads/writes, word tearing, proper alignment) as well as operational atomicity (rmw operations).

IMHO it should also deal with store atomicity: making sure that there is a point in time where the store becomes visible to all cores. On the X86, for example, due to store buffering a store can become visible to the issuing core earlier than to other cores, which is a violation of store atomicity. But I haven't seen it mentioned in the JMM.

Visibility: this deals mostly with preventing compiler optimizations, since the hardware will not delay loads or buffer stores indefinitely. Some literature also throws the ordering of surrounding loads/stores under visibility, but I don't believe that is correct.

Ordering: this is the bread and butter of memory models. It makes sure that loads/stores issued by a single processor don't get reordered. In the first example you can see the need for such behavior. This is the realm of compiler barriers and CPU memory barriers.

For more info see: https://download.oracle.com/otndocs/jcp/memory_model-1.0-pfd-spec-oth-JSpec/
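To illustrate the atomicity part above with a concrete case: even a volatile counter++ is not an atomic rmw (it is a separate read and write, so concurrent increments can be lost), which is what java.util.concurrent.atomic addresses. A sketch of my own, not from this answer:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: incrementAndGet() is an atomic read-modify-write, so no updates are lost.
public class RmwDemo {
    static int countWithTwoThreads(int perThread) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger();
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                counter.incrementAndGet(); // atomic rmw; a volatile int++ could lose updates
            }
        };
        Thread a = new Thread(task), b = new Thread(task);
        a.start(); b.start();
        a.join(); b.join();
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countWithTwoThreads(100_000)); // prints 200000
    }
}
```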

Inactivate answered 19/7, 2021 at 5:14 Comment(0)
I'll just touch on this part:

change to ready on publisher thread is immediately visible to other threads

That is not correct, and those articles are wrong. The documentation makes a very clear statement here:

A write to a volatile field happens-before every subsequent read of that field.

The complicated part here is subsequent. In plain English this means that when someone sees ready as true, it will also see value as 5. But this requires you to actually observe ready being true, and a read may well still observe the old value instead. So this is not "immediate".

What people confuse this with is the fact that volatile offers sequential consistency, which means that if someone has observed ready == true, then everyone else will too from that point on (unlike release/acquire, for example).
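For reference, the release/acquire mode mentioned here is directly expressible in Java 9+ via VarHandle. A sketch of my own (not from this answer): for the original two-thread publish pattern, a release store paired with an acquire load is already sufficient; the difference to volatile's sequential consistency only shows up in more complex multi-variable scenarios.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Sketch: publishing via a release store and an acquire load instead of volatile.
public class ReleaseAcquireDemo {
    static boolean ready;  // NOT volatile; accessed through READY below
    static int value;
    static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findStaticVarHandle(ReleaseAcquireDemo.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static int publishAndObserve() throws InterruptedException {
        final int[] observed = new int[1];
        Thread subscriber = new Thread(() -> {
            while (!(boolean) READY.getAcquire()) { } // acquire load
            observed[0] = value;                      // sees 5: release/acquire suffices here
        });
        subscriber.start();
        value = 5;
        READY.setRelease(true); // release store: publishes value; weaker than volatile (SC)
        subscriber.join();
        return observed[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(publishAndObserve()); // prints 5
    }
}
```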

Kearse answered 23/9, 2021 at 1:15 Comment(3)
Best to see subsequent as the most recent read of 'ready' after the write of 'ready' in the happens-before order.Inactivate
And I don't understand your reference to release consistency. The above example would work perfectly fine if ready made use of a release store and an acquire load; it would not work any differently compared to SC.Inactivate
@Inactivate "Best to see subsequent as the most recent read of 'ready' after the write of 'ready' in the happens-before order" - lovely wording. What I really meant there if there are other threads that might use ready, but it this example, sure - I agree.Kearse

© 2022 - 2024 — McMap. All rights reserved.