Does Java allow a volatile read to be optimized away if the value isn't needed, also removing the happens-before synchronization?

The following code sample shows a common way to demonstrate concurrency issues caused by a missing happens-before relationship.

private static /*volatile*/ boolean running = true;
    
public static void main(String[] args) throws InterruptedException {
    new Thread() {
        @Override
        public void run() {
            while (running) {
                // Do nothing
            }
        }
    }.start();
    Thread.sleep(1000);
    running = false;
}

If running is volatile, the program is guaranteed to terminate after approximately one second. However, if running isn't volatile, the program isn't guaranteed to terminate at all (since there is no happens-before relationship or guarantee for visibility of changes to the variable running in this case) and that's exactly what happens in my tests.

According to JLS 17.4.5 one can also enforce a happens-before relationship by writing to and reading another volatile variable running2, as shown in the following code sample.

private static boolean running = true;
private static volatile boolean running2 = true;
    
public static void main(String[] args) throws InterruptedException {
    new Thread() {
        @Override
        public void run() {
            while (running2 || running) {
                // Do nothing
            }
        }
    }.start();
    Thread.sleep(1000);
    running = false;
    running2 = false;
}

The volatile variable running2 is read in each loop iteration and when it is read as false after approximately one second, it is also guaranteed that the variable running is read as false subsequently, due to the happens-before relationship. Thus the program is guaranteed to terminate after approximately one second and that's exactly what happens in my tests.
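Spelled out, the happens-before chain I'm relying on is (hb = happens-before; the derivation follows JLS 17.4.5, the step numbering is mine):

    1. running = false   hb   running2 = false             (program order in the main thread)
    2. running2 = false  hb   read of running2 == false    (volatile write synchronizes-with the subsequent read)
    3. read of running2 == false   hb   read of running    (program order in the second thread)
    4. therefore running = false   hb   read of running    (transitivity), so that read must return false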

However, when I put the read of the variable running2 into an empty if statement inside the while loop, as shown in the following code sample, the program doesn't terminate in my tests.

private static boolean running = true;
private static volatile boolean running2 = true;
    
public static void main(String[] args) throws InterruptedException {
    new Thread() {
        @Override
        public void run() {
            while (running) {
                if (running2) {
                    // Do nothing
                }
            }
        }
    }.start();
    Thread.sleep(1000);
    running = false;
    running2 = false;
}

The idea here is that a volatile read of running2 acts like a compiler memory barrier: the compiler has to emit code that re-reads non-volatile variables, because the read of running2 might have synchronized-with a release operation in another thread. That would guarantee visibility of new values in non-volatile variables like running.

But my JVM seems not to be doing that. Is this a compiler or JVM bug, or does the JLS allow such optimizations where a volatile read is removed when the value isn't needed? (It's only controlling an empty if body, so the program behaviour doesn't depend on the value being read, only on creating a happens-before relationship.)

I thought the JLS applies to the source code, and since running2 is volatile, the effect of reading the variable shouldn't be removable by an optimization. Is there something in the specification that actually allows such optimizations?
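For illustration, the behaviour I observe would be consistent with the JIT effectively compiling the second thread's loop like this (a hand-written sketch of one possible transformation, not actual compiler output):

    // Sketch only: the value of the volatile read is never used, so the read is
    // dropped, and nothing is left that forces `running` to be re-read in the loop.
    @Override
    public void run() {
        if (running) {          // `running` is read once, before entering the loop
            while (true) {
                // spin forever; neither running nor running2 is read again
            }
        }
    }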

Janik asked 22/7, 2022 at 12:30 Comment(17)
Did you try reversing the order of running2 and running in your loop condition? Also, can you explain why 17.4.5 says that auxiliary variables, running in this case, are also updated when any volatile variable is read? I think a synchronized block would cause that.Forbidding
@Forbidding reversing the order of the variables entirely misses the purpose of the question, but the answer is that no happens-before relationship occurs because the volatile field isn't read. And yes, reading any volatile field ensures a happens-before relationship for all changes that happened before due to the example in the linked paragraph: docs.oracle.com/javase/specs/jls/se8/html/…Janik
I think you're drawing the wrong conclusion. If you have while( running2 || running ) then "running" never gets read because "||" is a short circuit operator. Therefore it never gets optimized out. You set running2 to false, and running2 is volatile so it has to be updated (happens-before) then running gets read for the first time and the loop ends.Forbidding
@Forbidding sorry I linked the wrong example, alternatively see this https://mcmap.net/q/336369/-happens-before-relationships-with-volatile-fields-and-synchronized-blocks-in-java-and-their-impact-on-non-volatile-variables and many other answers on Stack Overflow.Janik
@Forbidding "I think you're drawing the wrong conclussion. ..." No, what you are saying is exactly what I want to show with this example, plus the fact that running is read correctly as false (due to the happens-before relationship) as soon as it's actually read.Janik
The answer you link there is directly contradicted in the JLS you link. "Thread 1 writes to s happens before Thread 1 writes to b (program order rule)" The JLS says that if nothing is affected it is ok to re-order that write.Forbidding
@Janik what platform are you on? I can't reproduce the third case on macOS x86/Java 17; moreover, running2 = false is not required.Limit
@Forbidding "To show what you want to show, ..." But that's not what I want to show. What you are saying is correct. "Then when you set running to false, it will not stop the loop. ..." Yes, but for two reasons actually: 1) because running2 is still true and 2) because the thread isn't guaranteed to see running as falseJanik
@AndreyB.Panfilov I'm using the Eclipse JDT core compiler with compiler compliance level 1.8 and execution environment JavaSE-1.8 (jdk-17.0.1.12-hotspot) on Windows 10 64 bit to test this. The write running2 = false; is required afaik to ensure a happens-before relationship.Janik
So you're hoping that reading a volatile will act as a compile-time memory barrier (like GNU C asm("" ::: "memory")) and force the JVM to make asm that re-reads the non-volatile as well. (Because that happens after a load that might have synced-with.) Yeah, this sounds reasonable, if the compiler can't optimize away the volatile read even though its value isn't needed (only controlling an empty if). So basically the question is whether the JLS allows removing such loads, removing the only thing that could be syncing with another thread.Broadsword
@PeterCordes yes exactly, and that's what I would expect to happen according to the JLS.Janik
@Janik just checked, the same code compiled by OpenJDK and Eclipse JDT behaves differently, OpenJDK behaviour meets your expectations.Limit
@Janik The write running2 = false; is required afaik to ensure a happens-before relationship - where is the local value of running visible to main thread stored after exiting #main method (i.e. after main thread exited)? we have written it, we can't lose it.Limit
In your example with the if(running2) {}, you claim Thread doesn't finish execution because the if statement is optimized out. Therefore there is never a volatile read, so the while loop continues execution. You're claiming this is a bug because the volatile read should act like a memory barrier and both running and running2 should be updated. I don't think this is a bug because I don't think running needs to be updated when there is an update to running2. It would help your question if you can show that with the JLS. Also, the if example exits fine for me, openjdk version "11.0.15"Forbidding
This answer claims that JCIP says that all the changes should be visible. It would be good if there was a reference to the JLS though.Forbidding
@PeterCordes the compiler is allowed to remove a volatile load/store (or synchronized block) if the sole purpose is to act like a memory fence. For examples see github.com/openjdk/jcstress/blob/master/jcstress-samples/src/…. Therefore I'm pretty sure that in the last example, the empty if (running2) {} could be optimized out.Takashi
@pveentjer: Interesting, that sounds like an answer, especially if you know where to look in the JLS to justify that. (Hopefully it's not like user19607013's answer suggests, and based on allowing an infinite delay in visibility of a volatile write.)Broadsword

... does the JLS allow it to remove a volatile read when the value isn't needed? (It's only controlling an empty if body, so the program behaviour doesn't depend on the value read, only on creating a happens-before.)

According to 17.4. Memory Model of the JLS:

The memory model describes possible behaviors of a program. An implementation is free to produce any code it likes, as long as all resulting executions of a program produce a result that can be predicted by the memory model.

So the JLS permits literally anything at runtime, as long as the result of the execution is "legal".

By "result of the execution" the JLS means all the external actions that the program performs: operations on files and network sockets, various system calls (e.g. reading the current time), etc.
I believe that 17.4.9. Observable Behavior and Nonterminating Executions of the JLS is about that (or something like that).

In your example the only external action is sleeping for 1s, so your program can be "optimized" to:

    public static void main(String[] args) throws InterruptedException {
        Thread.sleep(1000);
    }

If the other answer is correct in that an infinite loop is also a legal execution, then, I guess, your program can be "optimized" to:

    public static void main(String[] args){
        while(true);
    }

One more time: the runtime is allowed to do anything as long as it performs the same external actions as one of the legal executions allowed by the JLS.

To clarify things further, let's work out the legal executions for this example.

General Algorithm

The general algorithm is described in 17.4. Memory Model of the JLS.

The actions of each thread in isolation must behave as governed by the semantics of that thread, with the exception that the values seen by each read are determined by the memory model.

So we assume that actions in each thread are executed sequentially, one-by-one.
The only difference from a single-threaded program is that for a variable accessed from multiple threads a read might return an "unexpected" value.

The rule to get all possible values for a read is this:

Informally, a read r is allowed to see the result of a write w if there is no happens-before ordering to prevent that read.

In other words, a read of some variable returns:

  • either the last write to that variable in happens-before order
  • or any write to that variable that is not related to the read by happens-before

Note that the algorithm doesn't allow anything to be "optimized out".

Legal Executions For The Example

Now let's apply the algorithm to our example to find legal executions.
(Note: for simplicity we'll omit cases like unexpected Error and termination of the program by the OS)

The main thread has no reads of shared variables, so it behaves just like a single-threaded program.
Its actions are:

new Thread(){...}.start();
Thread.sleep(1000);
running = false;
running2 = false;

The second thread is a loop with 2 reads.
So we get a sequence of actions:

read(running == true)
read(running2 == ?)
read(running == true)
read(running2 == ?)
...
read(running == false)

The sequence ends as soon as a read of running returns false.

What values can the reads return according to the JLS?
Let's first note that running2 is volatile, which means that reads and writes to it happen in a global order (called the synchronization order) and are visible to all threads in that order.
So:

  • before the write running2 = false becomes visible to the second thread:
    • running2 == true
      This is the initial write (the only visible write).

    • running == true or running == false
      For every read of running:

      • the initial write (running = true) happens-before the read
      • the write running = false is not related by happens-before with the read

      So each read running == ? can return any of the two writes randomly.

      Main Thread              Thread2                 
      
      [...]                    [...]                   
        ↓ (happens-before)       ↓ (happens-before)    
      running = false;         running2 == true;       
        ↓ (happens-before)       ↓ (happens-before)    
      running2 = false;        running == true | false 
      
  • after the write running2 = false becomes visible to the second thread:
    • running2 == false
      This is the latest visible write.
    • running == false
      Because running2 == false
      => running2 = false happens-before running2 == false
      => transitively running = false happens-before running == false
      Main Thread              Thread2              
      
      [...]                                         
        ↓ (happens-before)                          
      running = false;                              
        ↓ (happens-before)     [...]                
      running2 = false;          ↓ (happens-before) 
        └--------------------> running2 == false;   
            (happens-before)     ↓ (happens-before) 
                               running == false;    
      
      As a result, the second thread finishes when the write running2 = false becomes visible to it.

To sum up, all legal executions of the second thread:

  1. can start with this sequence:
read(running == true)
read(running2 == true)
[... repeat the fragment above ...]
  2. end with:
  • either:
    ...
    read(running2 == false)
    read(running == false)
    
    This is the case when the thread sees running2 = false and then is guaranteed to see running = false.
  • or with:
    ...
    read(running == false)
    
    This is the case when the thread doesn't see running2 = false, but does see running = false.
  • or never ends.
    This is the case when the thread sees neither running2 = false nor running = false.

If you can "optimize out" a volatile read and the result of the execution will be the same as the results of some legal executions described above, then this optimization is legal.


Regarding the AdvancedJMM_15_VolatilesAreNotFences test mentioned in the comments.

It doesn't seem to me that this test demonstrates that the compiler is allowed to remove a volatile load/store if the value isn't used.

IMO it shows that volatile is weaker than UNSAFE.storeFence() + UNSAFE.loadFence().
Basically it's a demonstration of the Roach Motel optimization:
  • a write can be reordered to before a release
  • a read can be reordered to after an acquire
  • reads and writes can be moved into an acquire+release block, but not out of it
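As a rough illustration (my own sketch, not code from the jcstress samples; the field names are made up), these are the movements that roach-motel reordering permits around a volatile access:

    class RoachMotelSketch {
        int a, b;            // plain fields
        volatile int v;      // volatile field

        void writer() {
            a = 1;           // may NOT be delayed to after the volatile write (release)
            v = 1;           // volatile write acts as a release
            b = 1;           // MAY be moved up, to before the volatile write
        }

        void reader() {
            int r0 = a;      // MAY be moved down, to after the volatile read
            int r1 = v;      // volatile read acts as an acquire
            int r2 = b;      // may NOT be moved up, to before the volatile read
        }
    }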

AdvancedJMM_14_SynchronizedAreNotFences is different because it uses synchronized (new Object()) {} — there are no shared variables and no happens-before relations.


P.S. @pveentjer mentioned this in the comments:

The non-normative section of the JVM does talk about visibility; so a change should become visible to other threads at some point.

Does anyone have a link and a quote to support that?
I cannot find it anywhere, but, as noted by Peter Cordes, it would be really useful to know that Java (or even only some JVMs) doesn't allow an infinite delay in visibility of a volatile write.

Dynamics answered 28/7, 2022 at 15:22 Comment(9)
+1 for the good explanation. However, I'm still not fully convinced about the actual point in both answers, namely that the third bullet point "or it never ends" is a valid execution (or at least intended to be a valid execution). The reason for my doubts is that this would essentially mean (provided that my understanding of the answer is correct) that it's impossible in Java to reliably send a signal from one thread to another thread (in reasonable real time) in any case. Or does this only happen in the cases in my question because there is no external (observable) action that this would violate?Janik
For example would the program be guaranteed to terminate, if I print debug messages at the end of main and inside the loop to get external (observable) actions? If the latter question can be answered with a "yes", then I think I can accept the answer.Janik
@Janik Regarding "'it never ends' is a valid execution". Personally I looked in both The Java Language Specification and The Java Virtual Machine Specification, and I cannot find there anything that limits the time that a volatile write propagates to another thread. As a result, at least for now, I have to assume the worst: that it can take forever and 'never ends' is a valid execution. It would be great if someone found a proof of the contrary.Dynamics
@Janik Regarding "it's impossible in Java to reliably send a signal from one thread to another thread (in reasonable real time)". If a volatile write can take forever to propagate to another thread, than your assumption is true. But to be fair, I suspect that nowadays a program on any programming language can hang forever: for example OS might prioritize other threads for any amount of time.Dynamics
@Janik In the end, I think that for any program in any language we don't have guarantees (and often specifications) for most levels of execution (the JVM, the OS, hardware, etc.), and we just assume that all of that is implemented in some sane way and expect from all of that some sane behavior.Dynamics
@Janik "would the program be guaranteed to terminate, if I print debug messages at the end of main and inside the loop to get external (observable) actions?" No. If the volatile write takes forever to propagate, then Thread2 is still not guaranteed to leave the loop, but it will print debug message on every iteration. And every valid Java implementation will have to print the debug messages as well.Dynamics
The JLS talks about 'immediately' visible. docs.oracle.com/javase/specs/jls/se8/html/…. "and each individual action is atomic and is immediately visible to every thread". Immediately visible makes no sense with SC. "visible as soon as possible" would be better IMHO.Takashi
@Takashi thank you for providing the quote. IMHO "immediately visible" there means that in a SC execution a read always sees the latest write. "Latest" here means the total order of actions in SC, the order is synthetic and doesn't even have to reflect the real times when the actions execute. So I don't think the quote means that it cannot take an infinite amount of time for a volatile write to propagate to another thread.Dynamics
@Dynamics I'm not sure what it means actually :) I have explained in various comments on this page that a read needs to see the most recent write before it in the happens-before order (or a write it is in data-race with). But the happens-before order doesn't need to reflect the happened before order (so the real-time order). So 'immediately' I find a very confusing addition. And for people that do not know the meaning of SC, it can even be misleading since it seems to imply that the JMM is preserving the real time order (which it isn't).Takashi

Is this a JVM bug, or does the JLS allow it to remove a volatile read when the value isn't needed?

It's neither.

This execution is valid according to the JLS.

The second thread must finish shortly after it reads running2 == true.
But the JLS provides no guarantees about the time it takes for a write in one thread to become visible in another thread.
As a result, your program execution is valid, because it corresponds to the case when the write running2 = false takes a very long time to propagate to another thread.

By the way, on my Java version (OpenJDK 64-Bit Server VM (build 17.0.3+7-suse-1.4-x8664, mixed mode)) the program finishes in about 1 second.
This is also a valid execution: it corresponds to the case when the write running2 = false propagates to the second thread more quickly.

P.S. You mentioned "memory barrier".
For a hardware memory barrier there typically exists some maximum time after which a write is guaranteed to have propagated to other threads.
But the JLS doesn't operate in terms of memory barriers, doesn't have to use them, and actually guarantees only this:

An implementation is free to produce any code it likes, as long as all resulting executions of a program produce a result that can be predicted by the memory model.

P.P.S. If you want to see the actual assembly code that the JVM produces for your program, you can use -XX:+PrintAssembly.
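For example, something like the following invocation (my own sketch; it assumes the hsdis disassembler library is installed for your JVM, and YourMainClass is a placeholder for the actual class name):

    java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly YourMainClass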

Subotica answered 23/7, 2022 at 12:39 Comment(18)
Delayed visibility due to sequential consistency not needing to preserve the real-time order of requests is one reason why the loop could keep running. But the removal of a volatile read/write (or synchronized block) that only serves as a fence is another cause. For a concrete example see github.com/openjdk/jcstress/blob/master/jcstress-samples/src/… I believe the if (running2) {} could be optimized out for that reason.Takashi
running2 is the volatile variable. (Unfortunately the question picked names that don't help you remember which is which.) Anyway, volatile writes should always be visible to volatile reads promptly, shouldn't they? C++ for example just says "finite time" and "reasonable amount of time" as recommendations; it seems to me disappointing if Java didn't at least strongly recommend that infinite delays shouldn't be possible.Broadsword
@PeterCordes The non-normative section of the JVM does talk about visibility; so a change should become visible to other threads at some point. But the JMM itself makes no guarantee how much skewing is allowed (at least I don't recall seeing it).Takashi
@pveentjer: Ok, C++ is the same, that "finite time" is a non-normative note. (this answer quotes both notes). But is that the only thing that allows optimizing away a volatile load here? An implementation choice in how to interpret the standard to sometimes allow infinite delays?Broadsword
"But the JLS provides no guarantees about the time it takes for a write in one thread to become visible in another thread." Where do you get this from? running2 is volatile, and any subsequent read should see the updated value. From 17.4.4 "A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread"Forbidding
Subsequent in the happens-before order; not in the real-time order. A read will see the most recent write before it in the happens-before order, or a write it is in data race with. That defines happens-before consistent.Takashi
@Forbidding a good example that violates the real-time order is the use of store buffers in CPUs. Stores are placed in the store buffer, to prevent stalling the CPU, and the store will not be immediately visible. First, the store needs to retire (so isn't speculative anymore) and then it can take some time for the store to leave the store buffer and to be committed to the coherent cache (e.g the store needs to do an RFO and wait for the acknowledgment of invalidation of the corresponding cache line from all other CPUs). Only when it is committed, it is visible to other CPUs.Takashi
@Forbidding so imagine that a thread would only execute a volatile store and do nothing else, then it is obvious that this store isn't visible when it is executed, not visible when it retires, not visible when it enters the linefillbuffer and the request for ownership (RFO) is sent, but only visible at the moment it is committed to the cache. So if there would be another thread (different CPU) that executes only a volatile read and this volatile read is performed before the RFO is processed, it is allowed to return the value before the store has executed even though the store has executed/retired.Takashi
@Forbidding But unless you have the ability to actually measure the timing of every state of an instruction on every CPU, you will not be able to prove the realtime order was violated. If the real-time order needs to be preserved, you need linearizability. Linearizability = sequential consistency + (pretending) to respect the real time order. With linearizability you get the guarantee that the operation will take effect between invocation start and completion (its linearization point); which clearly isn't the case with a store buffer. jepsen.io/consistency/models/linearizableTakashi
@Forbidding Another cause of loads/stores being skewed is the out-of-order processing abilities of a modern processor, which can lead to loads and stores moving around with respect to the 'program order'. E.g. a load could be moved forward (executed earlier) and therefore see an old value instead of a newer value compared to if it would have been executed in order.Takashi
@Forbidding this is a very good read: happening-before != happens-before. preshing.com/20130702/the-happens-before-relationTakashi
@Takashi I am asking about the claim, "running2 = false takes a very long time to propagate to another thread." Which is a bit ambiguous, but it is describing the application not ending within seconds after running2 is set to false. The effects you are writing about are going to be on the order of clock cycles, not seconds? Maybe I am mistaken.Forbidding
According to the specification it is allowed to skew for a very long period since there are no real time guarantees given. But from an implementation point of view I would already be surprised if it takes more than a few hundred nanoseconds. The store buffer has a limited capacity and the CPU tries to drain the store buffer as fast as possible because otherwise, the CPU would need to stall (store buffer will fill up and no store instructions can be scheduled).Takashi
@Forbidding keep in mind that the compiler can change the code so that a load or store never becomes visible. So the skew would be indefinite.Takashi
@pveentjer I think that is explicitly what volatile is meant to protect against. The next read needs to see the changed version. There are some practical concerns, but reading/writing a volatile field is synchronized.Forbidding
@Forbidding if nothing is done with the value, then there are no recency guarantees. That is the reason why using volatiles to simulate fences doesn't work. github.com/openjdk/jcstress/blob/master/jcstress-samples/src/…Takashi
I agree with that completely, I take that to mean running2 will be updated, but running might not be updated.Forbidding
@Forbidding My understanding is that also running2 doesn't need to be updated since nobody is there to observe the read value. So since the value isn't used, nobody is there to prove that the value wasn't read.Takashi

Both other answers say the main reason for the behavior in the question is the fact that the JMM doesn't guarantee that a volatile write becomes visible to other threads within any finite time.

Recently I stumbled upon some additional information related to this topic, and I would like to share it here because it might be useful to people interested in it.
The information is from the "The Java Memory Model" article written by the authors of the JMM:

                    Initially, v is volatile and v = false                 
Thread 1                             | Thread 2                            
-------------------------------------+-------------------------------------
while (!v);                          | v = true;                           
System.out.println("Thread 1 done"); | System.out.println("Thread 2 done");

If we observe print message by thread 2, Thread 1 must see write to v, print its message and terminate. But program can also be observed to hang and not print any messages.

Figure 9: Fairness Example

...

... in Figure 9, if we observe the print message from Thread 2, and no threads other than Threads 1 and 2 are running, then Thread 1 must see the write to v, print its message and terminate. This prevents the compiler from hoisting the volatile read of v out of the loop in Thread 1.
The fact that Thread 1 must terminate if the print by Thread 2 is observed follows from the rules on observable actions. If the print by Thread 2 is in a set of observable actions O, then the write to v and all reads of v that see the value 0 must also be in O. Additionally, the program cannot perform an unbounded amount of additional actions that are not in O. Therefore, the only observable behavior of this program in which the program hangs (runs forever without performing additional external actions) is one in which it performs no observable external actions other than hanging. This includes the print action.

As you can see, in some cases (such as the example in Figure 9) volatile writes must become visible to other threads because of the rules in 17.4.9 Observable Behavior and Nonterminating Executions.
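For reference, here is a minimal runnable Java rendering of the Figure 9 example (my own translation of the paper's pseudocode; the class name is made up):

    public class Figure9 {
        static volatile boolean v = false;

        public static void main(String[] args) {
            Thread thread1 = new Thread(() -> {
                while (!v) {
                    // spin until the volatile write by thread 2 becomes visible
                }
                System.out.println("Thread 1 done");
            });
            Thread thread2 = new Thread(() -> {
                v = true;
                System.out.println("Thread 2 done");
            });
            thread1.start();
            thread2.start();
        }
    }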

Weasel answered 6/7, 2023 at 13:49 Comment(0)
