Could the JIT collapse two volatile reads as one in certain expressions?

Suppose we have a volatile int a. One thread does

while (true) {
    a = 1;
    a = 0;
}

and another thread does

while (true) {
    System.out.println(a+a);
}

Now, would it be illegal for a JIT compiler to emit assembly corresponding to 2*a instead of a+a?

On one hand the very purpose of a volatile read is that it should always be fresh from memory.

On the other hand, there's no synchronization point between the two reads, so I can't see that it would be illegal to treat a+a atomically, in which case I don't see how an optimization such as 2*a would break the spec.

References to JLS would be appreciated.

Augusto answered 19/12, 2014 at 13:15 Comment(1)
I asked Jeremy Manson to take a look at this question and he confirms that collapsing the two reads into one is a legal optimization.Augusto

Short answer:

Yes, this optimization is allowed. Collapsing two sequential read operations produces the observable behavior of the sequence being atomic; it does not appear as a reordering of operations. Any sequence of actions performed on a single thread of execution can be executed as an atomic unit. In general, it is difficult to ensure that a sequence of operations executes atomically, and it rarely results in a performance gain, because most execution environments introduce overhead to execute items atomically.

In the example given by the original question, the sequence of operations in question is the following:

read(a)
read(a)

Performing these operations atomically guarantees that the value read on the first line is equal to the value read on the second line. Furthermore, it means the value read on the second line is the value contained in a at the time the first read was executed (and vice versa, because, executed atomically, both read operations occur at the same time according to the observable execution state of the program). The optimization in question, reusing the value of the first read for the second read, is equivalent to the compiler and/or JIT executing the sequence atomically, and is thus valid.
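To make the transformation concrete, here is a sketch in Java of the two forms being compared (the class and method names are purely illustrative, not anything the JIT itself produces):

```java
// Illustrative sketch of the transformation discussed above.
// `original` performs two volatile reads; `collapsed` performs one
// read and reuses the value, which is the optimization in question.
class VolatileSum {
    static volatile int a = 21;

    static int original() {
        return a + a; // two volatile reads of `a`
    }

    static int collapsed() {
        int t = a;    // single volatile read
        return t + t; // equivalent to 2 * t
    }
}
```

Single-threaded, the two methods are trivially equivalent; the whole question is whether the JIT may also treat them as equivalent under concurrent writes, and the argument above says yes.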


Original longer answer:

The Java Memory Model describes operations using a happens-before partial ordering. In order to express the restriction that the first read r1 and second read r2 of a cannot be collapsed, you need to show that some operation is semantically required to appear between them.

The operations on the thread performing r1 and r2 are the following:

--> r(a) --> r(a) --> add -->

To express the requirement that something (say y) lie between r1 and r2, you need to require that r1 happens-before y and y happens-before r2. As it happens, there is no rule where a read operation appears on the left side of a happens-before relationship. The closest you could get is saying y happens-before r2, but the partial order would allow y to also occur before r1, thus collapsing the read operations.

If no scenario exists which requires an operation to fall between r1 and r2, then you can declare that no operation ever appears between r1 and r2 and not violate the required semantics of the language. Using a single read operation would be equivalent to this claim.

Edit: My answer is getting voted down, so I'm going to go into additional detail.

Here are some related questions:

  • Is the Java compiler or JVM required to collapse these read operations?

    No. The expressions a and a used in the add expression are not constant expressions, so there is no requirement that they be collapsed.

  • Does the JVM collapse these read operations?

    To this, I'm not sure of the answer. By compiling a program and using javap -c, it's easy to see that the Java compiler does not collapse these read operations. Unfortunately, it's not as easy to prove that the JVM does not collapse the operations (or, even tougher, that the processor itself does not).

  • Should the JVM collapse these read operations?

    Probably not. Each optimization takes time to execute, so there is a balance between the time it takes to analyze the code and the benefit you expect to gain. Some optimizations, such as array bounds check elimination or null reference checking, have proven to have extensive benefits for real-world applications. The only case where this particular optimization could improve performance is when two identical read operations appear sequentially.

    Furthermore, as shown by the response to this answer along with the other answers, this particular change would result in an unexpected behavior change for certain applications which users may not desire.
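As a footnote to the first bullet: for true constant expressions (JLS §15.28), folding does happen, at the javac level rather than in the JIT. A minimal sketch (the class name is made up):

```java
// Constant expressions are folded by javac at compile time; a volatile
// read is not a constant expression, so `a + a` gets no such treatment.
class ConstantFolding {
    static final int A = 21;      // compile-time constant (JLS 15.28)
    static final int SUM = A + A; // javac folds this to the literal 42
}
```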

Edit 2: Regarding Rafael's claim that two read operations of the same volatile cannot be reordered. This statement is designed to highlight the fact that caching the read operation of a in the following sequence could produce an incorrect result:

a1 = read(a)
b1 = read(b)
a2 = read(a)
result = op(a1, b1, a2)

Suppose initially a and b have their default value 0. Then you execute just the first read(a).

Now suppose another thread executes the following sequence:

a = 1
b = 1

Finally, suppose the first thread executes the line read(b). If you were to cache the originally read value of a, you would end up with the following call:

op(0, 1, 0)

This is not correct. Since the updated value of a was stored before the write to b, there is no way to read the value b1 = 1 and then read the value a2 = 0. Without caching, the correct sequence of events leads to the following call:

op(0, 1, 1)
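The impossibility of op(0, 1, 0) can be checked mechanically by enumerating every sequentially consistent interleaving of the two threads (the writer's a = 1; b = 1 against the reader's three reads). The sketch below is illustrative; all names are invented:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Enumerates all sequentially consistent interleavings of
//   writer: a = 1; b = 1;
//   reader: a1 = read(a); b1 = read(b); a2 = read(a);
// and records every (a1, b1, a2) triple that can be observed.
class Interleavings {
    static Set<List<Integer>> observableTriples() {
        Set<List<Integer>> triples = new HashSet<>();
        // Bitmask over 5 slots: bit set = writer step, bit clear = reader step.
        for (int mask = 0; mask < (1 << 5); mask++) {
            if (Integer.bitCount(mask) != 2) continue; // exactly 2 writer steps
            int a = 0, b = 0;                // shared memory, default values
            int w = 0;                       // writer program counter
            List<Integer> reads = new ArrayList<>();
            for (int slot = 0; slot < 5; slot++) {
                if ((mask & (1 << slot)) != 0) {
                    if (w == 0) a = 1; else b = 1; // program order: a=1 then b=1
                    w++;
                } else {
                    int r = reads.size();
                    reads.add(r == 1 ? b : a);     // reads in order: a, b, a
                }
            }
            triples.add(reads);
        }
        return triples;
    }
}
```

Exactly ten interleavings exist (5 choose 2); none of them yields the triple (0, 1, 0), while (0, 1, 1) does appear.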

However, if you were to ask the question "Is there any way to allow the read of a to be cached?", the answer is yes. If you can execute all three read operations in the first thread sequence as an atomic unit, then caching the value is allowed. While synchronizing across multiple variables is difficult and rarely provides an opportunistic optimization advantage, it is certainly conceivable to encounter an exception. For example, suppose a and b are each 4 bytes, and they appear sequentially in memory with a aligned on an 8-byte boundary. A 64-bit process could implement the sequence read(a) read(b) as an atomic 64-bit load operation, which would allow the value of a to be cached (effectively treating all three read operations as an atomic operation instead of just the first two).
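The aligned 8-byte load described at the end can be approximated in portable Java by packing both 32-bit values into a single AtomicLong, so that one atomic 64-bit load observes a consistent (a, b) pair. This is a sketch of the idea, not something a JVM actually does; all names are invented:

```java
import java.util.concurrent.atomic.AtomicLong;

// Packs two 32-bit values into one AtomicLong so that a single atomic
// 64-bit load always observes a consistent (a, b) pair, mimicking the
// aligned 8-byte hardware load described above.
class PackedPair {
    private final AtomicLong cell = new AtomicLong();

    void set(int a, int b) {
        cell.set(((long) a << 32) | (b & 0xFFFFFFFFL));
    }

    int[] get() {
        long v = cell.get();         // one atomic 64-bit read
        int a = (int) (v >>> 32);
        int b = (int) v;
        return new int[] { a, b };   // a and b come from the same snapshot
    }
}
```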

Tolley answered 20/12, 2014 at 0:2 Comment(12)
"The closest you could get is saying y happens-before r2, but the partial order would allow y to also occur before r1, thus collapsing the read operations" -> yes and no: if you have two writes w1 and w2, and r1 is executed in between (from a wall-clock perspective), then it must see w1 and can't see w2 yet. If r2 is executed subsequently, it must see w2. So you can only collapse the two reads if you can guarantee that no write can be executed in between (which is what your last paragraph says, I think).Moonwort
In a typical multithreaded world such a guarantee is not possible so it would require some form of sequential processing.Moonwort
Sorry, this answer is not correct. By your argument, the following optimizations would also be legal: (1) int i = a; int j = a; int k = i + j; => int k = 2*a; (2) while (a == a) { ... } => while (true) { ... }. Do you claim that these optimizations should be allowed according to the JLS?Wystand
@Wystand - Yes they are allowed, at least under some circumstances.Castrate
@Wystand yes that would be allowed. "It's not intuitive" is completely independent of "it's not allowed".Tolley
It seems to me that what you're saying is that if a certain order of operations is possible, the compiler can assert that this order of operations is what always happens.Wystand
To put it another way, you are saying that the compiler can arbitrarily resolve data races at compile time, so that the resulting binary code always chooses one path of that data race, and never the other. So continuing with the above example, "while (a == 1) { ... }" could be resolved to either "while (true) { ... }" or "while (false) { ... }" entirely at the compiler's discretion, based on the fact that some interleaving of reads could possibly yield either an infinite series of 0,0,0,0,... or 1,1,1,1,..., no matter how likely or unlikely (in this case, very unlikely).Wystand
@Wystand - Again, under some circumstances yes. The JLS permits optimizations that do not give a different semantic / result for a well-formed execution. An application that has executions that are not well-formed (e.g. has data races) is liable to be optimized on the assumption that it is well-formed. This could result in one of the alternative executions being impossible. But obviously, that would be compiler specific.Castrate
@Wystand - When I say the JLS permits it, it is probably better to say that the JLS does not specify what should happen. In the absence of a specified behaviour, the compiler could do anything ... including code transformations that seem unintuitive.Castrate
@StephenC @Wystand Not anything either. There is the out-of-thin-air restriction even for incorrectly synchronized programs. Also, the JLS guarantees happens-before semantics, which is why I argue that the optimization is ultimately forbidden, as there is no data race in the example program, which is correctly synchronized by volatile. By my understanding, both mentioned optimizations are then forbidden.Martinamartindale
@RafaelWinterhalter - You are right about the "out of thin air" point. But the JLS still doesn't specify which of the set of allowed executions will actually occur. And the optimization isn't always forbidden.Castrate
I asked Jeremy Manson to take a look at this question and he confirms that it's a legal optimization.Augusto

In my original answer, I argued against the legality of the suggested optimization. I based this mainly on information from the JSR-133 cookbook, which states that a volatile read must not be reordered with another volatile read and further states that a cached read is to be treated as a reordering. The latter statement is however formulated with some ambiguity, which is why I went through the formal definition of the JMM, where I did not find such an indication. Therefore, I would now argue that the optimization is allowed. However, the JMM is quite complex, and the discussion on this page indicates that this corner case might be decided differently by someone with a more thorough understanding of the formalism.

Denoting thread 1 to execute

while (true) {
  System.out.println(a // r_1 
    + a); // r_2
} 

and thread 2 to execute:

while (true) {
  a = 0; // w_1
  a = 1; // w_2
}

The two reads r_i and the two writes w_i of a are synchronization actions, as a is volatile (JLS 17.4.2). They are inter-thread actions, as the variable a is used by several threads. These actions are contained in the set of all actions A. There exists a total order over all synchronization actions, the synchronization order, which is consistent with the program order of thread 1 and thread 2 (JLS 17.4.4). From the definition of the synchronizes-with partial order, there is no edge defined for this order in the above code. As a consequence, the happens-before order only reflects the intra-thread semantics of each thread (JLS 17.4.5).

With this, we define a write-seen function W, where W(r_i) = w_2, and a value-written function V, where V(w_2) = 1 (JLS 17.4.6). I took some freedom and eliminated w_1, as it makes this outline of a formal proof even simpler. The question is whether this proposed execution E is well-formed (JLS 17.4.7). The proposed execution E obeys intra-thread semantics, is happens-before consistent, obeys the synchronization order, and each read observes a consistent write. Checking the causality requirements is trivial (JLS 17.4.8). Nor do I see why the rules for non-terminating executions would be relevant, as the loop covers the entire discussed code (JLS 17.4.9), and we do not need to distinguish observable actions.

For all this, I cannot find any indication of why this optimization would be forbidden. Nevertheless, it is not applied to volatile reads by the HotSpot VM, as one can observe using -XX:+PrintAssembly. I assume, however, that the performance benefits are minor and that this pattern is rarely observed in practice.

Remark: After watching the Java memory model pragmatics talk (multiple times), I am pretty sure this reasoning is correct.

Martinamartindale answered 21/12, 2014 at 2:42 Comment(12)
I checked with -XX:+PrintAssembly myself before posting the question. As you point out, this doesn't prove anything except possibly that even if it's allowed it's pointless. My question is whether or not it's legal. I can't see how two volatile reads, that introduce happens before relations, would be violated by being treated as one read. Could you describe an example execution in which such optimization would violate the JLS in terms of happens-before or program order?Augusto
From a scenario where thread X reads value a, thread Y writes value a, and thread X reads value a again, a reordering of the second read would be illegal. I think I now understand that you argue that the suggested optimization would no longer allow such an execution order by atomising the two reads of a. In my understanding, an optimization that only becomes legal after the removal of a volatile read or write and its happens-before semantics is illegal. Think of a volatile read that is a no-op and could therefore be removed altogether. Yet, this read's happens-before ordering remains intact.Martinamartindale
The optimisation is allowed if no write can happen between the two reads, as explained in your last quote of the cookbook. So your answer is a bit contradictory.Moonwort
This is not what the quote says. It is about volatiles that are only used by a single thread.Martinamartindale
@RafaelWinterhalter, continuing on your example execution of X reads, Y writes, X reads, I think that unless there's a happens-before relation between Y writes and the second X reads the JVM is not required to include this execution among the set of possible executions. I would argue that without further synchronization the JVM would be allowed to always execute it in the order of X reads, X reads, Y writes. If this is the case I fail to see how it would break the spec to hardcode the assembly code to always execute that order.Augusto
@assylias, I wouldn't formulate it as The optimization is allowed if no write can happen between the two reads but rather The optimization is allowed if no write must be able to happen between the two reads. (And judging from the happens-before relation defined in the JLS, I can't see how a write must be able to happen between the two reads in this scenario.)Augusto
@Moonwort I see your point and I am increasingly unsure whether I am correct. I will look into the formalism more and update my answer if I can deduce something more certain.Martinamartindale
Your interpretation of not reordering volatile load operations was taken out of context, leading to an incorrect interpretation of the JMM. I added the section Edit 2 to my answer to provide a clarification of why this wording was used in the cookbook, along with an example showing why it doesn't apply to this question in the way you think it does.Tolley
I also added a new section at the top of my answer giving a shorter, clearer description of why the read operations could be cached.Tolley
@Moonwort I went through the formal specification and could not find a similar indication as I thought that I had found in the JSR-133 cookbook.Martinamartindale
The cookbook is a subset of the JLS: it is compliant, but there are other ways to be compliant.Moonwort
I meant aiiobe before. Despite the cookbook being a recommendation, it is an interpretation and thus provides good insights.Martinamartindale

On one hand the very purpose of a volatile read is that it should always be fresh from memory.

That is not how the Java Language Specification defines volatile. The JLS simply says:

A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread (where "subsequent" is defined according to the synchronization order).

Therefore, a write to a volatile variable happens-before (and is visible to) any subsequent reads of that same variable.

This constraint is trivially satisfied for a read that is not subsequent. That is, volatile only ensures visibility of a write if the read is known to occur after the write.

This is not the case in your program. For every well-formed execution that observes a to be 1, I can construct another well-formed execution where a is observed to be 0, simply by moving the read after the write. This is possible because the happens-before relation looks as follows:

write 1   -->   read 1                    write 1   -->   read 1
   |              |                          |              |
   |              v                          v              |
   v      -->   read 1                    write 0           v
write 0           |             vs.          |      -->   read 0
   |              |                          |              |
   v              v                          v              v
write 1   -->   read 1                    write 1   -->   read 1                 

That is, all the JMM guarantees for your program is that a+a will yield 0, 1 or 2. That is satisfied if a+a always yields 0. Just as the operating system is permitted to execute this program on a single core, and always interrupt thread 1 before the same instruction of the loop, the JVM is permitted to reuse the value - after all, the observable behavior remains the same.

In general, moving the read across the write violates happens-before consistency, because some other synchronization action is "in the way". In the absence of such intermediary synchronization actions, a volatile read can be satisfied from a cache.
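The parity argument makes this observable: if the JVM reuses the first read, the sum is t + t for a single observed value t, so it can be 0 or 2 but never 1, no matter how the writer thread interleaves. A sketch of the collapsed reader (the threading setup is illustrative):

```java
// If the two reads are collapsed, the sum is t + t for one observed
// value t, so it is always even regardless of concurrent writes.
class CollapsedReader {
    static volatile int a;

    static int collapsedSum() {
        int t = a;     // single volatile read, value reused
        return t + t;  // always even
    }

    static boolean allSumsEven(int samples) {
        Thread writer = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                a = 1;
                a = 0;
            }
        });
        writer.setDaemon(true);
        writer.start();
        boolean allEven = true;
        for (int i = 0; i < samples; i++) {
            allEven &= collapsedSum() % 2 == 0;
        }
        writer.interrupt();
        return allEven;
    }
}
```

With the two volatile reads left in place, a sum of 1 would also be a permitted outcome; the collapsed form simply never exhibits it, which is exactly the behavior the JMM allows a JVM to pick.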

Soothe answered 25/12, 2014 at 22:32 Comment(3)
This answer makes it very clear. Here's my current thought: The obvious generalization of my question is, can a+a always be collapsed to 2*a? Here I'd say that the answer is yes, if one can guarantee that no synchronization can occur between the two reads. But I guess that to prove that, one has to rigorously read through the entire spec to exclude that possibility. Which is why it's hard to give a solid answer to the (generalized) question. Would you agree?Augusto
I claim that volatile reads of the same variable can always be collapsed if there are no intermediary synchronization actions, i.e. if in the reading thread's program order, there are no synchronization actions between the two reads. This is always the case if a stands for an unqualified field access expression, but would not be the case if the a stood for an arbitrary expression, as its evaluation could involve other synchronization actions.Soothe
Right. That makes a lot of sense actually.Augusto

Modified the OP's problem a little:

    volatile int a;

    // Thread 1
    while (true) {
        a = someOddNumber;
        a = someEvenNumber;
    }

    // Thread 2
    while (true) {
        if (isOdd(a + a)) {
            break;
        }
    }

If the above code were executed sequentially, there would exist a valid sequentially consistent execution that breaks the thread 2 while loop.

Whereas if the compiler optimizes a+a to 2*a, the thread 2 while loop will never exit.

So the above optimization would prohibit one particular execution that would be possible under sequentially consistent execution.
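The reason the transformed loop can never exit: for any int x, x + x is even, even when the addition overflows, so isOdd(a + a) can never be true once the reads are collapsed. A minimal sketch (isOdd written out as the modified problem assumes):

```java
// For any int x, x + x == 2 * x is even, even when the addition
// overflows, so the collapsed form can never satisfy isOdd.
class EvenSum {
    static boolean isOdd(int x) {
        return (x & 1) != 0;
    }

    static boolean collapsedIsOdd(int a) {
        int t = a;          // single read of the (hypothetical) volatile
        return isOdd(t + t);
    }
}
```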

Main question: is this optimization a problem?

Q. Is the transformed code sequentially consistent?

Ans. A program is correctly synchronized if, when it is executed in a sequentially consistent manner, there are no data races. See Example 17.4.8-1 in JLS chapter 17.

   Sequential consistency: the result of any execution is the same as
   if the read and write operations by all processes were executed in
   some sequential order and the operations of each individual
   process appear in this sequence in the order specified by its
   program [Lamport, 1979].

   Also see http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4.3

Sequential consistency is a strong guarantee. The execution path where the compiler optimizes a+a to 2*a is also a valid sequentially consistent execution. So the answer is yes.

  Q. Does the transformed code violate the happens-before guarantees?

Ans. No: sequential consistency implies that the happens-before guarantees hold here. JLS ref

So I don't think the optimization is invalid, legally speaking, at least in the OP's case. The case where the thread 2 while loop gets stuck in an infinite loop is also quite possible without the compiler transformation.

Fettling answered 24/12, 2014 at 23:49 Comment(0)

As laid out in other answers there are two reads and two writes. Imagine the following execution (T1 and T2 denote two threads), using annotations that match the JLS statement below:

  • T1: a = 0 //W(r)
  • T2: read temp1 = a //r_initial
  • T1: a = 1 //w
  • T2: read temp2 = a //r
  • T2: print temp1+temp2

In a concurrent environment this is definitely a possible thread interleaving. Your question then becomes: would the JVM be allowed to make r observe W(r) and read 0 instead of 1?

JLS #17.4.5 states:

A set of actions A is happens-before consistent if for all reads r in A, where W(r) is the write action seen by r, it is not the case that either hb(r, W(r)) or that there exists a write w in A such that w.v = r.v and hb(W(r), w) and hb(w, r).

The optimisation you propose (temp = a; print(2 * temp);) would violate that requirement. So your optimisation can only work if there is no intervening write between r_initial and r, which can't be guaranteed in a typical multithreaded environment.

As a side comment, note however that there is no guarantee as to how long it will take for the writes to become visible from the reading thread. See for example: Detailed semantics of volatile regarding timeliness of visibility.

Steels answered 23/12, 2014 at 14:44 Comment(4)
But isn't the JVM allowed to choose another (valid) scheduling at its discretion? I think the argument here is that without further synchronization the scheduling you describe is not guaranteed.Augusto
The JVM can choose a different scheduling that prevents an intervening write between the two reads, in which case the two reads can be merged into one but if it lets threads execute in parallel without further control and/or synchronization then it can't guarantee that the intervening write can't happen and therefore it can't merge the two reads without violating the JMM. The question is not whether the scheduling I used as an example is guaranteed but whether a JVM implementation allows it.Whited
But wouldn't the 2*a optimization be equivalent to always choosing a scheduling without an intervening write? I mean, regardless of whether the threads are executed in parallel?Augusto
Right. I too doubt that any JVMs do this. The question is however not whether any JVMs do this but rather if its legal or not. So would you agree that it's a legal optimization?Augusto
