How to understand JDK9 memory model?
I'm learning the JDK 9 memory model. After watching the talk Java Memory Model Unlearning Experience and reading the paper Using JDK 9 Memory Order Modes, I'm confused about some concepts:

  1. Does opaque access immediately guarantee visibility?

  2. How should I understand partial order and total order in the paper?

For the first question, the paper says:

It is almost never a good idea to use bare spins waiting for values of variables. Use Thread.onSpinWait, Thread.yield, and/or blocking synchronization to better cope with the fact that "eventually" can be a long time, especially when there are more threads than cores on a system.

So if I write the code:

// shared variable I and a VarHandle I_HANDLE referring to it
public static int I = 0;

public static final VarHandle I_HANDLE;

// Thread-1
I_HANDLE.setOpaque(1);

// Thread-2
while ((int) I_HANDLE.getOpaque() == 0) {
}

So thread-2 will eventually terminate, but perhaps only after a long time?

If so, is there a minimal approach that guarantees thread-2 immediately sees the modification by thread-1? (Release/acquire? volatile?)
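For reference, a minimal runnable version of the snippet above might look like the following. The class name OpaqueSpinDemo is mine, and the VarHandle lookup is the standard MethodHandles pattern; the loop body uses the Thread.onSpinWait hint the paper recommends:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class OpaqueSpinDemo {
    public static int I = 0;

    public static final VarHandle I_HANDLE;

    static {
        try {
            // obtain a VarHandle for the static field I
            I_HANDLE = MethodHandles.lookup()
                    .findStaticVarHandle(OpaqueSpinDemo.class, "I", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t2 = new Thread(() -> {
            // poll with opaque reads; onSpinWait hints that this is a spin loop
            while ((int) I_HANDLE.getOpaque() == 0) {
                Thread.onSpinWait();
            }
        });
        t2.start();

        I_HANDLE.setOpaque(1); // thread-1's write
        t2.join();             // terminates once t2 observes the write
    }
}
```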

Rizas answered 26/1, 2021 at 6:36 Comment(0)

There is no such thing as “immediately” for updates. Even electricity moves at a finite speed. Generally, asking for a perceivable effect within a particular time span is like asking for a particular execution time for an operation. Neither can be guaranteed, as they are properties of the underlying architecture which the JVM can’t change.

Practically, of course, JVM developers try to make the operations as fast as possible, and all that matters to you, as a programmer, is that there is no faster alternative to opaque writes regarding inter-thread visibility of updates. The stronger access modes do not change how fast an update becomes visible; they add constraints on the reordering of reads and writes.

So in your example, the update will become visible as fast as the architecture and system load will allow1, but don’t ask for actual numbers. No-one can say how long it will take. If you need guarantees in terms of time quantities, you need a special (“real-time”) implementation that can give you additional guarantees beyond the Java Memory Model.


1 To name a practical scenario: threads 1 and 2 may compete for the same CPU. Thread 1 writes the value and continues to run for the operating-system-specific time slice before the task is switched (and it’s not even guaranteed that thread 2 is the next one scheduled). This implies that a rather long time may elapse, in terms of both wall-clock time and thread 1’s progress after the write. Of course, other threads may also make a lot of progress on the other CPU cores in the meantime. But it’s also possible that thread 2’s polling is the reason thread 1 doesn’t get a chance to write the new value in the first place. That’s why you should mark such polling loops with onSpinWait or yield, to give the execution environment a chance to prevent such scenarios. See this Q&A for a discussion of the difference between the two.

Graeae answered 26/1, 2021 at 11:5 Comment(7)
I do not believe the purpose of onSpinWait is related to fairness/starvation. The problem with spinning is that the pipeline can get filled with speculatively executed loads of the same variable, and these loads can be executed out of order. This becomes a problem if a later load sees an older value while an earlier load sees a newer value. When that happens, a pipeline flush occurs due to a memory-order violation, because x86 guarantees that loads are not reordered as part of TSO. Apart from the price paid at loop exit, it also prevents the CPU from spinning like crazy.Arin
@Arin I never said anything about fairness. And “spinning like crazy” is precisely what can prevent the writing task from getting the CPU. The details are less important, as long as the programmer understands that this hint allows the JVM to eliminate architecture-specific disadvantages of a polling loop. It doesn’t have to have the same effect on every architecture. And the footnote was just one example scenario.Graeae
Spinning like crazy will not prevent another task from getting the CPU. And onSpinWait will not be of any benefit in that regard either (it isn't a yield to the OS). Apart from preventing this memory-order violation, it keeps the CPU from running hot and gives hyper-threaded siblings some additional room to run.Arin
@Arin you are drawing assumptions from a particular implementation for a particular architecture. The specification says nothing about what a JVM will do, that’s entirely up to the implementors. The method is just a hint about what the code is doing. The JVM implementors can do whatever they consider useful for that scenario.Graeae
The primary reason onSpinWait was introduced was x86 and the pipeline flush triggered by a memory-order violation on loop exit.Arin
@Arin undocumented primary reasons do not matter. The specification allows implementors to do whatever they want. This Q&A is not about the current x86 implementation but about the JMM. This method is a hint, nothing more, period. Anyway, I added a link for those readers who want to know the implementation details.Graeae
At this level they matter a lot. If you play at this level, you need to know the hardware you are talking about; otherwise you should not touch it. Even the recent change of increasing the delay of the PAUSE instruction from about 10 cycles to about 140 cycles in Skylake matters a lot.Arin

In simple terms, opaque means that the read or write is really going to happen, i.e. it isn't optimized away by the compiler.

It doesn't provide any ordering guarantees with respect to other variables.

So it is good, for example, for performance counters, where one thread does the updates and other threads read the value.
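A minimal sketch of such a counter, assuming a single writer thread (the class and field names are mine, not from the answer):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class ProgressCounter {
    private long count;

    private static final VarHandle COUNT;
    static {
        try {
            COUNT = MethodHandles.lookup()
                    .findVarHandle(ProgressCounter.class, "count", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Called by the single writer thread only: the opaque write cannot be
    // elided by the compiler, so readers will eventually observe it.
    // (Not atomic; with multiple writers, updates could be lost.)
    void increment() {
        COUNT.setOpaque(this, (long) COUNT.getOpaque(this) + 1);
    }

    // Called by reader threads: returns an eventually-current value, with
    // no ordering guarantees relative to other variables.
    long current() {
        return (long) COUNT.getOpaque(this);
    }
}
```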

But if you were to do the following (pseudocode):

// global
final IntReference a = new IntReference();
final IntReference b = new IntReference();

void thread1(){
    a.setPlain(1);
    b.setOpaque(1);
}

void thread2(){
    int r1 = b.getOpaque();
    int r2 = a.getPlain();
    if(r1 == 1 && r2 == 0) println("violation");
}

Then it could be that 'violation' gets printed, because:

  • the stores of a,b get reordered
  • the loads from a and b get reordered.

However, if you were to use a store-release and a load-acquire, the reordering can't happen, because release and acquire provide ordering constraints with respect to other variables.

void thread1(){
    a.setPlain(1);
    [StoreStore] <--
    [LoadStore]
    b.setRelease(1);
}

void thread2(){
    int r1 = b.getAcquire();
    [LoadLoad] <---
    [LoadStore]
    int r2 = a.getPlain();
    if(r1 == 1 && r2 == 0) println("violation");
}
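The fence pseudocode above could be sketched with real VarHandles roughly as follows; the IntReference type from the pseudocode is replaced by plain static fields, and the class name is my own:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class ReleaseAcquireDemo {
    static int a = 0;
    static int b = 0;

    static final VarHandle B;
    static {
        try {
            B = MethodHandles.lookup()
                    .findStaticVarHandle(ReleaseAcquireDemo.class, "b", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static void thread1() {
        a = 1;           // plain write
        B.setRelease(1); // release: prior accesses can't be reordered after it
    }

    static void thread2() {
        int r1 = (int) B.getAcquire(); // acquire: later accesses can't move before it
        int r2 = a;                    // plain read
        if (r1 == 1 && r2 == 0) System.out.println("violation"); // now impossible
    }
}
```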
Arin answered 27/1, 2021 at 7:52 Comment(1)
I like this answer, a lot. What I really wish is for you to also say more about So it is good for example for performance counters (which are called "progress indicators", afaik). It is rather nice that you also bring release/acquire into the discussion. Though the same effect as you have shown here can be achieved with volatile. I do understand that release/acquire is cheaper on some platforms and it is a good example, but imho (emphasis on humble), it confuses readers as to why volatile was not used there. Anyway, again, I really like it. Thank you.Garbage
