Java memory model - volatile and x86

Asked 27/4, 2017 at 21:31 Answered 17/5, 2020 at 12:20

Solved java multithreading cpu volatile java-memory-model

I am trying to understand the intricacies of Java volatile and its semantics, and its translation to the underlying architecture and its instructions. If we consider the following blogs and resources:

fences generated for volatile, What gets generated for read/write of volatile and Stack overflow question on fences

here is what I gather:

A volatile read inserts LoadStore/LoadLoad barriers after it (the LFENCE instruction on x86).
It prevents the reordering of the load with subsequent writes/loads.
It is supposed to guarantee loading of the global state that was modified by other threads, i.e. after the LFENCE, the state modifications done by other threads are visible to the current thread on its CPU.

What I am struggling to understand is this: Java does not emit LFENCE on x86, i.e. a read of a volatile does not cause an LFENCE. I know that the memory ordering of x86 prevents reordering of loads with later loads/stores, so the second bullet point is taken care of. However, I would assume that in order for the state to be visible to this thread, an LFENCE instruction should be issued to guarantee that all load buffers are drained before the next instruction after the fence is executed (as per the Intel manual). I understand there is a cache coherence protocol on x86, but a volatile read should still drain any loads in the buffers, no?

Merridie answered 27/4, 2017 at 21:31 Comment(3)

It seems like you're forgetting that Java is platform-independent. – Dicta 27/4, 2017 at 21:34

@JacobG.: Platform-independent or not, Java implementations aren't platform-independent, and we can still ask questions about how Java implementations behave. – Brazil 27/4, 2017 at 21:38

An LFENCE is only useful for weakly ordered loads. A regular load on the X86 is not weakly ordered, and hence an LFENCE provides no added value. – Computerize 27/11, 2022 at 13:10

On x86, the buffers are pinned to the cache line. If the cache line is lost, the value in the buffer isn't used. So there's no need to fence or drain the buffers; the value they contain must be current because another core can't modify the data without first invalidating the cache line.

Merilee answered 27/4, 2017 at 23:1 Comment(6)

Sorry, are you saying that I do not need lfence, because when the actual load happens, and the cache line is invalid, those load buffers will be discarded? I.e. the lfence happens kind of lazily? – Merridie 27/4, 2017 at 23:7

Yes, exactly. The buffer is pinned to the cache line. If the cache line is invalidated, so is the cached read. So while x86 does prefetch reads, it cannot ever use an obsolete value. (This is an x86 thing only, of course.) – Merilee 27/4, 2017 at 23:16

What about the other variables which might be in registers? I think there needs to be some register unloading also, no? – Merridie 2/5, 2017 at 18:51

@Merridie The JVM won't store volatiles in registers. – Merilee 2/5, 2017 at 19:10

No no, not the volatiles. We have the volatile x, and other variables which are changed before the write to x. However, changes to ALL of those variables are visible after reading the value of x. So those variables, which could have been in registers, clearly need to be flushed out as well – Merridie 3/5, 2017 at 21:11

@Merridie Yeah, you're right. I keep forgetting that Java made that (somewhat absurd, IMO) implementation change. Any variable that might be seen by another thread that can access that volatile has to be synchronized somehow. – Merilee 3/5, 2017 at 22:37
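
To make the point in the last two comments concrete, here is a minimal sketch (not from the original thread) of a plain field being published through a volatile flag. The names `VolatilePublish`, `payload` and `ready` are made up for illustration. The Java memory model guarantees that a thread which reads `ready == true` also sees the earlier plain write to `payload`, so the JIT cannot keep that write only in a register past the volatile store.

```java
public class VolatilePublish {
    static int payload;              // plain (non-volatile) field
    static volatile boolean ready;   // volatile flag used to publish payload

    // Runs one writer/reader round and returns the payload the reader saw.
    static int runOnce() {
        payload = 0;
        ready = false;
        int[] seen = new int[1];

        Thread reader = new Thread(() -> {
            // Spin until the volatile flag is observed as true.
            while (!ready) {
                Thread.onSpinWait();
            }
            // The volatile read above happens-after the volatile write,
            // so the plain write to payload must be visible here.
            seen[0] = payload;
        });
        reader.start();

        payload = 42;   // plain write, ordered before the volatile write
        ready = true;   // volatile write publishes payload

        try {
            reader.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

If `ready` were not volatile, the reader would have no such guarantee and could even spin forever.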

The x86 provides TSO (total store order). So, at the hardware level, you get the following barriers for free: [LoadLoad][LoadStore][StoreStore]. The only one missing is the [StoreLoad].

A load has acquire semantics

r1=X
[LoadLoad]
[LoadStore]

A store has release semantics

[LoadStore]
[StoreStore]
Y=r2

If you do a store followed by a load, you end up with this:

[LoadStore]
[StoreStore]
Y=r2
r1=X
[LoadLoad]
[LoadStore]

The issue is that the store and the later load can still be reordered, so the result isn't sequentially consistent, and sequential consistency is mandatory for volatile accesses in the Java memory model. The only way to prevent this is with a [StoreLoad].

[LoadStore]
[StoreStore]
Y=r2
[StoreLoad]
r1=X
[LoadLoad]
[LoadStore]

And the most logical place would be to add it to the write since normally reads are more frequent than writes. So the write would become:

[LoadStore]
[StoreStore]
Y=r2
[StoreLoad]
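
The store-then-load reordering described above is the classic "store buffering" litmus test. The following is a rough sketch of it in Java; the class and method names (`StoreLoadLitmus`, `countBothZero`) are my own and not from any library. Each thread stores to one variable and then loads the other. Because both fields are volatile, the outcome where both threads read 0 is forbidden by the Java memory model, so the count should stay at 0. If the fields were plain, that outcome would be allowed (and is possible on x86 because of the store buffer), although a simple harness like this one is not guaranteed to observe it.

```java
public class StoreLoadLitmus {
    static volatile int x;
    static volatile int y;

    // One round: each thread does a store followed by a load of the other variable.
    static int[] runOnce() {
        x = 0;
        y = 0;
        int[] r = new int[2];
        Thread t1 = new Thread(() -> { x = 1; r[0] = y; });
        Thread t2 = new Thread(() -> { y = 1; r[1] = x; });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return r;
    }

    // Counts rounds in which both loads returned 0 (forbidden for volatile fields).
    static int countBothZero(int iterations) {
        int count = 0;
        for (int i = 0; i < iterations; i++) {
            int[] r = runOnce();
            if (r[0] == 0 && r[1] == 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("both-zero outcomes: " + countBothZero(1000));
    }
}
```

For serious experiments with the non-volatile variant, a dedicated harness such as OpenJDK's jcstress is a better fit than hand-rolled threads.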

Because the X86 provides TSO, the following fences can be no-ops:

[LoadLoad][LoadStore][StoreStore]

So the only relevant one is the [StoreLoad], and this can be accomplished by an MFENCE or a lock addl $0, (%rsp).

The LFENCE and the SFENCE are not relevant for this situation. The LFENCE and SFENCE are for weakly ordered loads and stores (e.g. those of SSE).

What the [StoreLoad] does on the x86 is stop later loads from executing until the store buffer has been drained. This makes sure that the load is globally visible (i.e. read from memory/cache) AFTER the store has become globally visible (has left the store buffer and entered the L1d).
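
As a rough illustration of the barrier placement described in this answer, the static fence methods on `java.lang.invoke.VarHandle` (Java 9+) can be lined up with the barrier names used above. This is only a sketch: the class `FenceSketch` and its fields are hypothetical, and plain fields plus explicit fences are not a drop-in replacement for `volatile`. On x86 one would expect only the full fence to turn into an actual instruction, with the acquire and release fences acting purely as compiler barriers, but that is an assumption about the JIT and not something the Java specification promises.

```java
import java.lang.invoke.VarHandle;

public class FenceSketch {
    static int x;   // plain fields; the ordering comes from the explicit fences
    static int y;

    // Mirrors: [LoadStore][StoreStore] Y=r2 [StoreLoad]
    static void volatileLikeStore(int r2) {
        VarHandle.releaseFence();  // earlier loads/stores stay before the store
        y = r2;
        VarHandle.fullFence();     // includes [StoreLoad]: later loads wait for the store
    }

    // Mirrors: r1=X [LoadLoad][LoadStore]
    static int volatileLikeLoad() {
        int r1 = x;
        VarHandle.acquireFence();  // later loads/stores stay after the load
        return r1;
    }

    public static void main(String[] args) {
        x = 5;
        volatileLikeStore(7);
        System.out.println(y + " " + volatileLikeLoad());
    }
}
```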

Computerize answered 17/5, 2020 at 12:20 Comment(0)
