What does "subsequent read" mean in the context of volatile variables?
Java memory visibility documentation says that:

A write to a volatile field happens-before every subsequent read of that same field.

I'm confused about what "subsequent" means in the context of multithreading. Does this sentence imply some global clock for all processors and cores? For example, if I assign a value to a variable in cycle c1 in some thread, is a second thread then able to see this value in the subsequent cycle c1 + 1?

Trypanosome answered 15/6, 2018 at 10:37 Comment(2)
@Ben there is no such thing as clearing the caches. Caches are coherent and write-back on x86. My question was about the meaning of "subsequent". "Subsequent" implies some order, and the question is what defines that order. The Java specification is not clear about it. Also, I'm interested in how this order maps to contemporary hardware.Trypanosome
But I deleted the comment instead, as there are a lot of answers here already and it wasn't helping anyone anymore :)Acrimony

It sounds to me like it's saying that it provides lockless acquire/release memory-ordering semantics between threads. See Jeff Preshing's article explaining the concept (mostly for C++, but the main point of the article is language-neutral: the concept of lock-free acquire/release synchronization).

In fact Java volatile provides sequential consistency, not just acq/rel. There's no actual locking, though. See Jeff Preshing's article for an explanation of why the naming matches what you'd do with a lock.


If a reader sees the value you wrote, then it knows that everything in the producer thread before that write has also already happened.

This ordering guarantee is only useful in combination with other guarantees about ordering within a single thread.

e.g.

int data[100];
volatile bool data_ready = false;

Producer:

data[0..99] = stuff;
 // release store keeps previous ops above this line
data_ready = true;

Consumer:

while(!data_ready){}     // spin until we see the write
// acquire-load keeps later ops below this line
int tmp = data[99];      // gets the value from the producer

If data_ready were not volatile, reading it wouldn't establish a happens-before relationship between the two threads.

You don't have to have a spinloop, you could be reading a sequence number, or an array index from a volatile int, and then reading data[i].
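Since the question is about Java, here is a minimal runnable Java sketch of the same publish-via-volatile-flag pattern. The class and field names are mine, and the concrete fill of the array is illustrative:

```java
// Minimal Java sketch of publishing data via a volatile flag.
// Class/field names are illustrative, not from any standard API.
public class VolatileHandoff {
    static int[] data = new int[100];
    static volatile boolean dataReady = false;  // the flag both threads synchronize on

    public static void main(String[] args) throws InterruptedException {
        Thread producer = new Thread(() -> {
            for (int i = 0; i < data.length; i++) {
                data[i] = i * 2;                // plain (non-volatile) writes
            }
            dataReady = true;                   // volatile write: "releases" the data
        });
        Thread consumer = new Thread(() -> {
            while (!dataReady) { }              // spin until the volatile read sees true
            // The happens-before edge guarantees all 100 plain writes are visible now.
            System.out.println(data[99]);       // prints 198
        });
        consumer.start();
        producer.start();
        producer.join();
        consumer.join();
    }
}
```

The volatile write/read pair is what creates the happens-before edge; without it, the consumer's read of data[99] would be a data race.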


I don't know Java well. I think volatile actually gives you sequential-consistency, not just release/acquire. A sequential-release store isn't allowed to reorder with later loads, so on typical hardware it needs an expensive memory barrier to make sure the local core's store buffer is flushed before any later loads are allowed to execute.

Volatile Vs Atomic explains more about the ordering volatile gives you.

Java volatile is just an ordering keyword; it's not equivalent to C11 _Atomic or C++11 std::atomic&lt;T&gt;, which also give you atomic RMW operations. In Java, volatile_var++ is not an atomic increment; it's a separate load and store, like volatile_var = volatile_var + 1. In Java, you need a class like AtomicInteger to get an atomic RMW.
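A small sketch of that difference (class name and iteration count are mine): two threads hammer a volatile int with ++ and an AtomicInteger with incrementAndGet(); only the latter is a single atomic read-modify-write.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Demonstrates that ++ on a volatile is a separate load and store,
// while AtomicInteger performs an atomic read-modify-write.
public class IncrementRace {
    static volatile int volatileCounter = 0;
    static AtomicInteger atomicCounter = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                volatileCounter++;               // load + add + store: updates can be lost
                atomicCounter.incrementAndGet(); // atomic RMW: never loses an update
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("volatile: " + volatileCounter);      // typically < 200000
        System.out.println("atomic:   " + atomicCounter.get());  // always 200000
    }
}
```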

And note that C/C++ volatile doesn't imply atomicity or ordering at all; it only tells the compiler to assume that the value can be modified asynchronously. This is only a small part of what you need to write lockless for anything except the simplest cases.

Metrology answered 15/6, 2018 at 11:49 Comment(13)
This is the right answer - subsequent in this context means "a read that sees the new value written", so as not to invoke a global clock or anything like that. In theory, this lets a write be indefinitely delayed since there is no guarantee that any read will see the value soon or ever - but in practice on all interesting architectures the write generally becomes visible "soon" (often single-digit nanoseconds, typical worst case probably hundreds of nanoseconds).Kiernan
I have a question actually. Doesn't that come down to the fact that Intel's implementation of the cache coherence protocol serializes cache-line state transitions? So we can rely on "what happens before which one"? No?Rodie
@St.Antario: Not sure what you're saying. Having a single total order of stores that all threads agree on, for a single cache line or memory location doesn't imply release/acquire semantics. That's a property of memory ordering between stores to different cache lines, and has to be enforced separately. (And BTW, some weakly-ordered systems don't have a global store order that all threads can agree on for relaxed stores at all. But x86's TSO memory model requires one, and it has to be some interleaving of program order.)Metrology
@PeterCordes Let me try to be more specific. If we consider two threads reading and writing a volatile variable concurrently like this on jdk8-hotspot, we get the following compiled code at runtime. When writing to the volatile variable we have lock addl. How does this instruction, invoked by some core, affect other cores? How does it guarantee the happens-before ordering?Rodie
@St.Antario: A core can't begin executing lock add until it has the line in E state of MESI, and it keeps the line locked for the duration so no other cores can modify it. My canonical answer on Can num++ be atomic for 'int num'? has the details. In fact no other cores can even read it, and it's a full memory barrier on the local core, which is what gives sequential-consistency and the happens-before.Metrology
@PeterCordes If we remove the lock prefix from the add instruction and put LFENCE before vmovq %xmm0,%rdi ;*getstatic a do we get the same memory ordering?Rodie
@St.Antario: oh, I didn't look at the asm you linked before, I assumed you were talking about an atomic increment. You're doing an atomic load, then an increment of a temporary, then an atomic store with seq-cst ordering. (With some really braindead asm, like movabs $1, %r10 instead of sub $1, %rdi, and bouncing through XMM regs for no reason). lock addl $0, (%rsp) is being used as a memory barrier because it's probably more efficient than mfence.Metrology
@St.Antario: Anyway, the entire inc() function is not a single atomic operation, probably because you use a = a+1 which reads the variable in an expression and then assigns it. Does a+=1; work in Java, or does it have special RMW functions for volatile variables, like C++'s std::atomic_fetch_add(&var, 1);. (Normally you'd use member functions and overloaded operators in C++, but there are stand-alone functions). I assume this is not what you wanted.Metrology
@PeterCordes Yes, simple increment like this is not atomic. Actually I was not concerned about atomicity in this case (we have AtomicLong::compareAndSet and friends methods which CASes the value). I was concerned about memory ordering and how it was affected by the lock addl (probably I provided sort of crazy example).Rodie
@St.Antario: The load is already an acquire-load and doesn't need further fencing. The store is a release-store on its own (can't reorder with earlier ops, or later stores because they're also release-stores), but the full barrier makes it a sequential-release (can't reorder with later loads.) lfence before doesn't give you that. See preshing.com/20120515/memory-reordering-caught-in-the-act. (mfence is equivalent to lock addl $0, (%rsp)). Semi Related: Why is (or isn't?) SFENCE + LFENCE equivalent to MFENCE?.Metrology
Sorry, misclick on the downvote. Just edited your post to be able to remove.Pyretic
This is pretty much correct although sticking to Java would be good as to not confuse. The only language I don't like is the acquire/release which implies locking which obviously doesn't happen with volatile.Pyretic
@Gray: Thanks, I hadn't thought of that possible confusion for beginners. Acquire/Release is the standard terminology for this kind of memory-ordering semantics, but I added some words to make it clear that it's lockless memory-ordering, not acquiring a lock. (Jeff Preshing's article already fully explained that, but it's not a bad thing to have it right here in the answer.)Metrology

It means that once a certain Thread writes to a volatile field, all other Thread(s) will observe (on the next read) that written value; this does not protect you against races, though.

Threads have their own caches, and those caches will be invalidated and updated with the newly written value via the cache coherency protocol.

EDIT

Subsequent means whenever that happens after the write itself. Since you don't know the exact cycle/timing when that will happen, you usually say that when some other thread observes the write, it will observe all the actions done before that write; thus a volatile establishes the happens-before guarantees.

Sort of like in an example:

 // Shared fields (volatile is only valid on fields, not on local variables)
 int a;
 volatile int b;

 // Actions done in Thread A
 a = 2;
 b = 3;

 // Actions done in Thread B
 if (b == 3) { // observe the volatile write
    // Thread B is guaranteed to see a == 2 here
 }

You could also loop (spin wait) until you see 3 for example.
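A runnable version of that example, with a and b as fields and Thread B spin-waiting as suggested (the class name is mine):

```java
// Runnable form of the example above: Thread B spins until it observes
// the volatile write, after which the earlier plain write is visible too.
public class HappensBeforeDemo {
    static int a;            // plain field
    static volatile int b;   // volatile field

    public static void main(String[] args) throws InterruptedException {
        Thread threadA = new Thread(() -> {
            a = 2;           // plain write
            b = 3;           // volatile write publishes everything before it
        });
        Thread threadB = new Thread(() -> {
            while (b != 3) { }        // spin until we observe the volatile write
            System.out.println(a);    // guaranteed to print 2
        });
        threadB.start();
        threadA.start();
        threadA.join();
        threadB.join();
    }
}
```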

Meill answered 15/6, 2018 at 10:45 Comment(11)
"on the next read" - he seems to be confused as to what constitutes a "next" read and how that is determinedKarrykarst
@Karrykarst exactly. Eugene just changed subsequent to next. That doesn't explain what next/subsequent means for hardware or java.Trypanosome
subsequent/next is anything after the cycle that wrote the value to memory basically. No guarantee is given when the cycle writing the value happens. A guarantee is given that after that the value is "set in stone" and will be read as that value, no matter any caching, etc.Acrimony
@Acrimony exactly. That is the reason why people usually say: when Thread B observes the value written, it will observe everything before that write.Meill
@Karrykarst well, to me, that is the entire point of volatiles - since you don't know the exact cycle when that will happen, you usually say that when some other thread observes the write, it will observe all the actions done before that write.Meill
@Meill your latest comment seems to explain it the best.Trypanosome
@Trypanosome even the upvoted example here https://mcmap.net/q/537389/-what-does-quot-subsequent-read-quot-mean-in-the-context-of-volatile-variables does a spin wait, so that another Thread observes that write. That is the point with volatiles to begin with - they establish the happens-before...Meill
@Meill you comments are far better than your initial answer. Can you improve your answer?Trypanosome
@Trypanosome well, if you like this, take a look at AtomicInteger#lazySet and the single writer principle - it's a very nice feature tooMeill
@Acrimony Can you expand a bit on why caching does not matter? What if the core writing some value has the cache line in Invalid state and another core owns the cache line exclusively? Then, before the write can occur, the cache line in the other core has to be invalidated first.Rodie
@Pyretic I think I've edited the example a long time ago, but forgot to mention you in a comment.Meill

Peter's answer gives the rationale behind the design of the Java memory model.
In this answer I'm attempting to give an explanation using only the concepts defined in the JLS.


In Java every thread is composed of a set of actions.
Some of these actions have the potential to be observable by other threads (e.g. writing a shared variable); these are called synchronization actions.

The order in which the actions of a thread are written in the source code is called the program order.
An order defines what is before and what is after (or better, not before).

Within a thread, each action has a happens-before relationship (denoted by <) with the next (in program order) action. This relationship is important, yet hard to understand, because it's very fundamental: it guarantees that if A < B then the "effects" of A are visible to B.
This is indeed what we expect when writing the code of a function.

Consider

Thread 1           Thread 2

  A0                 A'0
  A1                 A'1
  A2                 A'2
  A3                 A'3

Then by the program order we know A0 < A1 < A2 < A3 and that A'0 < A'1 < A'2 < A'3.
We don't know how to order all the actions.
It could be A0 < A'0 < A'1 < A'2 < A1 < A2 < A3 < A'3 or the sequence with the primes swapped.
However, in every such sequence the individual actions of each thread must be ordered according to that thread's program order.

The two program orders are not sufficient to order every action; they are partial orders, as opposed to the total order we are looking for.

The total order that puts the actions in a row according to the measurable time (like a clock) at which they happened is called the execution order.
It is the order in which the actions actually happened (it is only required that the actions appear to have happened in this order, but that's just an optimization detail).

Up until now, the actions are not ordered inter-thread (between two different threads).
The synchronization actions serve this purpose.
Each synchronization action synchronizes-with at least one other synchronization action (they usually come in pairs, like a write and a read of a volatile variable, or the lock and unlock of a mutex).

The synchronizes-with relationship is the inter-thread version of happens-before (the former implies the latter); it is exposed as a different concept because 1) it is a slightly different thing and 2) happens-before within a thread is enforced naturally by the hardware, while synchronizes-with may require software intervention.

happens-before is derived from the program order, synchronize-with from the synchronization order (denoted by <<).
The synchronization order is defined in terms of two properties: 1) it is a total order 2) it is consistent with each thread's program order.

Let's add some synchronization action to our threads:

Thread 1           Thread 2

  A0                 A'0
  S1                 A'1
  A1                 S'1
  A2                 S'2
  S2                 A'3

The program orders are trivial.
What is the synchronization order?

We are looking for something that, per property 1, includes all of S1, S2, S'1 and S'2 and, per property 2, has S1 < S2 and S'1 < S'2.

Possible outcomes:

S1 < S2 < S'1 < S'2
S1 < S'1 < S'2 < S2
S'1 < S1 < S'2 < S2

All of these are synchronization orders; there is not one synchronization order but many, so the question above is wrong: it should be "What are the synchronization orders?".

If S1 and S'1 are such that S1 << S'1, then we are restricting the possible outcomes to the ones where S1 < S'1, so the outcome S'1 < S1 < S'2 < S2 above is now forbidden.

If S2 << S'1 then the only possible outcome is S1 < S2 < S'1 < S'2, when there is only a single outcome I believe we have sequential consistency (the converse is not true).

Note that if A << B, this doesn't mean that there is a mechanism in the code to force an execution order where A < B.
Synchronization actions are affected by the synchronization order; they do not impose any materialization of it.
Some synchronization actions (e.g. locks) impose a particular execution order (and thereby a synchronization order), but some don't (e.g. reads/writes of volatiles).
It is the execution order that creates the synchronization order; this is completely orthogonal to the synchronizes-with relationship.


Long story short, the "subsequent" adjective refers to any synchronization order, that is, any valid order (consistent with each thread's program order) that encompasses all the synchronization actions.


The JLS then continues defining when a data race happens (when two conflicting accesses are not ordered by happens-before) and what it means to be happens-before consistent.
Those are out of scope.

Ommiad answered 15/6, 2018 at 20:36 Comment(8)
Release/acquire is not specific to x86, and neither is my answer. It's probably one of the least x86-centric answers I've written in a long time. :P But it is answering by analogy to C++ so your answer is definitely useful.Metrology
@PeterCordes Oh, sorry, bad wording :) I'm fixing it.Ommiad
I find this answer really difficult to follow and also misleading. You can talk about A < B < C but the compiler can reorder those statements at will for optimization as long as it doesn't violate the language definition. That's the whole point of this. Program order may be A B C D but execution order could easily be D C B A or any other combination unless (for example) C depends on A and B, where the reordering would violate the language definition.Pyretic
@Gray, Here's the key to understand: This is about the language definition. This is not about what the compiler will do, A < B is a construct in the JLS, something a compiler must be compliant to. Reordering is irrelevant and it's not the answer to the OP to me.Ommiad
The OP is talking about sharing data between threads. That is more about execution order than program order. Your statement "every such sequence must have that the single actions of each thread are ordered according to the thread's program order" is incorrect because once you consider multiple threads, the reordering is critical. It is completely legal for A1 to be reordered so it comes before A0.Pyretic
@Pyretic I know that reordering is legal but is the ex order that is subject to the prog order of each thread (see JLS 17.4.7, which contradicts your last comment). Whatever the reorder is, it must be equivalent to the program order, so the former is just an implementation detail and we can reason only in terms of program order (and sync order for MT). All these concepts are necessary to set the bounds of a valid execution. Finally, memory reordering is just one thing, visibility being the other.Ommiad
@Pyretic Anyway, I think it's pointless to argue :) I surely don't know all the nuances of the JLS but I still think my interpretation is largely correct, maybe too technical for the OP.Ommiad
I don't think it is pointless :-). See 17.4-1 on that page for an example of reordering. 17.4.7 is saying that reordering cannot change the actions of the code but that doesn't mean there is guaranteed order at execution time. The compiler is able to do all sorts of tricks to get more speed out of the code as long as the overall effect of the code is the same. For example, a constructor can finish and return an allocated object before the field initialization is done which is why unsafe publishing of objects is such a problem. The examples are wild.Pyretic

I'm confused what does subsequent means in context of multithreading. Does this sentence implies some global clock for all processors and cores...?

Subsequent means (according to the dictionary) coming after in time. There certainly is a global clock across all CPUs in a computer (think X GHz), and the document is trying to say that if thread-1 did something at clock tick 1 and then thread-2 does something on another CPU at clock tick 2, its actions are considered subsequent.

A write to a volatile field happens-before every subsequent read of that same field.

The key phrase that could be added to this sentence to make it more clear is "in another thread". It might make more sense to understand it as:

A write to a volatile field happens-before every subsequent read of that same field in another thread.

What this is saying that if a read of a volatile field happens in Thread-2 after (in time) the write in Thread-1, then Thread-2 will be guaranteed to see the updated value. Further up in the documentation you point to is the section (emphasis mine):

... The results of a write by one thread are guaranteed to be visible to a read by another thread only if the write operation happens-before the read operation. The synchronized and volatile constructs, as well as the Thread.start() and Thread.join() methods, can form happens-before relationships. In particular.

Notice the highlighted phrase. The Java compiler is free to reorder instructions in any one thread's execution for optimization purposes as long as the reordering doesn't violate the definition of the language – this is called execution order and is critically different than program order.

Let's look at the following example with variables a and b that are non-volatile ints initialized to 0 with no synchronized clauses. What is shown is program order and the time in which the threads are encountering the lines of code.

Time     Thread-1        Thread-2
1        a = 1;          
2        b = 2;          
3                        x = a;
4                        y = b;
5        c = a + b;      z = x + y;

If Thread-1 adds a + b at Time 5, it is guaranteed to be 3. However, if Thread-2 adds x + y at Time 5, it might get 0, 1, 2, or 3 depending on race conditions. Why? Because the compiler might have reordered the instructions in Thread-1 to set a after b for efficiency reasons. Also, Thread-1 may not have appropriately published the values of a and b, so Thread-2 might get out-of-date values. Even if Thread-1 gets context-switched out or crosses a write memory barrier and a and b are published, Thread-2 needs to cross a read barrier to update any cached values of a and b.

If a and b were marked as volatile then the write to a must happen-before (in terms of visibility guarantees) the subsequent read of a on line 3 and the write to b must happen-before the subsequent read of b on line 4. Both threads would get 3.
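Here is a runnable sketch of that volatile variant (the class name and the spin on b are my additions; the original table has no waiting, so the spin makes the observation of b deterministic):

```java
// Sketch of the table's scenario with a and b volatile.
// Thread-2 spins until it observes the write to b; the write to a,
// which happens-before the write to b, is then guaranteed visible.
public class VolatileOrdering {
    static volatile int a = 0;
    static volatile int b = 0;
    static int result;       // set by thread-2, read by main after join()

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> {
            a = 1;               // volatile write
            b = 2;               // volatile write, cannot be reordered before a = 1
        });
        Thread t2 = new Thread(() -> {
            while (b != 2) { }   // wait until the write to b is observed
            int x = a;           // guaranteed to see 1
            int y = b;           // guaranteed to see 2
            result = x + y;      // always 3
        });
        t2.start();
        t1.start();
        t1.join();
        t2.join();
        System.out.println(result);  // prints 3
    }
}
```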

We use the volatile and synchronized keywords in Java to ensure happens-before guarantees. A write memory barrier is crossed when assigning a volatile or exiting a synchronized block, and a read barrier is crossed when reading a volatile or entering a synchronized block. The Java compiler cannot reorder write instructions past these memory barriers, so the order of updates is assured. These keywords control instruction reordering and ensure proper memory synchronization.

NOTE: volatile is unnecessary in a single-threaded application because program order assures the reads and writes will be consistent. A single-threaded application might see any value of (non-volatile) a and b at times 3 and 4 but it always sees 3 at Time 5 because of language guarantees. So although use of volatile changes the reordering behavior in a single-threaded application, it is only required when you share data between threads.

Pyretic answered 15/6, 2018 at 20:59 Comment(5)
"in another thread" is the interesting part, but it's also true within a single thread. Agreed that including that phrase would make the meaning clearer.Metrology
Program order assures the single thread that a + b always == 3. To not do so would violate the language rules. But I'll flesh that out a bit @PeterCordes. Thanks.Pyretic
Yes, of course it's a trivial / obvious guarantee within a single thread, and should go without saying. I just meant that calling it "missing" implies the sentence isn't accurate without it. Just a phrasing issue.Metrology
Ok, but what do "subsequent" mean? Note that for the x86 architecture subsequent according to a total time is not sufficient for the volatile semantics. This doesn't answer the question.Ommiad
Subsequent means happening after in an execution order standpoint. If volatile int a has been assigned and then later a is read by another thread, it is guaranteed to see the appropriate value. I've added more details to my answer.Pyretic

This is more a definition of what will not happen rather than what will happen.

Essentially it is saying that once a write to an atomic variable has happened there cannot be any other thread that, on reading the variable, will read a stale value.

Consider the following situation.

  • Thread A is continuously incrementing an atomic value a.

  • Thread B occasionally reads A.a and exposes that value as a non-atomic b variable.

  • Thread C occasionally reads both A.a and B.b.

Given that a is atomic it is possible to reason that from the point of view of C, b may occasionally be less than a but will never be greater than a.

If a were not atomic, no such guarantee could be given. Under certain caching situations it would be quite possible for C to see b progress beyond a at any time.

This is a simplistic demonstration of how the Java memory model allows you to reason about what can and cannot happen in a multi-threaded environment. In real life the potential race conditions between reading and writing to data structures can be much more complex but the reasoning process is the same.

Symbol answered 15/6, 2018 at 11:13 Comment(4)
Most hardware can't do that reordering if you're describing asm store and load instructions, rather than Java assignment operations. PowerPC can in practice: a thread may see a store from another thread before it becomes globally visible to all threads. (I wrote a hardware answer about it on a C++ question: Will two atomic writes to different locations in different threads always be seen in the same order by other threads?)Metrology
You didn't say in what order thread C reads a and b but regardless per the Java memory model C could certainly see a larger value for b than for a since there is no happens-before relationship between the write of b and its read: you have a data race.Kiernan
@Kiernan - I've adjusted the wording to clarify (I hope).Symbol
Well it's still not very clear (you don't mention in what order C reads a and b which could be very important) and you use atomic which isn't really a keyword in Java (maybe you are thinking of C++ std::atomic?), but I'll assume you are talking about volatile when you say atomic. Still, the overall claim is wrong as far as I can tell. Even with volatile a you can't really reason about any relationship between a and b since b is written after a on thread B so there is no happens-before chain involving b. b could have any value ever written, including > a.Kiernan

© 2022 - 2024 — McMap. All rights reserved.