What are memory fences used for in Java?
Whilst trying to understand how SubmissionPublisher (source code in OpenJDK 10, Javadoc), a class added to Java SE in version 9, is implemented, I stumbled across a few API calls to VarHandle I wasn't previously aware of:

fullFence, acquireFence, releaseFence, loadLoadFence and storeStoreFence.

After doing some research, especially regarding the concept of memory barriers/fences (I have heard of them previously, yes; but never used them, thus was quite unfamiliar with their semantics), I think I have a basic understanding of what they are for. Nonetheless, as my questions might arise from a misconception, I want to ensure that I got it right in the first place:

  1. Memory barriers are reordering constraints regarding reading and writing operations.

  2. Memory barriers can be categorized into two main categories: unidirectional and bidirectional memory barriers, depending on whether they set constraints on either reads or writes or both.

  3. C++ supports a variety of memory barriers; these do not match up one-to-one with those provided by VarHandle. However, some of the memory barriers available through VarHandle provide ordering effects that are compatible with their corresponding C++ memory barriers.

  • #fullFence is compatible with atomic_thread_fence(memory_order_seq_cst)
  • #acquireFence is compatible with atomic_thread_fence(memory_order_acquire)
  • #releaseFence is compatible with atomic_thread_fence(memory_order_release)
  • #loadLoadFence and #storeStoreFence have no compatible C++ counterpart

The word compatible seems to be really important here, since the semantics clearly differ when it comes to the details. For instance, all C++ barriers are bidirectional, whereas Java's barriers aren't (necessarily).

  4. Most memory barriers also have synchronization effects. These depend in particular on the barrier type used and on barrier instructions previously executed in other threads. As the full implications of a barrier instruction are hardware-specific, I'll stick with the higher-level (C++) barriers. In C++, for instance, changes made prior to a release barrier instruction are visible to a thread executing an acquire barrier instruction.
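To make this concrete, here is how I picture the classic message-passing idiom with these fences (my own sketch, not code from the JDK; class and field names are invented):

```java
import java.lang.invoke.VarHandle;

class FencePublish {
    static int data;
    static boolean ready;

    static void writer() {
        data = 42;                    // plain write
        VarHandle.releaseFence();     // the data write may not sink below this fence
        ready = true;                 // plain write of the flag
    }

    static Integer reader() {
        if (ready) {                  // plain read of the flag
            VarHandle.acquireFence(); // the data read may not rise above this fence
            return data;
        }
        return null;                  // writer not observed yet
    }

    public static void main(String[] args) {
        writer();
        System.out.println(reader()); // 42 (trivially so in a single thread)
    }
}
```

If the reader thread observes ready == true, the acquire fence pairing with the writer's release fence should make the write to data visible as well.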

Are my assumptions correct? If so, my resulting questions are:

  1. Do the memory barriers available in VarHandle cause any kind of memory synchronization?

  2. Regardless of whether they cause memory synchronization or not, what might reordering constraints be useful for in Java? The Java Memory Model already gives some very strong guarantees regarding ordering when volatile fields, locks or VarHandle operations like #compareAndSet are involved.

In case you're looking for an example: BufferedSubscription, an inner class of SubmissionPublisher (source linked above), establishes a full fence in line 1079, in the function growAndAdd. However, it is unclear to me what it is there for.

Archerfish answered 7/2, 2020 at 18:30 Comment(1)
I've tried to answer, but to put it very simply, they exist because people want a weaker mode than what Java has. In ascending order of strength, these would be: plain -> opaque -> release/acquire -> volatile (sequential consistency).Ambi

This is mainly a non-answer, really (initially I wanted to make it a comment, but as you can see, it's far too long). It's just that I questioned this myself a lot, did a lot of reading and research, and at this point in time I can safely say: this is complicated. I even wrote multiple tests with jcstress to figure out how they really work (while looking at the generated assembly code), and while some of them somehow made sense, the subject in general is by no means easy.

The very first thing you need to understand:

The Java Language Specification (JLS) does not mention barriers anywhere. For Java, they would be an implementation detail: the language really acts in terms of happens-before semantics. To be able to properly specify the fences in terms of the JMM (Java Memory Model), the JMM would have to change quite a lot.

This is work in progress.

Second, if you really want to scratch the surface here, this is the very first thing to watch. The talk is incredible. My favorite part is when Herb Sutter raises his 5 fingers and says, "This is how many people can really and correctly work with these." That should give you a hint of the complexity involved. Nevertheless, there are some trivial examples that are easy to grasp (like a counter updated by multiple threads that does not care about other memory guarantees, but only cares that it is itself incremented correctly).

Another example is when (in java) you want a volatile flag to control threads to stop/start. You know, the classical:

volatile boolean stop = false; // one thread writes, one thread reads this

If you work with java, you would know that without volatile this code is broken (you can read about why double-checked locking is broken without it, for example). But do you also know that for some people who write high-performance code this is too much? A volatile read/write also guarantees sequential consistency; that is a strong guarantee, and some people want a weaker version of it.

A thread safe flag, but not volatile? Yes, exactly: VarHandle::set/getOpaque.

And you might question why someone would need that. Well, not everyone is interested in all the ordering effects that are piggy-backed onto a volatile.
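Here is a sketch of such a thread-safe-but-not-volatile flag with VarHandle's opaque mode (my own illustration; the class and field names are invented):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class OpaqueStopFlag {
    private boolean stop; // accessed only through the STOP handle
    private static final VarHandle STOP;
    static {
        try {
            STOP = MethodHandles.lookup()
                    .findVarHandle(OpaqueStopFlag.class, "stop", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void requestStop() {
        STOP.setOpaque(this, true);               // opaque write
    }

    boolean stopRequested() {
        return (boolean) STOP.getOpaque(this);    // opaque read
    }
}
```

A worker thread can then loop on while (!flag.stopRequested()) { ... } and should eventually observe the stop request, without paying for full volatile (sequentially consistent) semantics.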

Let's see how we can achieve this in java. First of all, such exotic things already existed in the API: AtomicInteger::lazySet. This is unspecified in the Java Memory Model and has no clear definition; still, people used it (LMAX, afaik, or this for more reading). IMHO, AtomicInteger::lazySet is VarHandle::releaseFence (or VarHandle::storeStoreFence).
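For illustration only (the snippet is mine), here are lazySet and its modern Java 9+ spelling side by side:

```java
import java.util.concurrent.atomic.AtomicInteger;

class LazySetDemo {
    static int demo() {
        AtomicInteger counter = new AtomicInteger();
        counter.lazySet(1);    // historical name; today documented as a release write
        counter.setRelease(2); // the equivalent Java 9+ spelling
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 2
    }
}
```

(As a comment below points out, the current documentation of lazySet now defines it precisely in terms of VarHandle.setRelease.)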


Let's try to answer why someone needs these.

The JMM has basically two ways to access a field: plain and volatile (which guarantees sequential consistency). All these methods that you mention are there to bring something in between these two: release/acquire semantics; there are cases, I guess, where people actually need this.
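A sketch of that in-between access mode (my own example, with invented names): a writer publishes data with setRelease and a reader observes it with getAcquire, without either field being volatile:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class Handoff {
    private int data;  // plain field, made visible by the release/acquire pair
    private int ready; // becomes 1 once data has been published
    private static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(Handoff.class, "ready", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void publish(int v) {
        data = v;                  // plain write
        READY.setRelease(this, 1); // release write: the data write cannot sink below it
    }

    Integer tryConsume() {
        if ((int) READY.getAcquire(this) == 1) { // acquire read: data read cannot rise above it
            return data;
        }
        return null; // not published yet
    }
}
```

If the reader sees ready == 1 via getAcquire, it should also see data == v, while both operations stay cheaper than volatile on weakly ordered hardware.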

An even greater relaxation than release/acquire would be opaque, which I am still trying to fully understand.


Thus, bottom line (your understanding is fairly correct, btw): if you plan to use these in java, know that they have no specification at the moment; do so at your own risk. If you do want to understand them, their C++ equivalent modes are the place to start.

Ambi answered 8/2, 2020 at 20:29 Comment(24)
Don’t try to figure out the meaning of lazySet by linking to ancient answers, the current documentation precisely says what it means, nowadays. Further, it’s misleading to say that the JMM has only two access modes. We have volatile read and volatile write, which together can establish a happens-before relationship.Indoctrinate
@Indoctrinate I doubt many people actually understand what that setRelease from the documentation is supposed to mean, but I agree that if you plan to use it, the documentation is enough. I also agree that this is a pair (read/write of volatile), but there is nothing in between. Before these methods, you would either have "plain" or volatile access (sequential consistency), nothing in between. At least nothing in between with guarantees, I guess.Ambi
I was in the middle of writing something more about it. Consider that cas is both, a read and a write, acting like a full barrier, and you may understand, why relaxing it is desired. E.g. when implementing a lock, the first action is cas(0, 1) on the lock count, but you only need acquire semantic (like volatile read), whereas the final write of 0 to unlock ought to have release semantic (like volatile write), so there’s a happens-before between unlocking and subsequent locking. Acquire/Release is even weaker than Volatile Read/Write regarding threads using different locks.Indoctrinate
General comment: Be careful when looking at asm: it's hard to know which barrier effects are guaranteed by some standard and which are an implementation detail of the particular JVM (or C++ compiler for fences in that language). If an optimization still happens, then you can be sure (barring a compiler bug) that the barrier doesn't stop it, but if it does block a reordering / dead-store elimination or something, that doesn't always prove anything about the language standard.Rebecarebecca
@PeterCordes right, I rarely look at assembly that much to notice this; and when I do, I only look at some examples that JVM experts post, rarely going outside. This is far too complicated (and time consuming) for me. I usually read your answers on the matter, btw.Ambi
I don't really know Java, but I'm somewhat curious about language design and how languages other than C/C++ expose atomics. It sounds like Java Opaque is maybe like C++ volatile: optimizer can't "see past it" and has to load and store when the source says so? That would be equivalent to C++ memory_order_relaxed, which is like C++ volatile on real machines (that have coherent shared memory), except that C++ volatile doesn't give atomic RMW operations; v++ is a separate load,storeRebecarebecca
@PeterCordes yes, opaque is just like C++ volatile; the stores and the loads happen exactly like in source code, without any optimizations; I also admit that I am preparing a question of what exactly this means for quite some time now... thank you for your comments.Ambi
Wait just like C++ volatile, with officially undefined behaviour if you write from one thread and read from another? Or does Java guarantee that the underlying machine has coherent shared memory, so the no-optimization requirement on top of that gives inter-thread visibility? And Java also guarantees atomicity even for opaque int64 or whatever? C++ doesn't. Also C++ volatile is locally ordered wrt. other volatile accesses (but not plain vars); visibility order to other threads depends on the hardware machine memory model. Or is it really more like atomic with mo_relaxed?Rebecarebecca
(C++ compilers can reorder mo_relaxed atomic accesses wrt. each other, and to non-atomic accesses, so that's a major difference from C++ volatile. The standard doesn't forbid optimization of atomics (e.g. collapse two back to back relaxed stores), but in practice no compilers do it because reasons you can google...)Rebecarebecca
@PeterCordes AFAIK, it is more like mo_relaxed, and it does indeed guarantee atomicity for long (we don't have int64). I guess that makes me wrong in the previous statement about volatile, for which I am sorry. volatile in java is not allowed to be re-ordered with other volatiles (that would break sequential consistency?). I know little about C++ volatile, but you made me want to read more about it now.Ambi
ISO C++ volatile means don't optimize and that's all. It's still Undefined Behaviour to write in one thread and read in another; it's designed for MMIO. (You might want volatile std::atomic<int> for an MMIO register that multiple threads access). It has zero guarantee of being usable across threads. (But in practice it is, in normal implementations for types of pointer-width and smaller because normal ABIs require them to be naturally aligned, and in asm that gives atomicity. And of course normal implementations run on cache-coherent hardware.)Rebecarebecca
Of course, that said, until C++11, C++ didn't even have a memory model, and rolling your own atomics out of volatile (and inline asm for compile-time / run-time memory barriers and RMW operations) was pretty much the only option. And in real life compilers do support C/C++ volatile in ways that make this usable; the Linux kernel still does it.Rebecarebecca
@Indoctrinate the truth is that I have a hard time knowing when exactly I would need release/acquire versus sequential consistency. Your example makes sense, but does that mean we can replace every cas with release/acquire? That depends on what the underlying caller needs.Ambi
I know Java volatile can't reorder with other volatile accesses, it's like C++ atomic<> with the default seq_cst ordering. My question was whether Java Opaque can reorder with other Opaque accesses like C++ atomic with mo_relaxed, or whether it's ordered wrt. other Opaque accesses like C++ volatile. (I repeat my point about volatile being a terrible choice of name for Java's atomics!)Rebecarebecca
re: acq/rel vs. seq_cst: to implement on a memory model like x86 (seq_cst + a store buffer), acq/rel doesn't need any barriers. SC needs to wait for the store buffer to drain (full barrier) after a store, to block StoreLoad reordering preshing.com/20120515/memory-reordering-caught-in-the-act. Some CASes need seq_cst, some use-cases only need acq_rel. Locking is a use-case that technically only needs acq/rel, so earlier and later ops can reorder into the critical section, but stuff in the critical section can't get out. preshing.com/20120913/acquire-and-release-semanticsRebecarebecca
@Peter Cordes: The first C version having a volatile keyword was C99, five years after Java, but it still lacked useful semantics, even C++03 has no Memory Model. The things which C++ calls "atomic" are also much younger than Java. And the volatile keyword does not even imply atomic updates. So why should it be named such.Indoctrinate
@Holger: Oh, I wasn't aware that volatile was as recent as C99, or that Java had volatile for multi-threading back in 1999. I did comment earlier that C++ didn't even have a (thread-aware) memory model until C++11 introduced that and std::atomic. (Same for C11 and stdatomic / _Atomic). I think C/C++'s naming choices make sense: volatile = don't optimize, usable for MMIO / interaction with the underlying machine when you care how something compiles. atomic<> and atomic_flag: inter-thread behaviour guaranteed by the language standard regardless of implementation details / HW.Rebecarebecca
@Holger: I was saying that Java's volatile seemed a poor choice of name given the meaning it has in Java. (And I thought because C was already using it for something very different.) Hmm, javarevisited.blogspot.com/2011/06/… says that Java 5 added SC semantics to Java volatile, so maybe before that it was more like C? And instead of introducing a new name like C++ did, they just changed behaviour.Rebecarebecca
Hmm, en.wikipedia.org/wiki/Volatile_(computer_programming)#In_Java says that Java always had some ordering around volatile.Rebecarebecca
@Holger: en.cppreference.com/w/c/language/volatile says the new thing in C99 was being able to use void f(double x[volatile]) syntax instead of void f(double *x). Other than that, C volatile has existed since C89, and probably somewhat earlier back into K&R days. I'm pretty sure you're mistaken about volatile being new in C99 and thus post-dating Java. (But yes, Java volatile dates from original Java in 1995, back before multi-threaded programming was such a big deal. Makes sense that they'd just copy a qualifier from C/C++ (which didn't have a mem model at the time))Rebecarebecca
@PeterCordes 1) re opaque: the documentation of setOpaque says Sets the value of a variable to the newValue, in program order...; to me this means no re-orderings between opaques themselves. 2) on x86 things are somewhat well understood about rel/acq and seq_cst (and the fact that rel/acq is basically "free"); it's the other platforms that care about this a lot more. This is another reason why they introduced these methods.Ambi
@PeterCordes perhaps, I'm confusing it with restrict, however, I remember times when I had to write __volatile to use a non-keyword compiler extension. So perhaps, it didn't implement C89 completely? Don't tell me I'm that old. Before Java 5, volatile was much closer to C. But Java had no MMIO, so its purpose always was multi-threading, but the pre-Java 5 semantic wasn't very useful for that. So release/acquire like semantics were added, but still, it's not atomic (atomic updates are an additional feature built atop it).Indoctrinate
@Ambi regarding this, my example was specific for using cas for locking which would be acquire. A countdown latch would bear atomic decrements with release semantic, followed by the thread reaching zero inserting an acquire fence and executing the final action. Of course, there are other cases for atomic updates where the full fence remains required.Indoctrinate
@Indoctrinate understood, in such a case this makes perfect sense. as usual from you. much appreciated.Ambi
