Memory effects of synchronization in Java

JSR-133 FAQ says:

But there is more to synchronization than mutual exclusion. Synchronization ensures that memory writes by a thread before or during a synchronized block are made visible in a predictable manner to other threads which synchronize on the same monitor. After we exit a synchronized block, we release the monitor, which has the effect of flushing the cache to main memory, so that writes made by this thread can be visible to other threads. Before we can enter a synchronized block, we acquire the monitor, which has the effect of invalidating the local processor cache so that variables will be reloaded from main memory. We will then be able to see all of the writes made visible by the previous release.

I also remember reading that on modern Sun VMs uncontended synchronizations are cheap. I am a little confused by this claim. Consider code like:

class Foo {
    int x = 1;
    int y = 1;
    final Object aLock = new Object();

    void bar() {
        synchronized (aLock) {
            x = x + 1;
        }
    }
}

Updates to x need the synchronization, but does the acquisition of the lock clear the value of y also from the cache? I can't imagine that to be the case, because if it were true, techniques like lock striping might not help. Alternatively, can the JVM reliably analyze the code to ensure that y is not modified in another synchronized block using the same lock, and hence not dump the cached value of y when entering the synchronized block?
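For reference, by lock striping I mean guarding disjoint parts of a data structure with different locks so that unrelated updates don't contend. A minimal hand-rolled sketch (the class name, stripe count and methods are illustrative only, not my actual code):

public class StripedCounters {
    private static final int STRIPES = 16;                 // arbitrary stripe count
    private final Object[] locks = new Object[STRIPES];    // one monitor per stripe
    private final long[] counts = new long[STRIPES];

    public StripedCounters() {
        for (int i = 0; i < STRIPES; i++) {
            locks[i] = new Object();
        }
    }

    public void increment(Object key) {
        int s = Math.floorMod(key.hashCode(), STRIPES);    // pick this key's stripe
        synchronized (locks[s]) {                          // only this stripe's monitor is held
            counts[s]++;
        }
    }

    public long get(Object key) {
        int s = Math.floorMod(key.hashCode(), STRIPES);
        synchronized (locks[s]) {
            return counts[s];
        }
    }
}

The point is that two threads updating counters in different stripes never synchronize on the same monitor; if every monitor acquisition invalidated all cached data, splitting the structure across many locks would not avoid that cost.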

Durston answered 4/12, 2009 at 23:10 Comment(1)
I recently came across the article CPU Cache Flushing Fallacy, which was useful in understanding this better.Durston

The short answer is that the JSR-133 FAQ goes too far in its explanation. This isn't a serious issue, because the FAQ is a non-normative document which isn't part of the language or JVM standards. Rather, it is only a document which explains one possible strategy that is sufficient for implementing the memory model, but isn't in general necessary. On top of that, the comment about "cache flushing" is basically out of place, since essentially zero architectures would implement the Java memory model by doing any type of "cache flushing" (and many architectures don't even have such instructions).

The Java memory model is formally defined in terms of things like visibility, atomicity, happens-before relationships and so on, which explain exactly which writes threads must see, which actions must occur before other actions, and other relationships, using a precisely (mathematically) defined model. Behavior which isn't formally defined could be random, or well-defined in practice on some hardware and JVM implementation - but of course you should never rely on this, as it might change in the future, and you could never really be sure that it was well-defined in the first place unless you wrote the JVM and were well aware of the hardware semantics.

So the text that you quoted is not formally describing what Java guarantees, but rather is describing how some hypothetical architecture which had very weak memory ordering and visibility guarantees could satisfy the Java memory model requirements using cache flushing. Any actual discussion of cache flushing, main memory and so on is clearly not generally applicable to Java as these concepts don't exist in the abstract language and memory model spec.

The guarantees offered by the memory model are much weaker than a full flush - having every atomic, concurrency-related or lock operation flush the entire cache would be prohibitively expensive - and this is almost never done in practice. Rather, special atomic CPU operations are used, sometimes in combination with memory barrier instructions, which help ensure memory visibility and ordering. So the apparent inconsistency between cheap uncontended synchronization and "fully flushing the cache" is resolved by noting that the first is true and the second is not - no full flush is required by the Java memory model (and no flush occurs in practice).
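As a rough, single-threaded sanity check of the "cheap uncontended synchronization" claim, here is a throwaway sketch (not a proper benchmark - a harness like JMH would be the right tool, and the class, field names and loop count are arbitrary):

public class UncontendedSyncCost {
    static final Object lock = new Object();
    static long plain;
    static long locked;

    public static void main(String[] args) {
        final int n = 50_000_000;

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            plain++;                     // plain, unsynchronized increment
        }
        long plainNs = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            synchronized (lock) {        // never contended: this is the only thread
                locked++;
            }
        }
        long lockedNs = System.nanoTime() - t1;

        // Printing the counters keeps the loops from being optimized away entirely.
        System.out.println("plain:        " + (plainNs / n) + " ns/op, count=" + plain);
        System.out.println("synchronized: " + (lockedNs / n) + " ns/op, count=" + locked);
    }
}

On a typical desktop JVM the synchronized loop costs extra, but on the order of nanoseconds per operation - nothing resembling a full cache flush per lock. Warm-up, loop optimizations, lock coarsening and biased locking (on older JVMs) all skew such numbers, so treat them as an order-of-magnitude hint only.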

If the formal memory model is a bit too heavy to digest (you wouldn't be alone), you can also dive deeper into this topic by taking a look at Doug Lea's cookbook, which is in fact linked in the JSR-133 FAQ, but comes at the issue from a concrete hardware perspective, since it is intended for compiler writers. There, they talk about exactly what barriers are needed for particular operations, including synchronization - and the barriers discussed there can pretty easily be mapped to actual hardware. Much of the actual mapping is discussed right in the cookbook.

Runnymede answered 7/12, 2009 at 23:24 Comment(0)

BeeOnRope is right: the text you quote delves more into typical implementation details than into what the Java Memory Model actually guarantees. In practice, you may often see that y is actually purged from CPU caches when you synchronize on x (likewise if x in your example were a volatile variable, in which case explicit synchronization is not necessary to trigger the effect). This is because on most CPUs (note that this is a hardware effect, not something the JMM describes), the cache works on units called cache lines, which are usually longer than a machine word (for example, 64 bytes wide). Since only complete lines can be loaded or invalidated in the cache, there is a good chance that x and y will fall into the same line and that flushing one of them will also flush the other.

It is possible to write a benchmark which shows this effect. Make a class with just two volatile int fields and let two threads perform some operations (e.g. incrementing in a long loop), one on one of the fields and the other on the other field. Time the operation. Then insert 16 int fields in between the two original fields and repeat the test (16 * 4 = 64 bytes). Note that an array is just a reference, so an array of 16 elements won't do the trick. You may see a significant improvement in performance because operations on one field will no longer influence the other. Whether this works for you will depend on the JVM implementation and processor architecture. I have seen this in practice on a Sun JVM and a typical x64 laptop; the difference in performance was a factor of several times.
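A sketch of such a benchmark (field names, loop count and padding layout are my own choices; note that a modern JVM may reorder fields, so hand-inserted padding is not guaranteed to keep the two counters on separate cache lines):

public class FalseSharingDemo {

    // Two volatile counters right next to each other: likely the same cache line.
    static class Adjacent {
        volatile int a;
        volatile int b;
    }

    // The same two counters with 16 ints (16 * 4 = 64 bytes) of padding in between.
    static class Padded {
        volatile int a;
        int p01, p02, p03, p04, p05, p06, p07, p08,
            p09, p10, p11, p12, p13, p14, p15, p16;
        volatile int b;
    }

    static final int ITERATIONS = 50_000_000;

    // Runs the two increment loops on two threads and returns the wall time in ms.
    static long time(Runnable first, Runnable second) throws InterruptedException {
        Thread t1 = new Thread(first);
        Thread t2 = new Thread(second);
        long start = System.nanoTime();
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        Adjacent adjacent = new Adjacent();
        Padded padded = new Padded();

        long adjacentMs = time(
                () -> { for (int i = 0; i < ITERATIONS; i++) adjacent.a++; },
                () -> { for (int i = 0; i < ITERATIONS; i++) adjacent.b++; });

        long paddedMs = time(
                () -> { for (int i = 0; i < ITERATIONS; i++) padded.a++; },
                () -> { for (int i = 0; i < ITERATIONS; i++) padded.b++; });

        System.out.println("adjacent fields: " + adjacentMs + " ms");
        System.out.println("padded fields:   " + paddedMs + " ms");
    }
}

On hardware where the padding does land the counters on different cache lines, the second run is typically several times faster, which is exactly the effect described above.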

Acnode answered 22/2, 2012 at 20:45 Comment(4)
The Disruptor framework (code.google.com/p/disruptor) makes use of this alignment trick in practice.Aconite
If possible can you please elaborate on the benchmark a little? Preferably with code.Accompanist
I saw the benchmark during a live presentation; I can't find any slides for it, unfortunately. But the idea is: due to the way data is aligned in physical memory and the way the CPU cache is organized, the memory visibility effects and their performance costs may, as a side effect, influence not only the field marked as volatile but also neighboring fields. Of course this is an implementation-dependent side effect, so do not rely on it. I would expect Disruptor depends only on "visibility piggybacking", which is fully legal according to the JLS and not hardware-dependent, but I'm not 100% sure.Bantling
Basically what happens is that the cache coherency protocols (MESI and friends) operate at the cache line level, which is 64 bytes on x86 architectures. If two threads are writing to the same cache line concurrently from different CPUs, the cache line thrashes back and forth between the cores. This consumes lots of memory and snoop bandwidth, and causes various stalls as the CPUs wait for the updated cache line which is ping-ponging back and forth. This happens with or without volatile, since it is the writes that cause the issue, independently of whether barriers/atomic ops are used.Runnymede

Updates to x need the synchronization, but does the acquisition of the lock clear the value of y also from the cache? I can't imagine that to be the case, because if it were true, techniques like lock striping might not help.

I'm not sure, but I think the answer may be "yes". Consider this:

class Foo {
    int x = 1;
    int y = 1;
    final Object aLock = new Object();

    void bar() {
        synchronized (aLock) {
            x = x + 1;
        }
        y = y + 1;
    }
}

Now this code is unsafe, depending on what happens in the rest of the program. However, I think that the memory model means that the value of y seen by bar should not be older than the "real" value at the time of acquisition of the lock. That would imply the cache must be invalidated for y as well as x.

Also can the JVM reliably analyze the code to ensure that y is not modified in another synchronized block using the same lock?

If the lock is this, this analysis looks like it would be feasible as a global optimization once all classes have been preloaded. (I'm not saying that it would be easy, or worthwhile ...)

In more general cases, the problem of proving that a given lock is only ever used in connection with a given "owning" instance is probably intractable.

Palaeontography answered 5/12, 2009 at 0:14 Comment(0)

We are Java developers, we only know virtual machines, not real machines!

Let me theorize about what is happening - but I must say I don't know what I'm talking about.

Say thread A is running on CPU A with cache A, and thread B is running on CPU B with cache B.

  1. Thread A reads y; CPU A fetches y from main memory and saves the value in cache A.

  2. Thread B assigns a new value to 'y'. The VM doesn't have to update main memory at this point; as far as thread B is concerned, it can be reading/writing a local image of 'y'; maybe 'y' is nothing but a CPU register.

  3. Thread B exits a sync block and releases the monitor. (When and where it entered the block doesn't matter.) Thread B has updated quite a few variables by this point, including 'y'. All those updates must be written to main memory now.

  4. CPU B writes the new value of y to its place in main memory. (I imagine that) almost INSTANTLY, the information "main-memory y is updated" is wired to cache A, and cache A invalidates its own copy of y. That must happen really FAST on the hardware.

  5. Thread A acquires the monitor and enters a sync block - at this point it doesn't have to do anything regarding cache A. 'y' is already gone from cache A. When thread A reads y again, it's fetched fresh from main memory with the new value assigned by B.

Consider another variable z, which was also cached by A in step (1) but is not updated by thread B in step (2). It can survive in cache A all the way to step (5). Access to 'z' is not slowed down because of synchronization.

If the above statements make sense, then indeed the cost isn't very high.


Addition to step (5): thread A may have its own storage which is even faster than cache A - it can use a register for variable 'y', for example. That will not be invalidated by step (4); therefore in step (5), thread A must discard such cached values upon entering the sync block. That's not a huge penalty, though.

Viceregal answered 5/12, 2009 at 3:56 Comment(3)
Just a note from my understanding... in (3) you say "All those updates must be written to main memory now." I think this is not true from my reading of the issues with double-checked locking. The start of a synchronization block guarantees that you see the latest data but there is no explicit flush at the end of a block. Between that and the fact that statements can be reordered, you cannot expect a consistent view of data unless you are also in a synchronized block. The volatile keyword changes some of these semantics but mostly guarantees ordering and flushing.Lisandra
Updates must be flushed to main memory at the end of a sync block, so the next sync block started by another thread can see those updates. Reordering happens, but it can't violate "happens-before" relations.Viceregal
Not technically correct, Viceregal... the data doesn't need to be flushed at the end. A "flush" could perfectly legally happen right before the next thread takes the lock; it doesn't need to occur right after (or before) the updating thread releases it to meet the memory model.Hexagram

You might want to check the JDK 6.0 documentation: http://java.sun.com/javase/6/docs/api/java/util/concurrent/package-summary.html#MemoryVisibility

Memory Consistency Properties

Chapter 17 of the Java Language Specification defines the happens-before relation on memory operations such as reads and writes of shared variables. The results of a write by one thread are guaranteed to be visible to a read by another thread only if the write operation happens-before the read operation. The synchronized and volatile constructs, as well as the Thread.start() and Thread.join() methods, can form happens-before relationships. In particular:

  • Each action in a thread happens-before every action in that thread that comes later in the program's order.
  • An unlock (synchronized block or method exit) of a monitor happens-before every subsequent lock (synchronized block or method entry) of that same monitor. And because the happens-before relation is transitive, all actions of a thread prior to unlocking happen-before all actions subsequent to any thread locking that monitor.
  • A write to a volatile field happens-before every subsequent read of that same field. Writes and reads of volatile fields have similar memory consistency effects as entering and exiting monitors, but do not entail mutual exclusion locking.
  • A call to start on a thread happens-before any action in the started thread.
  • All actions in a thread happen-before any other thread successfully returns from a join on that thread.

So, as stated in the second point above: all the changes that happen before an unlock of a monitor are visible to all threads (within their own synchronized blocks) that subsequently lock the same monitor. This is in accordance with Java's happens-before semantics. Therefore, all changes made to y would also be made visible when some other thread acquires the monitor on 'aLock'.
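A minimal sketch of that guarantee (the class, field and thread names are illustrative; x and y are deliberately not volatile, so all visibility here comes from the monitor):

public class HappensBeforeDemo {
    static final Object aLock = new Object();
    static int x = 0;
    static int y = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            y = 42;                     // plain write, before the unlock below
            synchronized (aLock) {
                x = 1;
            }                           // unlock: happens-before any later lock of aLock
        });

        Thread reader = new Thread(() -> {
            while (true) {
                synchronized (aLock) {  // lock of the same monitor
                    if (x == 1) {
                        // Because the writer's unlock happens-before this lock,
                        // the earlier write to y is guaranteed to be visible too.
                        System.out.println("y = " + y);   // prints 42
                        return;
                    }
                }
            }
        });

        reader.start();
        writer.start();
        writer.join();
        reader.join();
    }
}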

Alopecia answered 27/2, 2013 at 22:48 Comment(0)

synchronized guarantees that only one thread at a time can enter a block of code, but it doesn't by itself guarantee that modifications made within the synchronized section will be visible to all other threads; only threads that subsequently synchronize on the same monitor are guaranteed to see the changes.

The memory effects of synchronization in Java can be compared with the problem of double-checked locking, in both C++ and Java. Double-checked locking is widely cited and used as an efficient method for implementing lazy initialization in a multi-threaded environment. Unfortunately, it will not work reliably in a platform-independent way when implemented in Java without additional synchronization. When implemented in other languages, such as C++, it depends on the memory model of the processor, the reorderings performed by the compiler and the interaction between the compiler and the synchronization library. Since none of these are specified in a language such as C++, little can be said about the situations in which it will work. Explicit memory barriers can be used to make it work in C++, but these barriers are not available in Java.
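For reference, a sketch of the double-checked locking idiom being alluded to (class and field names are illustrative). Without volatile on the field, the unsynchronized fast-path read may observe a partially constructed object; with volatile, the idiom works under the Java 5 (JSR-133) memory model:

public class Lazy {
    // 'volatile' is what makes this idiom safe under the Java 5 memory model;
    // without it, the first (unsynchronized) read is a data race.
    private static volatile Lazy instance;

    private final int[] payload = new int[1024];   // stand-in for expensive state

    private Lazy() {}

    public static Lazy getInstance() {
        Lazy local = instance;                     // first check, no lock
        if (local == null) {
            synchronized (Lazy.class) {
                local = instance;                  // second check, under the lock
                if (local == null) {
                    local = new Lazy();
                    instance = local;              // publish the fully constructed object
                }
            }
        }
        return local;
    }
}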

Archaeopteryx answered 18/8, 2010 at 20:22 Comment(0)
