Why do I need a memory barrier?

C# 4 in a Nutshell (highly recommended btw) uses the following code to demonstrate the concept of MemoryBarrier (assuming A and B were run on different threads):

class Foo{
  int _answer;
  bool _complete;
  void A(){
    _answer = 123;
    Thread.MemoryBarrier(); // Barrier 1
    _complete = true;
    Thread.MemoryBarrier(); // Barrier 2
  }
  void B(){
    Thread.MemoryBarrier(); // Barrier 3
    if(_complete){
      Thread.MemoryBarrier(); // Barrier 4
      Console.WriteLine(_answer);
    }
  }
}

They mention that Barriers 1 & 4 prevent this example from writing 0, and that Barriers 2 & 3 provide a freshness guarantee: they ensure that if B ran after A, reading _complete would evaluate to true.

I'm not really getting it. I think I understand why Barriers 1 & 4 are necessary: we don't want the write to _answer to be optimized and placed after the write to _complete (Barrier 1) and we need to make sure that _answer is not cached (Barrier 4). I also think I understand why Barrier 3 is necessary: if A ran until just after writing _complete = true, B would still need to refresh _complete to read the right value.

I don't understand, though, why we need Barrier 2! Part of me says that it's because perhaps Thread 2 (running B) already ran until (but not including) if(_complete), and so we need to ensure that _complete is refreshed.

However, I don't see how this helps. Isn't it still possible that _complete will be set to true in A and yet method B will see a cached (false) version of _complete? I.e., if Thread 2 ran method B until after the first MemoryBarrier, and then Thread 1 ran method A until _complete = true but no further, and then Thread 2 resumed and tested if(_complete) -- could that if not result in false?

Isacco answered 16/8, 2010 at 14:10 Comment(11)
Why would anyone use this over volatile?Planish
@Chaos: CLR via C# book (Richter) has a great explanation - IIRC it's that 'volatile' means all accesses to the var are treated as volatile and enforce full memory barriers in both directions. That's often way more perf hit than necessary if you instead only need a read or a write barrier and only in particular accesses.Highfalutin
@Chaos: not really the point, but one reason is that volatile has its own quirks in regards to compiler optimizations that might lead to deadlock, see bluebytesoftware.com/blog/2009/02/24/…Isacco
@statichippo: seriously, if you're dealing with this kind of code (more than just learning about it), please get Richter's book, I can't recommend it enough. amazon.com/CLR-via-Dev-Pro-Jeffrey-Richter/dp/0735627045Highfalutin
I'm not dealing with it on a regular basis, but I'll definitely get the book!Isacco
@James - Makes sense, mainly I have been experimenting with immutability with some light locking when the need arises. Can you think of a quick example where this would be useful?Planish
@James: the volatile keyword enforces "half" barriers (load-acquire + store-release) - not full barriers. If you're quoting Richter, then he's wrong on this point. There's a good explanation in Joe Duffy's "Concurrent Programming in Windows".Ovipositor
@albahari - I'm sure it's me misremembering that chunk of Richter's book - admittedly it's something I haven't had to deal with very much in my own coding.Highfalutin
I'm beginning to wonder if anyone ever wrote a piece of code that required MemoryBarriers that didn't have a bug in it.Independence
Regarding volatile, it's best understood if you think of it as enforcing no memory barriers, but instead preventing compiler and CPU optimisations that could cause unexpected behaviour due to the likes of variable hoisting. To think of it otherwise is to play with fire. (A volatile version of the example is sketched just after these comments.)Allophane
This topic is mentioned hardly anywhere; I found it purely by chance. I wonder who plasters their code with barriers like this, and who understands the whole thing well enough to cover every case? Who can read this code effectively? I'd bet most applications are unprotected all over. Wow ... this is frightening.Accord
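
For the volatile discussion above, a minimal sketch of the same pattern using the volatile keyword instead of explicit fences (the class name FooVolatile is illustrative, not from the book). Under the C# memory model a volatile write acts as a release fence and a volatile read as an acquire fence, which is enough to stop the example from printing 0:

using System;

class FooVolatile{
  int _answer;
  volatile bool _complete;   // volatile write = release fence, volatile read = acquire fence

  void A(){
    _answer = 123;
    _complete = true;        // release: the write to _answer cannot move after this
  }
  void B(){
    if(_complete){           // acquire: the read of _answer cannot move before this
      Console.WriteLine(_answer);   // if _complete was seen as true, this prints 123, never 0
    }
  }
}

Whether this also gives the book's "freshness" guarantee is exactly the subtlety being debated in the comments above, so treat it as a sketch of the reordering fix rather than a drop-in replacement for the four barriers.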

Barrier #2 guarantees that the write to _complete gets committed immediately. Otherwise it could remain in a queued state, meaning that the read of _complete in B would not see the change caused by A even though B effectively used a volatile read.

Of course, this example does not quite do justice to the problem because A does nothing more after writing to _complete, which means the write will be committed immediately anyway since the thread terminates early.

The answer to your question of whether the if could still evaluate to false is yes for exactly the reasons you stated. But, notice what the author says regarding this point.

Barriers 1 and 4 prevent this example from writing “0”. Barriers 2 and 3 provide a freshness guarantee: they ensure that if B ran after A, reading _complete would evaluate to true.

The emphasis on "if B ran after A" is mine. It certainly could be the case that the two threads interleave. But the author ignores that scenario, presumably to keep his point about how Thread.MemoryBarrier works simple.

By the way, I had a hard time contriving an example on my machine where barriers #1 and #2 would have altered the behavior of the program. This is because the memory model regarding writes was strong in my environment. Perhaps, if I had a multiprocessor machine, was using Mono, or had some other different setup I could have demonstrated it. Of course, it was easy to demonstrate that removing barriers #3 and #4 had an impact.
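
For completeness, a standalone sketch of why a freshness guarantee matters at all (class and variable names are illustrative; this is the classic spin-wait demonstration, not code from the answer above): in an optimized release build run without a debugger attached, the JIT may cache _complete in a register, so the worker below can spin forever unless the field is volatile or the loop contains a memory barrier.

using System;
using System.Threading;

class FreshnessDemo{
  static bool _complete;               // deliberately not volatile

  static void Main(){
    var worker = new Thread(() => {
      bool toggle = false;
      while(!_complete)                // the JIT may hoist this read out of the loop...
        toggle = !toggle;              // ...so the loop can spin forever in a release build
      Console.WriteLine(toggle);
    });
    worker.Start();

    Thread.Sleep(1000);
    _complete = true;                  // without a fence/volatile the worker may never notice
    worker.Join();                     // may hang; making _complete volatile, or calling
                                       // Thread.MemoryBarrier() inside the loop, forces a
                                       // fresh read and lets the loop terminate
  }
}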

Ciaracibber answered 16/8, 2010 at 16:41 Comment(6)
Thank you, that was helpful. I guess I wasn't as clueless as I thought.Isacco
I don't understand why both barriers 2 and 3 are needed in the case that B runs after A. Both are full fences, so either one of them alone would do, would it not?Zetes
@ohadsc: Memory barriers influence the behavior of a single thread only. Consider that A and B may be running on different CPUs. If you removed barrier 2 then the write might not be committed. If you removed barrier 3 then the read might not be refreshed. The barriers in A have no impact on the execution of B and vice versa.Ciaracibber
Thanks, I understand now. If you get the time, please see my question regarding your answer here: #6574889Zetes
I don't understand memory barrier #4 (is it necessary?). Barrier #3 already makes sure that we "invalidate" the memory cache and have up-to-date values, and _answer is guaranteed to have its value first. What am I missing?Scute
@Erti-ChrisEelmaa: Barrier #4 prevents _answer from being read before _complete, which could result in the program printing 0 if A and B are interleaved.Ciaracibber

The example is unclear for two reasons:

  1. It is too simple to fully show what's happening with the fences.
  2. Albahari is including requirements for non-x86 architectures. See MSDN: "MemoryBarrier is required only on multiprocessor systems with weak memory ordering (for example, a system employing multiple Intel Itanium processors [which Microsoft no longer supports]).".

If you consider the following, it becomes clearer:

  1. A memory barrier (full barriers here - .NET's Thread.MemoryBarrier is always a full fence) prevents read / write instructions from jumping the fence (due to various optimisations). This guarantees that the code after the fence will execute after the code before the fence.
  2. "This serializing operation guarantees that every load and store instruction that precedes in program order the MFENCE instruction is globally visible before any load or store instruction that follows the MFENCE instruction is globally visible." See here.
  3. x86 CPUs have a strong memory model and guarantee writes appear consistent to all threads / cores (therefore barriers #2 & #3 are unneeded on x86). But, we are not guaranteed that reads and writes will remain in coded sequence, hence the need for barriers #1 and #4.
  4. Full memory barriers are relatively expensive and often unnecessary (see the same MSDN article). I personally use Interlocked and volatile (make sure you know how to use them correctly!), which work efficiently and are easy to understand; a quick sketch follows this list.
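
As a rough sketch of that approach (class name illustrative; Volatile.Read/Write ship with .NET 4.5+, Interlocked has been available far longer), the same publish/consume pattern can be written without any explicit Thread.MemoryBarrier calls:

using System;
using System.Threading;

class FooInterlockedVolatile{
  int _answer;
  int _complete;                              // int, because Interlocked has no bool overload

  void A(){
    _answer = 123;
    Interlocked.Exchange(ref _complete, 1);   // atomic write with a full fence: publishes _answer
  }
  void B(){
    if(Volatile.Read(ref _complete) == 1){    // acquire read: _answer cannot be read earlier
      Console.WriteLine(_answer);             // prints 123, never 0
    }
  }
}

Interlocked.Exchange gives an atomic write with a full fence on the publishing side; Volatile.Read gives an acquire read on the consuming side. Volatile.Write(ref _complete, 1) would also do for the publish when atomicity isn't required.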

P.S. This article explains the inner workings of x86 nicely.

Allophane answered 27/2, 2017 at 9:14 Comment(0)
