Where to place fences/memory barriers to guarantee a fresh read/committed writes?

Like many other people, I've always been confused by volatile reads/writes and fences. So now I'm trying to fully understand what these do.

So, a volatile read is supposed to (1) exhibit acquire-semantics and (2) guarantee that the value read is fresh, i.e., it is not a cached value. Let's focus on (2).

Now, I've read that, if you want to perform a volatile read, you should introduce an acquire fence (or a full fence) after the read, like this:

int local = shared;     // the read
Thread.MemoryBarrier(); // full fence *after* the read

How exactly does this prevent the read operation from using a previously cached value? According to the definition of a fence (no reads/stores are allowed to be moved above/below the fence), I would insert the fence before the read, preventing the read from crossing the fence and being moved backwards in time (aka, being satisfied from a cached value).

How does preventing the read from being moved forwards in time (or subsequent instructions from being moved backwards in time) guarantee a volatile (fresh) read? How does it help?
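
In other words, my intuition says the fence should come first, something like this:

Thread.MemoryBarrier(); // fence *before* the read
int local = shared;     // the read cannot be hoisted above the fence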


Similarly, I believe that a volatile write should introduce a fence after the write operation, preventing the processor from moving the write forward in time (aka, delaying the write). I believe this would make the processor flush the write to main memory.
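
That is, I would have expected something like this:

shared = 42;            // the write
Thread.MemoryBarrier(); // fence *after* the write, so it cannot be delayed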

But to my surprise, the C# implementation introduces the fence before the write!

[MethodImplAttribute(MethodImplOptions.NoInlining)] // disable optimizations
public static void VolatileWrite(ref int address, int value)
{
    MemoryBarrier(); // Call MemoryBarrier to ensure the proper semantic in a portable way.
    address = value;
}
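
For comparison, the matching VolatileRead from the same source puts the fence after the read (quoted from memory, so treat it as a sketch):

[MethodImplAttribute(MethodImplOptions.NoInlining)] // disable optimizations
public static int VolatileRead(ref int address)
{
    int ret = address;
    MemoryBarrier(); // fence after the read, matching the pattern above
    return ret;
}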

Update

According to this example, apparently taken from "C# 4.0 in a Nutshell", Barrier 2, placed after a write, is supposed to force the write to be flushed to main memory immediately, and Barrier 3, placed before a read, is supposed to guarantee a fresh read:

class Foo {
  int _answer;
  bool _complete;
  void A() {
    _answer = 123;
    Thread.MemoryBarrier(); // Barrier 1
    _complete = true;
    Thread.MemoryBarrier(); // Barrier 2
  }
  void B() {
    Thread.MemoryBarrier(); // Barrier 3
    if (_complete) {
      Thread.MemoryBarrier(); // Barrier 4
      Console.WriteLine(_answer);
    }
  }
}

The ideas in this book (and my own personal beliefs) seem to contradict the ideas behind C#'s VolatileRead and VolatileWrite implementations.

Limner answered 8/2, 2014 at 23:10 Comment(17)
Where did you read that?Queenstown
@ThomasLevesque near the end of the first answer to this question : #10590154Limner
The purpose of the fence is not to prevent the read from being cached. It is to prevent later reads from moving backward.Tab
@RaymondChen so are you then saying that the barriers in the VolatileRead/VolatileWrite implementations do not guarantee that the latest value will be read/the write will be immediately seen by other threads?Limner
Right. They are for ordering, not immediacy. Note that if you require a read to have the latest value, then you probably already have a race condition in your code. Imagine the CPU issues the read one cycle earlier than normal. Then it reads the old value not because of a stale cache but because the write hasn't happened yet.Tab
Thanks @RaymondChen. I'm aware of these issues, I'm just trying to make some sense out of what I've been reading, I'm also trying to figure out the exact benefits of a fence and when/where to place them. One more question: if you say that these fences do not guarantee immediacy - are you also saying that these methods' documentation is wrong? VolatileRead: "The value is the latest written by any processor in a computer, regardless of the number of processors or the state of processor cache."Limner
You will have to read the processor manuals to see what immediacy each processor guarantees. Alpha AXP for example has a very weak model.Tab
@RaymondChen So you are saying that some processors guarantee that a read performed right before an acquire-fence will be fresh/immediate? Do you have any source? I can't seem to find that anywhere.Limner
Regarding the Alpha architecture, I found this, which I think is what you meant: "WMB (Write Memory Barrier) causes writes that are contained in buffers to be completed without unnecessary delay.". Again, this means that the barrier would have to be placed after the write (as I suspected), not before.Limner
I'm not saying that there exist any architecture which guarantees that a read which immediately precedes an acquire fence performs a full fetch from main memory. (Indeed, I doubt that any such exist because it would require the ability to communicate decode information backwards in time. Suppose the two instructions straddled a page boundary and the second page was not present.) What I'm saying that you need to read your processor manual to see what guarantees do exist.Tab
You asked me in email to comment on this thread. My comment is: when users have questions like this I refer them to people like Raymond Chen or Joe Duffy. I personally don't need to know the answer because I've fortunately never needed to write code that depended on volatility for its correctness. I try hard to avoid multithreading in the first place.Cyprinodont
Thanks for the fast response @EricLippert ;)Limner
@RaymondChen I've checked a few ISAs, and none of their fences guarantee immediacy for reads performed before the fence, or for writes performed after the fence (as I suspected). After speaking with you and with BrianGideon here and here (see comments) and after checking the C# specification, I'm quite convinced that the MSDN docs are wrong, and that the current implementation does not make such guarantees. Would you agree?Limner
I'm going to defer to Joe Duffy for a more informed opinion. My uninformed opinion is that the MSDN documentation is really trying to talk about acquire and release semantics, not immediate visibility. The bigger question is why you require immediate visibility. That is not achievable in practice due to this thing called "special relativity".Tab
I am not trying to achieve it, at all. I'm only trying to understand exactly what happens behind the scenes. From what I've gathered, many people believe that volatility includes immediate visibility. I'm starting to think it doesn't. But you're right, I may be simply misinterpreting the MSDN docs. Thank you so much for your input @RaymondChen.Limner
Sorry I don't know. Maybe there's a contact page on his blog.Tab
@Limner Have you ever found a satisfying answer to your question? I have the very same question - I also posted it here on SO. I believe the answer is to put the fence (MemoryBarrier) ALSO BEFORE the read (load operation). The read-acquire fence alone (fence after the read) only gives you a guarantee of no reordering (so other memory will be at least as fresh as that variable - but it all might still be just a stale snapshot in time)Deneb

How exactly does this prevent the read operation from using a previously cached value?

It does no such thing. A volatile read does not guarantee that the latest value will be returned. In plain English all it really means is that the next read will return a newer value and nothing more.

How does preventing the read from being moved forwards in time (or subsequent instructions from being moved backwards in time) guarantee a volatile (fresh) read? How does it help?

Be careful with the terminology here. Volatile is not synonymous with fresh. As I already mentioned above, its real usefulness lies in how two or more volatile reads are chained together. The next read in a sequence of volatile reads will absolutely return a newer value than the previous read of the same address. Lock-free code should be written with this premise in mind. That is, the code should be structured to work on the principle of dealing with a newer value, not the latest value. This is why most lock-free code spins in a loop until it can verify that the operation completed successfully.
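
As a rough illustration of that spin-until-verified pattern (the Increment method and _counter field here are invented for the example):

static int _counter; // shared

static void Increment()
{
    int observed, updated;
    do
    {
        observed = _counter;   // a newer value, not necessarily the latest
        updated = observed + 1;
        // Commit only if no other thread changed _counter in the meantime;
        // otherwise spin and retry with an even newer value.
    } while (Interlocked.CompareExchange(ref _counter, updated, observed) != observed);
}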

The ideas in this book (and my own personal beliefs) seem to contradict the ideas behind C#'s VolatileRead and VolatileWrite implementations.

Not really. Remember volatile != fresh. Yes, if you want a "fresh" read then you need to place an acquire-fence before the read. But, that is not the same as doing a volatile read. What I am saying is that if the implementation of VolatileRead had the call to Thread.MemoryBarrier before the read instruction, then it would not actually produce a volatile read. It would produce a fresh read though.
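
Side by side, the claim is (a sketch only):

int a = shared;          // volatile read: acquire semantics...
Thread.MemoryBarrier();  // ...nothing after this can move above the read

Thread.MemoryBarrier();  // earlier loads/stores are ordered first...
int b = shared;          // ...a "fresh" read in the loose sense above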

Acton answered 4/8, 2014 at 1:24 Comment(4)
Good answer but not quite correct. Put simply, there is unfortunately NO way to guarantee a 'fresh' read on a machine running multiple cores. Memory barriers simply guarantee ordering. They cannot guarantee that a thread running on one core will read a value which has simultaneously been written on a different core. If this were feasible it would be possible to implement a synchronised boolean to control multithreaded entry to methods and completely do away with the need for locks.Chronoscope
"Yes, if you want a "fresh" read then you need to place an acquire-fence before the read". Are you suggesting that if another thread wrote the variable and we read it by adding a fence before the read, we would get the latest value? I think that due to "special relativity (explained in the question's comments), it can't be guaranteed.Breadbasket
@Petrakeas: Yes, but be careful how you define "latest". By the time you use or make a decision based on the value it might not be the "latest" anymore. Also read the comment above. 0b101010 makes a good point about the subtleties of multi core systems. In short "fresh" or "latest" have to be interpreted very loosely here.Acton
The way I understand it is that "when" thread B sees the updated complete value it is guaranteed that it will see the updated answer value. However, it may not see it instantly. @Chronoscope can you please comment on the following example that exercises "freshness"? #38051181Breadbasket

The important thing to understand is that volatile does not only mean "cannot cache value", but also gives important visibility guarantees (to be exact, it's entirely possible for a volatile write to go only to cache; it depends solely on the hardware and the cache coherency protocol it uses).

A volatile read gives acquire semantics, while a volatile write has release semantics. An acquire fence means that reads and writes after it cannot be moved before it, while a release fence means that reads and writes before it cannot be moved after it. The linked answer in the comments explains that quite nicely.

Now the question is: if we don't have any memory barrier before the load, how is it guaranteed that we'll see the newest value? The answer is: because we also put memory barriers after each volatile write to guarantee exactly that.
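
To make the pairing concrete, here is a minimal sketch (field names invented) of how the barriers on the writing and the reading thread work together:

// writer thread
_data = 42;
Thread.MemoryBarrier();     // the _data store cannot move below this point
_flag = true;

// reader thread
if (_flag)
{
    Thread.MemoryBarrier(); // the _data load cannot move above this point
    int d = _data;          // if _flag was seen as true, this sees 42
}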

Doug Lea wrote a great summary on which barriers exist, what they do and where to put them for volatile reads/writes for the JMM as a help for compiler writers, but the text is also quite useful for other people. Volatile reads and writes give the same guarantees in both Java and the CLR so that's generally applicable.

Source - scroll down to the "Memory Barriers" section (I'd copy the interesting parts, but the formatting doesn't survive it...)
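
(From memory, the gist of that section: a volatile store wants a StoreStore barrier before it and a StoreLoad barrier after it, while a volatile load wants LoadLoad and LoadStore barriers after it - which lines up with "fence after the read" discussed above.)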

Meaganmeager answered 9/2, 2014 at 0:15 Comment(13)
"it's entirely possible to have a volatile write that only goes to cache" .But the VolatileRead method does guarantee that the latest value will be seen, and VolatileWrite guarantees that the value will be immediately seen by all processors.Limner
And all they do is add a fence after the read, or a fence before the write. I guess what I'm missing is: fences give you acquire-release semantics - but how do these semantics translate to fresh reads/committed writes?Limner
@Limner It guarantees that you see the newest value, it doesn't tell the implementation how to do that. Basically there are cache coherence protocols that don't invalidate cachelines but update them instead. Voila you now can see the most up-to-date value without having to write back to memory. It's a bit nitpicky, but it's a good idea to keep implementation details and what the spec says apart.Meaganmeager
And basically the fence after the write guarantees you that the processor has to make sure to publish it for every read (if you don't need the visibility guarantees, a normal read is perfectly fine to see the newest values as long as you only do volatile writes). I guess your problem is: It may happen that we don't see the write if we read it before the barrier instruction - true (although many ISAs combine the two together), but not a problem as long as the ordering guarantees aren't violated.Meaganmeager
This is becoming even more confusing... That's what I would have thought - a fence after a write would tell the processor to publish whatever is in the write-buffer to main memory, and make it visible to other processors. But the catch is: the C# implementation places the fence before the write!!Limner
I can't comment on the trustworthiness of your source there or if there's more to it somewhere else, but yes that looks indeed incorrect.Meaganmeager
That's the reference source of the .NET framework published by Microsoft. @BrianGideon seems to agree (here), although I didn't understand his explanation. I know it must be right, I just don't understand how. I would have done the opposite - put the fences before the reads (to prevent them from being pre-fetched), and after the writes (to prevent them from being delayed).Limner
(Note: actually, that link is a copy of the reference source, distributed by Microsoft here).Limner
@Limner Strange. The only way this works, it seems to me, is if you had a processor with very strong memory ordering like x86. If the code was only used there it would work just fine, but I don't see how this would be a universally correct solution. Heck, putting full-blown fences in there is inefficient to begin with. But yeah, I guess since MS doesn't support that many different ISAs, that may just be their x86 code?Meaganmeager
I completely agree with you, it would take a strong memory model to get the expected behaviour out of this. Either way, I've updated the title and the content of the post, and added more examples and references. I hope I've made my dilemma clearer.Limner
@Limner Correct me if I'm wrong, but is it not physically impossible to guarantee a 'fresh' value regardless of how many barriers you have and where you put them? Imagine bar+read+bar on thread1 and bar+write+bar on thread2. Thread1 falls asleep after its first barrier, thread2 falls asleep after its write => bar+bar,write+read/read+write,bar+bar. What will you read? (Simplified because in reality the threads could execute simultaneously on different cores!) Volatile, I believe, guarantees that if you observe one of your threads, the values in the observed thread never arrive out of order.Consociate
[part2...] Volatile then basically says that a value must be manipulated in order, as opposed to out of order where a newer value may be overwritten by an older one. The barrier before the write: "Hey guys, I'm about to write to this variable, so if you have any pending writes please finish them before I do mine". (The barrier after the read though I believe isn't so much after the read as it is in between reading a value into the local variable and using the local variable.) (This is all based on my very limited understanding and may potentially be wrong in many ways.)Consociate
@Consociate Absolutely correct. Much of the information in this answer is wrong. You cannot use memory barriers to guarantee freshness. They simply provide ordering guarantees. Put simply, there is NO way to guarantee you are seeing the latest value written when writes are potentially taking place on multiple cores with multiple caches.Chronoscope
