Why is the standard C# event invocation pattern thread-safe without a memory barrier or cache invalidation? What about similar code?

In C#, this is the standard code for invoking an event in a thread-safe way:

var handler = SomethingHappened;
if(handler != null)
    handler(this, e);

Where, potentially on another thread, the compiler-generated add method uses Delegate.Combine to create a new multicast delegate instance, which it then sets on the compiler-generated field (using interlocked compare-exchange).
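For reference, this is roughly the shape of the compiler-generated add accessor being described - a sketch of the lock-free compare-exchange loop, not the literal emitted code (the backing field name is illustrative):

using System;
using System.Threading;

public class Thing
{
    private EventHandler _somethingHappened; // compiler-generated backing field

    public event EventHandler SomethingHappened
    {
        add
        {
            EventHandler current = _somethingHappened;
            EventHandler before;
            do
            {
                before = current;
                var combined = (EventHandler)Delegate.Combine(before, value);
                // Publish only if no other thread has changed the field meanwhile.
                current = Interlocked.CompareExchange(ref _somethingHappened, combined, before);
            } while (!ReferenceEquals(current, before));
        }
        remove
        {
            // Same loop shape, with Delegate.Remove instead of Delegate.Combine.
            EventHandler current = _somethingHappened;
            EventHandler before;
            do
            {
                before = current;
                var removed = (EventHandler)Delegate.Remove(before, value);
                current = Interlocked.CompareExchange(ref _somethingHappened, removed, before);
            } while (!ReferenceEquals(current, before));
        }
    }
}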

(Note: for the purposes of this question, we don't care about code that runs in the event subscribers. Assume that it's thread-safe and robust in the face of removal.)


In my own code, I want to do something similar, along these lines:

var localFoo = this.memberFoo;
if(localFoo != null)
    localFoo.Bar(localFoo.baz);

Where this.memberFoo could be set by another thread. (It's just one thread, so I don't think it needs to be interlocked - but maybe there's a side-effect here?)

(And, obviously, assume that Foo is "immutable enough" that we're not actively modifying it while it is in use on this thread.)
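(If it turns out an explicit barrier is needed, I assume the defensive version would look something like this - Volatile.Read, from System.Threading, is the .NET 4.5 API; a volatile field would be the older equivalent:)

var localFoo = Volatile.Read(ref this.memberFoo); // explicit acquire on the read
if (localFoo != null)
    localFoo.Bar(localFoo.baz);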


Now I understand the obvious reason that this is thread-safe: reads from reference fields are atomic. Copying to a local ensures we don't get two different values. (Apparently only guaranteed from .NET 2.0, but I assume it's safe in any sane .NET implementation?)
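(For contrast, here's the version without the local copy, which has the obvious race:)

// Unsafe: another thread may null out memberFoo between the null check and
// the call, and the two reads of memberFoo may even see different values.
if (this.memberFoo != null)
    this.memberFoo.Bar(this.memberFoo.baz);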


But what I don't understand is: What about the memory occupied by the object instance that is being referenced? Particularly in regards to cache coherency? If a "writer" thread does this on one CPU:

thing.memberFoo = new Foo(1234);

What guarantees that the memory where the new Foo is allocated doesn't happen to be in the cache of the CPU the "reader" is running on, with uninitialized values? What ensures that localFoo.baz (above) doesn't read garbage? (And how well guaranteed is this across platforms? On Mono? On ARM?)

And what if the newly created foo happens to come from a pool?

thing.memberFoo = FooPool.Get().Reset(1234);

This seems no different, from a memory perspective, to a fresh allocation - but maybe the .NET allocator does some magic to make the first case work?


My thinking, in asking this, is that a memory barrier would be required - not so much to prevent memory accesses from being reordered (the read is dependent, after all), but as a signal to the CPU to flush any pending cache invalidations.

My source for this is Wikipedia, so make of that what you will.

(I might speculate that maybe the interlocked-compare-exchange on the writer thread invalidates the cache on the reader? Or maybe all reads cause invalidation? Or pointer dereferences cause invalidation? I'm particularly concerned how platform-specific these things sound.)


Update: Just to make it more explicit that the question is about CPU cache invalidation and what guarantees .NET provides (and how those guarantees might depend on CPU architecture):

  • Say we have a reference stored in field Q (a memory location).
  • On CPU A (writer) we initialize an object at memory location R, and write a reference to R into Q
  • On CPU B (reader), we dereference field Q, and get back memory location R
  • Then, on CPU B, we read a value from R

Assume the GC does not run at any point. Nothing else interesting happens.

Question: What prevents R from already sitting in B's cache, holding stale data from before A initialised it, such that when B reads from R it gets stale values, in spite of getting a fresh version of Q telling it where R is in the first place?

(Alternate wording: what makes the modification to R visible to CPU B at or before the point that the change to Q is visible to CPU B.)

(And does this only apply to memory allocated with new, or to any memory?)
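To put the same scenario in code (the names are purely illustrative):

// Writer, running on CPU A:
var r = new Foo(1234); // initialise an object at memory location R
thing.Q = r;           // write the reference to R into field Q

// Reader, running on CPU B:
var local = thing.Q;       // dereference Q, getting back location R
if (local != null)
{
    var value = local.baz; // read a value from R - the read in question
}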


Note: I've posted a self-answer here.

Ahoufe answered 10/6, 2015 at 14:48 Comment(4)
I don't understand what you expect about atomic being thread-safe. My understanding is that atomic only means that torn reads/writes won't occur, which has nothing to do with cache-coherence. – Polestar
@ChrisO Exactly. The read of memberFoo (or Q) is atomic. But the read of localFoo.baz (or R) is a separate step (as is the write, in the initialization). So what guarantees the ordering of those two things as visible between different CPUs? – Ahoufe
Doesn't the x86 / x86-64 memory model prohibit seeing writes out of order? – Aquiline
Related: The C# Memory Model in Theory and Practice. Short answer is x86 and x86-64 are stronger than the CLR memory model, ARM and Itanium are not, but the JIT does what it can to help. – Aquiline

I think I have figured out what the answer is. But I'm not a hardware guy, so I'm open to being corrected by someone more familiar with how CPUs work.


The .NET 2.0 memory model guarantees:

Writes cannot move past other writes from the same thread.

This means that the writing CPU (A in the example) will never write a reference to an object into memory (to Q) until after it has written out the contents of the object being constructed (to R). So far, so good. This cannot be re-ordered:

R = <data>
Q = &R

Let's consider the reading CPU (B). What is to stop it reading from R before it reads from Q?

On a sufficiently naïve CPU, one would expect it to be impossible to read from R without first reading from Q. We must first read Q to get the address of R. (Note: it is safe to assume that the C# compiler and JIT behave this way.)

But, if the reading CPU has a cache, couldn't it have stale memory for R in its cache, but receive the updated Q?

The answer seems to be no. For sane cache coherency protocols, invalidation is implemented as a queue (hence "invalidation queue"). So R will always be invalidated before Q is invalidated.

Apparently the only hardware where this is not the case is the DEC Alpha (according to Table 1, here). It is the only listed architecture where dependent reads can be re-ordered. (Further reading.)

Ahoufe answered 21/8, 2015 at 3:26 Comment(1)
The ".NET memory model" is Microsofts model. ECMA does not make this safe. But then, the only thing that matters is Microsofts memory model. Other implementations (Mono) must follow. Microsoft themselves cannot break this for compatibility reasons.Longways

This is a really good question. Let us consider your first example.

var handler = SomethingHappened;
if(handler != null)
    handler(this, e);

Why is this safe? To answer that question you first have to define what you mean by "safe". Is it safe from a NullReferenceException? Yes, it is pretty trivial to see that caching the delegate reference locally eliminates that pesky race between the null check and the invocation. Is it safe to have more than one thread touching the delegate? Yes, delegates are immutable so there is no way that one thread can cause the delegate to get into a half-baked state. The first two are obvious.

But, what about a scenario where thread A is doing this invocation in a loop and thread B at some later point in time assigns the first event handler? Is that safe in the sense that thread A will eventually see a non-null value for the delegate? The somewhat surprising answer to this is probably.

The reason is that the default implementations of the add and remove accessors for the event create memory barriers. I believe the early version of the CLR took an explicit lock and later versions used Interlocked.CompareExchange. If you implemented your own accessors and omitted a memory barrier then the answer could be no. I think in reality it highly depends on whether Microsoft added memory barriers to the construction of the multicast delegate itself.
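To illustrate that last point, here is a sketch of a hand-rolled accessor pair with no synchronisation at all (hypothetical code - this is not what field-like events generate):

using System;

public class Thing
{
    private EventHandler _somethingHappened;

    // A plain field write has no memory barrier, so a thread polling the
    // event in a loop gets no guarantee about when (or, on a weak memory
    // model, in what order) it observes the subscription. The read-modify-
    // write is also a lost-update race if two threads subscribe at once.
    public event EventHandler SomethingHappened
    {
        add { _somethingHappened = (EventHandler)Delegate.Combine(_somethingHappened, value); }
        remove { _somethingHappened = (EventHandler)Delegate.Remove(_somethingHappened, value); }
    }
}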

On to the second and more interesting example.

var localFoo = this.memberFoo;
if(localFoo != null)
    localFoo.Bar(localFoo.baz);

Nope. Sorry, this actually is not safe. Let us assume memberFoo is of type Foo which is defined like the following.

public class Foo
{
  public int baz = 0;
  public int daz = 0;

  public Foo()
  {
    baz = 5;
    daz = 10;
  }

  public void Bar(int x)
  {
    var y = x / daz; // throws DivideByZeroException if daz is still 0
  }
}

And then let us assume another thread does the following.

this.memberFoo = new Foo();

Despite what some may think, there is nothing that mandates that instructions have to be executed in the order they were defined in the code, as long as the intent of the programmer is logically preserved. The C# or JIT compilers could actually formulate the following sequence of instructions.

/* 1 */ set register = alloc-memory-and-return-reference(typeof(Foo));
/* 2 */ set register.baz = 0;
/* 3 */ set register.daz = 0;
/* 4 */ set this.memberFoo = register;
/* 5 */ set register.baz = 5;  // Foo.ctor
/* 6 */ set register.daz = 10; // Foo.ctor

Notice how the assignment to memberFoo occurs before the constructor is run. That is valid because it does not have any unintended side-effects from the perspective of the thread executing it. It could, however, have a major impact on other threads. What happens if your null check of memberFoo on the reading thread occurs when the writing thread has just finished instruction #4? The reader will see a non-null value and then attempt to invoke Bar before the daz variable gets set to 10. daz will still hold its default value of 0, thus leading to a divide-by-zero error. Of course, this is mostly theoretical because Microsoft's implementation of the CLR creates a release-fence on writes that would prevent this. But, the specification would technically allow for it. See this question for related content.
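If you did not want to rely on the implementation's release-fence behaviour, one defensive option (a sketch, not the only approach) is to make the release and acquire explicit with System.Threading.Volatile:

// Writer: Volatile.Write has release semantics, so the constructor's writes
// to baz and daz cannot be reordered past the publication of the reference.
var foo = new Foo();
Volatile.Write(ref this.memberFoo, foo);

// Reader: Volatile.Read has acquire semantics, pairing with the write above.
var localFoo = Volatile.Read(ref this.memberFoo);
if (localFoo != null)
    localFoo.Bar(localFoo.baz);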

Orthopter answered 16/8, 2015 at 3:17 Comment(3)
I am not sure your re-ordering example is correct. According to this, the .NET 2.0 memory model guarantees that "Writes cannot move past other writes from the same thread." – Ahoufe
@AndrewRussell: That is correct. Microsoft's implementation of the CLI has release-fence semantics on writes. However, the ECMA specification does not mandate that (the linked article implies this). It is possible, though highly unlikely, that another implementation (like Mono) could allow it. – Orthopter
This is the important point: ECMA does not mandate that this is safe. Actual implementations make this safe since a lot of code would be broken otherwise. – Longways

Capturing a reference to an immutable object guarantees thread safety (in the sense of consistency; it does not guarantee that you get the latest value).

The list of event handlers is immutable, so it is enough for thread safety to capture a reference to the current value. The whole object remains consistent, as it never changes after its initial creation.

Your sample code does not explicitly state whether Foo is immutable, so you get all sorts of problems figuring out whether the object can change or not, e.g. directly by setting properties. Note that the code would be "unsafe" even in the single-threaded case, as you can't guarantee that a particular instance of Foo does not change.

On CPU caches and the like: the only change that can invalidate the data at the actual memory location of a truly immutable object is the GC's compaction. The GC ensures all the necessary locks/cache consistency, so managed code would never observe a change in the bytes referenced by your cached pointer to an immutable object.

Quickman answered 10/6, 2015 at 15:29 Comment(4)
The immutability thing isn't really relevant (I've amended the question to clarify that). I'm not sure the thing about CPU caches answers the question. I understand how GC compaction works. I'll make another edit that might hopefully make things more clear. – Ahoufe
@AndrewRussell I don't really follow your edit - basically it sounds like "in the case of events, the memory occupied by the handler list will never be changed in a way visible to managed code - but what if we violate that and randomly change memory? Bad things could happen". – Quickman
@AndrewRussell also note that Hans Passant's answer covers the "consistent vs. latest" part of the issue significantly better than my single sentence. – Quickman
So, start with R and Q being filled with garbage. CPU A does this: "initialize R, then initialize Q". We hope that CPU B sees "R is initialized, Q is initialized". But maybe it could see "Q is initialized... R is initialized". We don't want the latter ordering to happen - but what prevents it from happening? It's clearly not magic - the compiler, the JIT, and especially the CPU have to do something to enforce that, right? – Ahoufe

When this is evaluated:

thing.memberFoo = new Foo(1234);

First new Foo(1234) is evaluated, which means that the Foo constructor executes to completion. Then thing.memberFoo is assigned the value. This means that any other thread reading from thing.memberFoo is not going to read an incomplete object. It's either going to read the old value, or it's going to read the reference to the new Foo object after its constructor has completed. Whether this new object is in the cache or not is irrelevant; the reference being read won't point to the new object until after the constructor has completed.

The same thing happens with the object pool. Everything on the right evaluates completely before the assignment happens.

In your example, B will never get the reference to R before R's constructor has run, because A does not write R to Q until A has completed constructing R. If B reads Q before that, it will get whatever value was already in Q. If R's constructor throws an exception, then Q will never be written to.

C# order of operations guarantees this will happen this way. Assignment operators have the lowest precedence, and new and function call operators have the highest precedence. This guarantees that the new will evaluate before the assignment is evaluated. This is required for things like exceptions -- if an exception is thrown by the constructor then the object being allocated will be in an invalid state and you don't want that assignment to occur regardless of whether you're multithreaded or not.

Traveled answered 10/6, 2015 at 16:26 Comment(2)
Ok, so "B will never get the reference to R before R's constructor has run" --- but what exactly is it that ensures that the result of "R's constructor has run" is visible to CPU B, before or at the point where the modification of Q is visible to CPU B? There's no explicit memory barrier to ensure that ordering. So it must be something else.Ahoufe
Regarding your edit that links to the operator precedence table -- that is only relevant to code within a single thread of execution. Another thread observing that code executing can see things happening in a significantly different order.Ahoufe

It seems to me you should be using volatile in this case (see this article). This ensures the compiler doesn't perform optimisations that assume access by a single thread.

Events used to use locks, but as of C# 4 use lock-free synchronisation - I'm not sure exactly what (see this article).

EDIT: The Interlocked methods use memory barriers which will ensure all threads read the updated value (on any sane system). So long as you perform all updates with Interlocked you can safely read the value from any thread without a memory barrier. This is the pattern used in the System.Collections.Concurrent classes.
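(A sketch of that pattern applied to the question's field - Interlocked.Exchange, from System.Threading, implies a full fence on the write side:)

// Writer: publish the new object through Interlocked (full memory barrier).
Interlocked.Exchange(ref this.memberFoo, new Foo(1234));

// Reader: a plain local copy then suffices, per the claim above.
var localFoo = this.memberFoo;
if (localFoo != null)
    localFoo.Bar(localFoo.baz);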

Horme answered 20/7, 2015 at 23:13 Comment(1)
volatile also inserts memory barriers (to ensure changes are visible between threads). But my question is why the given code is safe without a memory barrier. // The lock-free synchronization used by events that you refer to is only on the += and -= operators (not on access, which is what my question is about). It uses Interlocked.CompareExchange. – Ahoufe
