Events and multithreading once again
I'm worried about the correctness of the seemingly-standard pre-C#6 pattern for firing an event:

EventHandler localCopy = SomeEvent;
if (localCopy != null)
    localCopy(this, args);

I've read Eric Lippert's Events and races and know that there is a remaining issue of calling a stale event handler, but my worry is whether the compiler/JITter is allowed to optimize away the local copy, effectively rewriting the code as

if (SomeEvent != null)
    SomeEvent(this, args);

with possible NullReferenceException.

According to the C# Language Specification, §3.10,

The critical execution points at which the order of these side effects must be preserved are references to volatile fields (§10.5.3), lock statements (§8.12), and thread creation and termination.

— so there are no critical execution points in the mentioned pattern, and the optimizer is not constrained by that.

The related answer by Jon Skeet (year 2009) states

The JIT isn't allowed to perform the optimization you're talking about in the first part, because of the condition. I know this was raised as a spectre a while ago, but it's not valid. (I checked it with either Joe Duffy or Vance Morrison a while ago; I can't remember which.)

— but comments refer to this blog post (year 2008): Events and Threads (Part 4), which basically says that CLR 2.0's JITter (and probably subsequent versions?) must not introduce reads or writes, so there must be no problem under Microsoft .NET. But this seems to say nothing about other .NET implementations.

[Side note: I don't see how the non-introduction of reads proves the correctness of the said pattern. Couldn't the JITter just see some stale value of SomeEvent in some other local variable and optimize out one of the reads, but not the other? Perfectly legitimate, right?]

Moreover, this MSDN article (year 2012): The C# Memory Model in Theory and Practice by Igor Ostrovsky states the following:

Non-Reordering Optimizations Some compiler optimizations may introduce or eliminate certain memory operations. For example, the compiler might replace repeated reads of a field with a single read. Similarly, if code reads a field and stores the value in a local variable and then repeatedly reads the variable, the compiler could choose to repeatedly read the field instead.

Because the ECMA C# spec doesn’t rule out the non-reordering optimizations, they’re presumably allowed. In fact, as I’ll discuss in Part 2, the JIT compiler does perform these types of optimizations.

This seems to contradict Jon Skeet's answer.

Now that C# is no longer a Windows-only language, the question arises whether the validity of the pattern is a consequence of limited JITter optimizations in the current CLR implementation, or an expected property of the language.

So, the question is the following: is the pattern being discussed valid from the point of view of C#-the-language? (That implies whether a language compiler/runtime is required to prohibit certain kinds of optimizations.)

Of course, normative references on the topic are welcome.

Caitiff answered 11/5, 2016 at 20:50 Comment(7)
See codeblog.jonskeet.uk/2015/01/30/… also – Faraway
@IanMercer: Yes, I know that the C#6 pattern is valid (that's why I wrote "pre-C#6 pattern" in the question). Nevertheless, the question about the validity of the old pattern remains. For me, it's more of practical interest: with the advance of cross-platform development, should I rush to replace the old pattern in all of the legacy code with the new one, or is the old one still okay even on Xamarin? But thanks for your link anyway. – Caitiff
What about lock statements? Could they not be used in pre-6.0 code? – Commercialize
@MagikM18: Yes, they could be used. The point, however, is that the pattern doesn't use locks. The question is not "how to do it right?", but rather "is the pattern (without locks) correct?". – Caitiff
True, and I agree it does look very ambiguous. – Commercialize
Or you can sidestep the whole problem completely and use IObservables. – Slipon
@Aron: Thank you for the suggestion, but the question is not how to implement events (or event-like program logic) in the right way (if in doubt, one can always resort to explicit locks). The question is whether the popular pattern is correct. – Caitiff

According to the sources you provided and a few others in the past, it breaks down to this:

  • With the Microsoft implementation, you can rely on not having read introduction [1] [2] [3]

  • For any other implementation, it may have read introduction unless it states otherwise

EDIT: Having re-read the ECMA CLI specification carefully, I find that read introductions are possible, but constrained. From Partition I, 12.6.4 Optimization:

Conforming implementations of the CLI are free to execute programs using any technology that guarantees, within a single thread of execution, that side-effects and exceptions generated by a thread are visible in the order specified by the CIL. For this purpose only volatile operations (including volatile reads) constitute visible side-effects. (Note that while only volatile operations constitute visible side-effects, volatile operations also affect the visibility of non-volatile references.)

A very important part of this paragraph is in parentheses:

Note that while only volatile operations constitute visible side-effects, volatile operations also affect the visibility of non-volatile references.

So, if the generated CIL reads a field only once, the implementation must behave the same. If it introduces reads, it's because it can prove that the subsequent reads will yield the same result, even facing side effects from other threads. If it cannot prove that and it still introduces reads, it's a bug.

In the same manner, C# the language also constrains read introduction at the C#-to-CIL level. From the C# Language Specification Version 5.0, 3.10 Execution Order:

Execution of a C# program proceeds such that the side effects of each executing thread are preserved at critical execution points. A side effect is defined as a read or write of a volatile field, a write to a non-volatile variable, a write to an external resource, and the throwing of an exception. The critical execution points at which the order of these side effects must be preserved are references to volatile fields (§10.5.3), lock statements (§8.12), and thread creation and termination. The execution environment is free to change the order of execution of a C# program, subject to the following constraints:

  • Data dependence is preserved within a thread of execution. That is, the value of each variable is computed as if all statements in the thread were executed in original program order.

  • Initialization ordering rules are preserved (§10.5.4 and §10.5.5).

  • The ordering of side effects is preserved with respect to volatile reads and writes (§10.5.3). Additionally, the execution environment need not evaluate part of an expression if it can deduce that that expression’s value is not used and that no needed side effects are produced (including any caused by calling a method or accessing a volatile field). When program execution is interrupted by an asynchronous event (such as an exception thrown by another thread), it is not guaranteed that the observable side effects are visible in the original program order.

The point about data dependence is the one I want to emphasize:

Data dependence is preserved within a thread of execution. That is, the value of each variable is computed as if all statements in the thread were executed in original program order.

As such, looking at your example (similar to the one given by Igor Ostrovsky [4]):

EventHandler localCopy = SomeEvent;
if (localCopy != null)
    localCopy(this, args);

The C# compiler should not perform read introduction, ever. Even if it can prove that there are no interfering accesses, there's no guarantee from the underlying CLI that two sequential non-volatile reads on SomeEvent will have the same result.

Or, using the equivalent null-conditional operator, available since C# 6.0:

SomeEvent?.Invoke(this, args);

The C# compiler should always expand this to the previous code (guaranteeing a unique, non-conflicting variable name) without performing read introduction, as that would reintroduce the race condition.
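The specified single-read behavior can be made concrete in a small runnable sketch (the class and method names here are illustrative, not from the question):

```csharp
using System;

class Publisher
{
    public event EventHandler SomeEvent;

    // What "SomeEvent?.Invoke(this, EventArgs.Empty);" is specified to
    // behave like: the event field is read once into a temporary, and
    // both the null check and the invocation use that snapshot.
    public void Raise()
    {
        EventHandler tmp = SomeEvent; // single read; tmp is a snapshot
        if (tmp != null)
            tmp(this, EventArgs.Empty);
    }
}

class Demo
{
    static void Main()
    {
        var p = new Publisher();
        p.Raise(); // no subscribers: the snapshot is null, nothing happens
        p.SomeEvent += (s, e) => Console.WriteLine("handler ran");
        p.Raise(); // prints "handler ran"
    }
}
```

If a handler is detached on another thread between the read and the invocation, the snapshot still invokes the stale handler — the pattern only prevents the NullReferenceException, as Eric Lippert's post discusses.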

The JIT compiler should only perform read introduction if it can prove that there are no interfering accesses, depending on the underlying hardware platform, such that two sequential non-volatile reads of SomeEvent will in fact have the same result. This may not be the case if, for instance, the value is not kept in a register and the cache may be flushed between reads.

Such an optimization, if local, can only be performed on plain (non-ref and non-out) parameters and non-captured local variables. With inter-method or whole-program optimizations, it can also be performed on shared fields, ref or out parameters, and captured local variables that can be proven never to be visibly affected by other threads.
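The distinction between a plain local and a captured one can be sketched as follows (the names are illustrative; this only demonstrates scope, not an actual miscompilation):

```csharp
using System;
using System.Threading.Tasks;

class LocalVsCaptured
{
    private static Action _shared = () => Console.WriteLine("ran");

    // 'local' is a plain, non-captured local: it lives on this thread's
    // stack or in a register, so no other thread can observe it, and the
    // JIT may cache or re-materialize it freely without visible effect.
    static void PlainLocal()
    {
        Action local = _shared;
        if (local != null)
            local();
    }

    // 'captured' is hoisted into a heap-allocated closure object because
    // the lambda uses it; it is now potentially reachable from another
    // thread, so it can no longer be treated as thread-private.
    static Task CapturedLocal()
    {
        Action captured = _shared;
        return Task.Run(() => { if (captured != null) captured(); });
    }

    static void Main()
    {
        PlainLocal();           // prints "ran"
        CapturedLocal().Wait(); // prints "ran"
    }
}
```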

So, there's a big difference between you writing the following code, or the C# compiler generating the following code, and the JIT compiler generating machine code equivalent to the following code: the JIT compiler is the only one capable of proving that the introduced read is consistent with single-threaded execution, even in the face of potential side effects caused by other threads:

if (SomeEvent != null)
    SomeEvent(this, args);

An introduced read that may yield a different result is a bug, even according to the standard, as there's an observable difference were the code executed in program order without the introduced read.

As such, if the comment in Igor Ostrovsky's example [4] is true, I say it's a bug.


[1]: A comment by Eric Lippert; quoting:

To address your point about the ECMA CLI spec and the C# spec: the stronger memory model promises made by CLR 2.0 are promises made by Microsoft. A third party that decided to make their own implementation of C# that generates code that runs on their own implementation of CLI could choose a weaker memory model and still be compliant with the specifications. Whether the Mono team has done so, I do not know; you'll have to ask them.

[2]: CLR 2.0 memory model by Joe Duffy, reiterating the next link; quoting the relevant part:

  • Rule 1: Data dependence among loads and stores is never violated.
  • Rule 2: All stores have release semantics, i.e. no load or store may move after one.
  • Rule 3: All volatile loads are acquire, i.e. no load or store may move before one.
  • Rule 4: No loads and stores may ever cross a full-barrier (e.g. Thread.MemoryBarrier, lock acquire, Interlocked.Exchange, Interlocked.CompareExchange, etc.).
  • Rule 5: Loads and stores to the heap may never be introduced.
  • Rule 6: Loads and stores may only be deleted when coalescing adjacent loads and stores from/to the same location.
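Rules 5 and 6 can be illustrated with a sketch (the field name is illustrative; this demonstrates what the rules permit, not what the JIT actually emits):

```csharp
using System;

class Coalescing
{
    private static int _field = 42;

    static void Main()
    {
        // Rule 6 (sketch): two adjacent non-volatile reads of the same
        // location may legally be coalesced into a single load, so a
        // racing writer could never make 'a' and 'b' differ after
        // coalescing.
        int a = _field;
        int b = _field;
        Console.WriteLine(a == b); // prints "True"

        // The reverse transformation, splitting one read into two,
        // would be an *introduced* load, which Rule 5 forbids: uses of
        // a snapshot must all see the same value.
        int snapshot = _field;
        int x = snapshot;
        int y = snapshot;
        Console.WriteLine(x == y); // prints "True"
    }
}
```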

[3]: Understand the Impact of Low-Lock Techniques in Multithreaded Apps by Vance Morrison, the latest snapshot I could get on the Internet Archive; quoting the relevant portion:

Strong Model 2: .NET Framework 2.0

(...)

  1. All the rules that are contained in the ECMA model, in particular the three fundamental memory model rules as well as the ECMA rules for volatile.
  2. Reads and writes cannot be introduced.
  3. A read can only be removed if it is adjacent to another read to the same location from the same thread. A write can only be removed if it is adjacent to another write to the same location from the same thread. Rule 5 can be used to make reads or writes adjacent before applying this rule.
  4. Writes cannot move past other writes from the same thread.
  5. Reads can only move earlier in time, but never past a write to the same memory location from the same thread.

[4]: C# - The C# Memory Model in Theory and Practice, Part 2 by Igor Ostrovsky, where he shows a read introduction example that, according to him, the JIT may perform such that two consequent reads may have different results; quoting the relevant part:

Read Introduction As I just explained, the compiler sometimes fuses multiple reads into one. The compiler can also split a single read into multiple reads. In the .NET Framework 4.5, read introduction is much less common than read elimination and occurs only in very rare, specific circumstances. However, it does sometimes happen.

To understand read introduction, consider the following example:

public class ReadIntro {
  private Object _obj = new Object();
  void PrintObj() {
    Object obj = _obj;
    if (obj != null) {
      Console.WriteLine(obj.ToString()); // May throw a NullReferenceException
    }
  }
  void Uninitialize() {
    _obj = null;
  }
}

If you examine the PrintObj method, it looks like the obj value will never be null in the obj.ToString expression. However, that line of code could in fact throw a NullReferenceException. The CLR JIT might compile the PrintObj method as if it were written like this:

void PrintObj() {
  if (_obj != null) {
    Console.WriteLine(_obj.ToString());
  }
}

Because the read of the _obj field has been split into two reads of the field, the ToString method may now be called on a null target.

Note that you won’t be able to reproduce the NullReferenceException using this code sample in the .NET Framework 4.5 on x86-x64. Read introduction is very difficult to reproduce in the .NET Framework 4.5, but it does nevertheless occur in certain special circumstances.

Toluol answered 12/5, 2016 at 17:36 Comment(25)
Interesting. For me, having strong experience in C++ multithreading, all these examples look like UB: unsynchronized access to data (especially in the presence of writes) is prohibited. In your examples, the task thread is performing the synchronization, but the main thread isn't => UB. I am trying to understand the right way of thinking in C#; that's why the question is asked. – Caitiff
So, your point basically is that the pattern is not valid from the point of view of C#-the-language, and is valid only if the runtime/JITter actively helps? – Caitiff
The big difference here is that C and C++ are riddled with undefined behavior (UB), whereas languages that target managed environments (and the environments themselves) should at most not guarantee certain things, such as the order of independent non-volatile reads and writes of variables in a single thread. However, these things are not undefined behavior, as in anything can happen, no; at most, any acceptable single-thread order of reads and writes might occur, and they should occur as if only once. – Toluol
Let me think a little bit about your answer. – Caitiff
Having read Vance Morrison's article (here) once more, I've stumbled upon a statement which basically says that while the JIT must not introduce reads, the C#-to-IL compiler may do this: (citation follows...) – Caitiff
"In systems using the ECMA model, there is an additional subtlety. Even if only one memory location is fetched into a local variable and that local is used multiple times, each use might have a different value! This is because the ECMA model lets the compiler eliminate the local variable and re-fetch the location on each use. If updates are happening concurrently, each fetch could have a different value." – Caitiff
Introducing a read, just by itself, is not a sane optimization in a multithreaded scenario. Irrelevant read introduction may be sane, depending on the hardware, so the JIT might do it. However, a C#-to-bytecode compiler should not be introducing reads anywhere, since the actual hardware is not yet specified. It's my firm belief that relevant read introduction is not purposefully allowed in the .NET memory model, otherwise only locks would provide sane synchronization. Maybe by mistake, or lack of foresight. – Toluol
As far as I understand, optimizations are always done without any regard to multithreaded lockless scenarios. It's the other way round: multithreaded scenarios must account for possible optimizations and the "as-if rule". So no matter whether a read is sane to introduce from our point of view, it's a valid and lawful transformation (which can e.g. enable further optimizations). – Caitiff
If I'm not mistaken, loop cloning (mentioned here) is a kind of introducing extra reads. – Caitiff
It's possible that there is an "unofficial rule" which prohibits read introduction, but I'd like to have it clearly stated in official documentation. This optimization restriction cannot be considered a mere implementation detail, because our code can be correct or wrong depending on whether this rule exists or not. – Caitiff
As far as I understand (…) — I disagree. The multithreaded factor, if present, must always be accounted for. Otherwise, it becomes both impossible and irrelevant to specify any kind of thread-safety, given the nature of some single-threaded optimizations. For instance, you can't safely eliminate reads around a method call, unless you're performing a whole-program optimization which knows that said methods (base and overridden) don't perform volatile accesses, barriers, etc., which would make the elided read possibly obtain a different value from the single read that's left. – Toluol
If I'm not mistaken, loop cloning (mentioned here) is kind of introducing extra reads. — That's branching/conditionally jumping from a loop which checks bounds to a loop that doesn't; they're not exact clones. On each branch, there should be the same reads except for bounds checking. – Toluol
(…) I'd like having it clearly stated in official documentation. — I strongly agree. – Toluol
The multithreaded factor, if present, must always be accounted for. — Is there any documentation stating that optimization needs to take care of multithreading issues even if there is no synchronization used (volatile, mutexes, etc.)? The common knowledge seems to be that C++ doesn't do this (according to the as-if rule). You can see it here: godbolt.org/z/x1YdeT4YK: the function work is compiled ignoring the fact that the variable b could be changed by the other threads. Is it not the same in C#? Hopefully someone (@Eric?) could shed some light on it. – Caitiff
The most similar platform, Java, has a detailed memory model and requirements for volatile declarations, synchronized statements, atomics, etc., so any optimization that doesn't violate them is valid. .NET and C# have a rough memory model which may be open to interpretation, but with similar requirements for volatile accesses, lock statements, atomics, etc., valid optimizations cannot violate them. From my point of view, for sanity in a multithreaded scenario, reads cannot be introduced. I've covered this with some detail here. – Toluol
Regarding C++, note how the reads of b are not elided in the following example: godbolt.org/z/o9dnzP7c6 — the compiler cannot assume what work() does. It might synchronize, such that the second read of b is different. I declared work() as extern to avoid local optimizations on purpose, obtaining the same effect as linking to e.g. a runtime-loaded library. – Toluol
About Java: you named explicitly synchronizing things like volatile and atomics. In the topic we discuss, there are no synchronizing primitives involved, so does this mean that in Java any optimization is valid? – Caitiff
About your C++ example: I wouldn't say it contradicts my assumptions. As the function work is declared extern, the compiler cannot assume that work doesn't change the globally accessible b and b2 even in the current thread, so it cannot optimize the accesses out. My point, however, is that if a C++ compiler can see all the value changes in the current thread, it's free to completely ignore other threads. This means, in particular, introducing reads if the compiler finds it profitable. – Caitiff
(…) does this mean that in Java any optimization is valid? — No, it means the exact opposite: fewer optimizations are valid. Basically, var x = field; if (x != null) { /* accesses to x cannot throw NullPointerException */ }. If a read were introduced such that a NullPointerException became possible, e.g. if (field != null) { /* replace x with field, possible NullPointerExceptions */ }, it would be invalid. I've covered this same point in .NET/C# in my other answer. For more on this, please search for "On Validity of Program Transformations in the Java Memory Model". – Toluol
My point is however that if a C++ compiler can see all the value changes in the current thread, it's free to completely ignore other threads. — If the accesses are local. If you declare the variables as static in my example, clang treats b as local. But define void foo() { b = false; } and again it cannot consider b as local. I've covered the distinction between locals and everything else in my other answer. For more on this, please see Hans-J. Boehm's papers "Threads Cannot Be Implemented as a Library" and "Outlawing Ghosts: Avoiding Out-of-Thin-Air Results". – Toluol
The site is telling me to "Please avoid extended discussions in comments." In general, I agree with you: we would need detailed documentation to know exactly what's going on. I gave you some pointers, and my 0.02¤, but I really cannot argue much more than what other people, with years of experience, have done with research and published papers dedicated to the subject. – Toluol
Asked a spin-off question about allowed optimizations in C++: https://mcmap.net/q/901636/-does-compiler-need-to-care-about-other-threads-during-optimizations/276994 – Caitiff
As you see from the answer and comments, C++ is strict in declaring unprotected access to shared data as UB. This means that the pattern under discussion is UB and should never be used (in C++). And this hints at the conclusion that the C++ compiler is actually free to ignore multithreading issues if there is no synchronization, because unprotected multithreaded access would be UB (which is a grave developer's fault and allows the compiler to do literally anything). – Caitiff
C++ is strict on declaring unprotected access to the shared data as UB. — It still doesn't allow a compiler to optimize all reads and writes considering only a single-threaded point of view, due to potential acquire and release operations, such as using a mutex. Here's a note from the specification: "Transformations that introduce a speculative read of a potentially shared memory location may not preserve the semantics of the C++ program as defined in this document, since they potentially introduce a data race." – Toluol
The latest draft has changed "may" to "might", which seems better. Anyway, I'm tired of this discussion here. We shouldn't be arguing about C++ in this question or answer. – Toluol

The optimizer is not allowed to transform code that stores a value in a local variable and later uses that variable into code in which every use of the variable is replaced by the original initializing expression. That's not a valid transformation to make, so it's not an "optimization". The expression can cause, or depend on, side effects, so it needs to be evaluated once, stored somewhere, and then used where specified. It would be an invalid transformation for the runtime to resolve the event to a delegate twice when your code only does it once.

As far as re-ordering is concerned: the re-ordering of operations is quite complicated with respect to multiple threads, but the whole point of this pattern is that you're now doing the relevant logic in a single-threaded context. The value of the event is stored into a local, and that read can be ordered more or less arbitrarily with respect to code running in other threads, but the read of that value into the local variable cannot be re-ordered with respect to subsequent operations of that same thread, namely the if check or the invocation of the delegate.

Given that, the pattern does indeed do what it intends to do, which is to take a snapshot of the event and invoke all handlers, if there are any, without throwing a NullReferenceException due to there being no handlers.
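The snapshot-and-invoke intent is often packaged in a small helper; the following is a sketch (the Raise extension method is an illustrative name, not a framework API — the trick is that the handler parameter itself is a snapshot, read once at the call site):

```csharp
using System;

static class EventExtensions
{
    // Illustrative helper: calling an extension method on a null
    // delegate is legal, so the null check happens on the snapshot
    // that was passed in, never on the event field itself.
    public static void Raise(this EventHandler handler, object sender, EventArgs args)
    {
        if (handler != null)
            handler(sender, args);
    }
}

class Publisher
{
    public event EventHandler Changed;

    public void Touch()
    {
        // The event field is read exactly once, to supply the argument:
        Changed.Raise(this, EventArgs.Empty);
    }
}

class Demo
{
    static void Main()
    {
        var p = new Publisher();
        p.Touch(); // no handlers: no NullReferenceException
        p.Changed += (s, e) => Console.WriteLine("raised");
        p.Touch(); // prints "raised"
    }
}
```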

Yemane answered 12/5, 2016 at 16:17 Comment(10)
(Re: first paragraph) But Igor Ostrovsky's quoted article says: "if code reads a field and stores the value in a local variable and then repeatedly reads the variable, the compiler could choose to repeatedly read the field instead". Doesn't this apply to the case under consideration? Or do you disagree with the article's statement? – Caitiff
@Caitiff It's not a valid transformation, as stated there, otherwise even code as simple as this wouldn't work properly: var local = field; field = newValue; SomeMethod(local);. It's just not a valid transformation to change all reads of a local initialized from a field into reads of the field (even if the local hasn't been reassigned). It's also worth noting that we don't have a field here, we have an event, which is different. – Yemane
I'm not sure whether this is not a counter-example (based on this Marc Gravell's answer). For me, compiled in Release mode on VS 2015 and started outside the debugger, it outputs just Main exit, but the secondary thread stays looping forever. It seems that the code omits reloading the value from the field and reads a stale value. – Caitiff
About var local = field; field = newValue; SomeMethod(local); — this is a different story, as the "optimization" would break the data dependence within the thread. This is prohibited by the C# spec, §3.10: "Data dependence is preserved within a thread of execution. That is, the value of each variable is computed as if all statements in the thread were executed in original program order." In my case, this rule doesn't apply. – Caitiff
@Caitiff That's a multi-threaded context. That's entirely different. In that situation you're reading from a field, and a thread is using a cached version of that field without updating it to reflect changes made in another thread. The C# memory model doesn't guarantee that a variable written to by one thread will be observed by another. The code in question here is an entirely single-threaded case, so that doesn't even begin to apply. – Yemane
@Caitiff It is an invalid transformation to turn repeated reads of a local into the expression setting that local, because it's not a valid transformation even in a single-threaded context. – Yemane
Plainly the transformation from local read to second field read is illegal if, as you say, we have a write to the field on the same thread before the read of the local. But that's not the question at hand! The question is whether the transformation is legal if the jitter knows that there is no intervening write to the field on the current thread. And the jitter can know that. – Osteopath
@EricLippert But can it know that there won't be any writes at the time the local is supposed to be initialized? It would need to know that the field won't ever be written to at any point between when it's supposed to initialize the local and the last time the local is read, if it wants to optimize out the existence of the local. I doubt it could know that. Obviously in my example it's clear to see, but if there's a call to a 3rd-party library that fires some other event that has a handler that modifies the field, etc., it could be very hard to trace down, while still being single-threaded. – Yemane
That's correct; there are some situations in which it is very hard to know that the local can be elided safely. But the question is: is there any circumstance in which the local can be elided safely given only knowledge of what is happening on the current thread? If there is, then the local can be elided, which means that it can be unsafe with respect to multithreading. I honestly do not know if the memory model permits this or not; I am every bit as confused by all these conflicting reports as the original poster is. – Osteopath
@EricLippert: I think the problem is deeper than just the elision of locals. Consider the code: var local1 = SomeEvent; /*...*/ var local2 = SomeEvent; if (local2 != null) local2(...);. It seems to be perfectly fine to replace one of the usages of local2 with local1, if SomeEvent is not changed from the current thread's POV. I suspect that no reads/writes are introduced if local1 is cached in a register (but I'm not an expert anyway). – Caitiff