Why is (or isn't) setting fields in a constructor thread-safe?
Asked Answered
C

6

15

Let's say you have a simple class like this:

class MyClass
{
    private readonly int a;
    private int b;

    public MyClass(int a, int b) { this.a = a; this.b = b; }

    public int A { get { return a; } }
    public int B { get { return b; } }
}

I could use this class in a multi-threaded manner:

MyClass value = null;
Task.Run(() => {
    while (true) { value = new MyClass(1, 1); Thread.Sleep(10); }
});
while (true)
{
    MyClass result = value;
    if (result != null && (result.A != 1 || result.B != 1)) { 
        throw new Exception(); 
    }
    Thread.Sleep(10);
}

My question is: will I ever see this (or other similar multi-threaded code) throw an exception? I often see reference to the fact that non-volatile writes might not immediately be seen by other threads. Thus, it seems like this could fail because the write to the value field might happen before the writes to a and b. Is this possible, or is there something in the memory model that makes this (quite common) pattern safe? If so, what is it? Does readonly matter for this purpose? Would it matter if a and b were a type that can't be atomically written (e. g. a custom struct)?

Coltish answered 19/3, 2015 at 13:42 Comment(7)
As I understand the .NET memory (and my understanding is severely limited on this level) is that yes, it sounds like you might have a problem here. Basically the constructor call with subsequent storage of the new reference into the variable can be seen as a series of writes, and they can be reordered. As I understand it, the write to store the reference to the variable can be reordered to happen before the writes that sets the field of the new instance. As such, to me, it sounds like your second loop might have a chance of observing a half-constructed object.Gerrigerrie
I found this article on the "C#" memory model: msdn.microsoft.com/en-us/magazine/jj863136.aspxGerrigerrie
However, I also remember reading about how the new initializer syntax worked and it generated code that specifically stored the reference to the newly constructed object into its intended location only after it had populated all the properties with their values, I will see if I can find some reference material, as I believe this was due to multithreaded code. I could be wrong however as you could easily write single-threaded code that "escapes" and leads to reading the variable midway the construction/initialization of the new object.Gerrigerrie
There is a high risk of this code throwing an exception, Threads are there to isolate resources and what you are doing is totally opposite. why you need to use a single class across two threads ?Syllabary
@LasseV.Karlsen That's not there to handle multithreading situations. It's there for single threaded situations. It's so that if you access the variable that you are assigning to from within the initializer then you won't be accessing the "current" object, since the assignment won't have happened yet. In a multithreaded world the operations can be observed to happen out of order from another thread, so it doesn't help. That, and there is no object initializer here, just a constructor.Rushton
I know, I just wanted to find that link to see what it said because I seemed to recall it mentioning something about thread-safety, that's all. I still want to see a definitive source that says why the compiler or the cpu is or isn't allowed to reorder the writes so that value is observed to have changed before the fields have, though.Gerrigerrie
Note that regardless of the underlying memory model, you will almost always want to synchronize object construction and destruction externally in a real world code base. This is a conceptual guideline. The reasoning here is that even if you can avoid data-races, the object may still be in a partially constructed state when accessed, meaning the class invariants of the object might not hold. For non-trivial objects, this makes it very hard to reason whether a particular operation will still do the right thing on such a partially constructed object.Bonehead
S
14

Code as written will work starting from CLR2.0 as the CLR2.0 memory model guarantees that All stores have release semantics.

Release semantics: Ensures no load or store that comes before the fence will move after the fence. Instructions after it may still happen before the fence.(Taken from CPOW Page 512).

Which means that constructor initialization cannot be moved after the assignment of the class reference.

Joe duffy mentioned this in his article about the very same subject.

Rule 2: All stores have release semantics, i.e. no load or store may move after one.

Also Vance morrison's article here confirms the same(Section Technique 4: Lazy Initialization).

Like all techniques that remove read locks, the code in Figure 7 relies on strong write ordering. For example, this code would be incorrect in the ECMA memory model unless myValue was made volatile because the writes that initialize the LazyInitClass instance might be delayed until after the write to myValue, allowing the client of GetValue to read the uninitialized state. In the .NET Framework 2.0 model, the code works without volatile declarations.

Writes are guaranteed to happen in order starting from CLR 2.0. It is not specified in ECMA standard, it is just the microsoft implementation of the CLR gives this guarantee. If you run this code in either CLR 1.0 or any other implementation of CLR, your code is likely to break.

Story behind this change is:(From CPOW Page 516)

When the CLR 2.0 was ported to IA64, its initial development had happened on X86 processors, and so it was poorly equipped to deal with arbitrary store reordering (as permitted by IA64) . The same was true of most code written to target .NET by nonMicrosoft developers targeting Windows

The result was that a lot of code in the framework broke when run on IA64, particularly code having to do with the infamous double-checked locking pattern that suddenly didn't work properly. We'll examine this in the context of the pattern later in this chapter. But in summary, if stores can pass other stores, consider this: a thread might initialize a private object's fields and then publish a reference to it in a shared location; because stores can move around, another thread might be able to see the reference to the object, read it, and yet see the fields while they are still i n an uninitialized state. Not only did this impact existing code, it could violate type system properties such as initonly fields.

So the CLR architects made a decision to strengthen 2.0 by emitting all stores on IA64 as release fences. This gave all CLR programs stronger memory model behavior. This ensures that programmers needn' t have to worry about subtle race conditions that would only manifest in practice on an obscure, rarely used and expensive architecture.

Note Joe duffy says that they strengthen 2.0 by emitting all stores on IA64 as release fences which doesn't mean that other processors can reorder it. Other processors itself inherently provides the guarantee that store-store(store followed by store) will not be reordered. So CLR doesn't need to explicitly guarantee this.

Scandalmonger answered 19/3, 2015 at 14:43 Comment(12)
So basically, my comments to the other answers here asking for a definitive source on why this (potential) reordering cannot occur is this rule 2. That sounds good.Gerrigerrie
In the linked article from Joe's article - msdn.microsoft.com/en-gb/magazine/cc163715.aspx - the same rule would be rule 4: Writes cannot move past other writes from the same thread.Gerrigerrie
@LasseV.Karlsen The Microsoft implementation has a stricter model than is specified by the C# specs. (Because all of the processors that it's written to work on have a stricter memory model than is seen in processors.) Other implementations of the language can (and do) have a more lax memory model than the MS implementation.Rushton
@LasseV.Karlsen Yes, .NET is a library. You can run the code in that library on any runtime that you want, whether it be the Microsoft runtime or say Mono. .NET needs to be written under the assumption that it's executed by any valid C# runtime, not just specifically the Microsoft implementation.Rushton
@LasseV.Karlsen Igor's article talks about memory model of ECMA standard C#. Not the clr. In Joe's book he says the story also. Let me update the answer with the story :)Scandalmonger
I didn't read it thoroughly enough, let me delete my comments, they're rubbish.Gerrigerrie
It is unfortunate that synchronization cost caused by this CLR2.0 guarantee is payed by all code, yet typically only <1% of code deals with threading at all. I never understood why memory models should be strong by default. Totally tilted trade-off.Nibelung
@Nibelung The have chosen to implement it just because of the fact that prior to .net 2.0, .Net framework code itself assumes that write followed by write always happens in order(because it was guaranteed by intel x86, x64 and amd processors), that assumption leads to break lot of .Net Framework code. So instead finding all potential bugs and fixing it, they took the easy way to strengthen the memory model. Also this is one of the reason why people call .Net is poor performing/slow :)Scandalmonger
So is there any difference in the CLR between volatile writes and regular writes?Coltish
@Coltish I guess no(I'm not sure if CLR works differently for volatile and non volatile). But if there is read involved before or after it, then it matters.Scandalmonger
@SriramSakthivel: sorry, can you explain why a volatile write is different than a regular write if followed by a read? From your post it seems like they would be the sameColtish
@Coltish No there is no different AFAIK. Maybe I worded it wrong. I was referring to this. Volatile/Non volatile write followed by volatile/non volatile read can be reordered.Scandalmonger
I
1

The code described above is thread safe. The constructor is fully executed before it is assigned to the "value" variable. The local copy in the second loop will either be null or a fully constructed instance as assigning an instance reference is an atomic operation in memory.

If "value" was a structure then it would not be thread safe as the initialization of value wouldn't be atomic.

Interpret answered 19/3, 2015 at 14:0 Comment(21)
What guarantees that the constructor has fully executed before the reference is stored into the variable? Isn't this just a sequence of writes that can be reordered?Gerrigerrie
The assignment to the "value" variable only takes place after the constructor is finished. Consider the code "new MyClass().A", the class must be valid before reading the property. It is no different for the assignment.Interpret
I'm not so sure. Consider this hypothetical break-down of the code, where the constructor is inlined: var temp = allocateMemory(); temp.a = 1; temp.b = 1; value = temp; Why doesn't the memory model allow reordering here so that value = temp executes before or between the other two?Gerrigerrie
@JohnTaylor That assumption is only valid in a single threaded context. A single thread cannot observe the object reference being assigned before the constructor finishes, but a second thread most certainly can.Rushton
The optimizer is REALLY smart about these details and very well might reorder the temp.a and temp.b, assignments. Since the assignment to value includes effects outside the local scope of the function block, the optimizer will not move the value assignment relative to the allocation and variable assignments.Interpret
@JohnTaylor There is nothing in the optimizer that prevents it from reordering operations outside of the scope of a single method, so long as the order of everything is deterministic in a single threaded model, which this is not.Rushton
@Rushton You are incorrect. The assignment to value ONLY occurs on one thread and it is atomic AFTER the constructor. The reading of the variable will either be null, old, or current depending on which way the wind blows.Interpret
@JohnTaylor That will be true for the thread creating the object. it doesn't need to be true at all for the second thread. There are no such guarantees about ordering for what the second thread observes.Rushton
@Servy, the copy that the second thread sees is either null or fully constructed. There is never a half baked version. Now value = new MyClass(); value.A = 1; value.B = 1; is very unsafe.Interpret
Where is the guarantee listed that the compiler cannot reorder those writes into that last example?Gerrigerrie
@Lasse V. Karlsen The last example with the assignments outside the constructor that I gave is not thread safe and the compiler might reorder the code.Interpret
@JohnTaylor That isn't an answer to his question. You're asserting that the runtime guarantees that the example in the OP won't be reordered. Where does it guarantee this? Writes can be observed to be arbitrarily reordered from other threads, so as far as the second thread is concerned, your example and the OP's are identical code. In both cases you have three writes to three different variables, and no restrictions on what order they must be observed to happen from the second thread.Rushton
I understand that, but is this guarantee written down anywhere? This is probably the fourth or fifth time this particular type of question has surfaced here on SO and it always sparks a debate, leaving (in the cases I've seen) two answers: it is a problem, and it is not a problem. I am not saying you're wrong, and please forgive me if it sounds like I don't trust you but it would be very nice to have something definitive to point to for once. Is the guarantee written down anywhere? Can I go somewhere and read about this?Gerrigerrie
@Lasse V. Karlsen I have never seen it written down, but all optimizer operate on the principal that they can make any changes that does not effect the outcome. Consider '1 + 2 * 3 == 7' the polish notation for this is '1 2 3 * + 7 ==' the optimizer might change it to '2 3 * 1 + 7 ==' or even '7 3 2 * 1 + ==' but the outcome is always true. If the optimizer moved code that effected other code it might generate '1 2 * 3 + 7 ==' which is false and incorrect. So the outcome, including assignments outside of scope, must be guaranteed. The order that it gets there isn't.Interpret
That may be so but this MSDN article on the subject of the C# memory model - msdn.microsoft.com/en-us/magazine/jj863136.aspx - specifically says this: "One source of complexity in multithreaded programming is that the compiler and the hardware can subtly transform a program’s memory operations in ways that don’t affect the single-threaded behavior, but might affect the multithreaded behavior."Gerrigerrie
@JohnTaylor It is guaranteed in the scope of a single thread. In a multithreaded environment one must use memory barriers in order to have such a guarantee. Basically the only things you can rely on in a multithreaded environment is surrounding a handful of synchronization primitives and memory barriers. Other than that you have to more or less assume everything can be arbitrarily reordered.Rushton
I have tried to provoke a problem with this kind of code many times before, and did so even now, unable to do so. But, there may be a problem executing the C# code on a different x64 processor or x86 processor with a different JITter, so the fact that I personally cannot observe a problem with the code does not mean that there's no problem with it. I too think the code is OK. I, however, cannot say it is definitively OK.Gerrigerrie
@LasseV.Karlsen The x86 memory model has a stricter limitation on the reordering of operations than what C# guarantees, so on those processors these types of reorderings won't happen, but on other processors they most certainly can.Rushton
I fired off an email to Joe Duffy, hope he takes the time to answer a random stranger :)Gerrigerrie
@LasseV.Karlsen If you get reply from Joe, don't forget to share the info with us :)Scandalmonger
Of course but your answer and the articles you link to pretty much sums it up it seems. If he takes the time to read the question and check over the answers perhaps he won't need to answer my email, but we'll see. If I get a reply, I'll of course post it here, unless he answers by answering here or commenting here directly, that would be best if you ask me :) He probably gets many emails each day though so I'm not expecting him to have the time, but again, we'll see.Gerrigerrie
R
1

Thus, it seems like this could fail because the write to the value field might happen before the writes to a and b. Is this possible

Yes, it most certainly is possible.

You'll need to synchronize access to the data in some way to prevent such a reordering.

Rushton answered 19/3, 2015 at 14:14 Comment(2)
The assignment DOES NOT occur until after the constructor executesInterpret
@JohnTaylor In a single threaded model the world must be observed to act that way, but this is not a single threaded model, so no, there is no such guarantee.Rushton
S
0

will I ever see this (or other similar multi-threaded code) throw an exception?


Yes, on ARM (and any other hardware with weak memory model) you will observe such behavior.

I often see reference to the fact that non-volatile writes might not immediately be seen by other threads. Thus, it seems like this could fail because the write to the value field might happen before the writes to a and b. Is this possible, or is there something in the memory model that makes this (quite common) pattern safe?

Volatile is not about instantaneousness of changes observation, it's about order and acquire/release semantics.
Moreover, ECMA-335 say that it can happen (and it will happen on ARM or any other hardware with weak memory model).

Does readonly matter for this purpose?

readonly has nothing to do with instructions reordering and volatile.

Would it matter if a and b were a type that can't be atomically written (e. g. a custom struct)?

Atomicity of fields doesn't matter in this scenario. To prevent that situation you should write reference to created object via Volatile.Write (or just make that reference volatile and compiler will do the job). Volatile.Write(ref value, new MyClass(1, 1)) will do the trick.

For more information on volatile semantics and memory model see ECMA-335, section I.12.6

Shoup answered 28/7, 2017 at 12:36 Comment(0)
D
0

As written, this code is threadsafe because value is not updated until constructor has finished executing. In other words, the object under construction is not observed by anyone else.

You can write code which helps you shoot yourself in the face by explicitly publishing this to the outside world like

class C { public C( ICObserver observer ) { observer.Observe(this); } }

When Observe() executes, all bets are off because it no longer holds true that object is not observed by outside world.

Doorbell answered 30/7, 2017 at 18:36 Comment(0)
C
-2

This was wrong, sorry...

I think you could throw an error if your test if executed before the first assignment of the variable in the other thread. This would be a race condition... it may be an intermittent problem.

You could also get an error if the while loop checks the values at the exact moment that a new class is instantiated and assigned to value, but before the a and b variables are set.

As for a better way to do this, it would depend on your goal. Is there a reason value keeps getting overwritten? I would think it more common that the new classes would be placed into a collection which need to be processed in order.

There are collections that would handle this. You would add classes to the collection in one thread, and check and extract them in the other.

See http://dotnetcodr.com/2014/01/14/thread-safe-collections-in-net-concurrentstack/ for an example.

Chastain answered 19/3, 2015 at 13:59 Comment(2)
There is not a race condition as the reference in value is always null or fully constructed. The only gotcha is that with multiple cores the reference will very likely be an older copy, but it will be a valid copy.Interpret
You're correct, sorry.... I still recommend looking at concurrentstack for the proper way to handle these things.Chastain

© 2022 - 2024 — McMap. All rights reserved.