Can boxing/unboxing a struct in C# give the same effect of it being atomic?
As per the C# specification, is there any guarantee that reading foo.Bar is effectively atomic (i.e. a reader on one thread never sees a partially updated struct while another thread is writing to it)?

I've always assumed that it is. If so, I'd like to know where the specification guarantees it.

    public class Foo<T> where T : struct
    {
        private object bar;

        public T Bar
        {
            get { return (T) bar; }
            set { bar = value; }
        }
    }

    // var foo = new Foo<Baz>();

EDIT: @vesan This is not the duplicate of Atomic Assignment of Reference Sized Structs. This question asks for the effect of boxing and unboxing whereas the other is about a single reference type in a struct (no boxing / unboxing involved). The only similarities between the two questions are the words struct and atomic (did you actually read the question at all?).

EDIT2: Here's the atomic version based on Raymond Chen's answer:

public class Atomic<T> where T : struct
{
    private object m_Value = default(T);

    public T Value
    {
        get { return (T) m_Value; }
        set { Thread.VolatileWrite(ref m_Value, value); }
    }
}

EDIT3: Revisiting this after 4 years. It turns out that the memory model of CLR 2.0+ states that all writes have the effect of a volatile write: https://blogs.msdn.microsoft.com/pedram/2007/12/28/clr-2-0-memory-model/

Thus the answer to this question should have been "It is atomic if the hardware does not reorder writes", as opposed to Raymond's answer. The JIT and the compiler cannot reorder writes, so the "atomic version" based on Raymond's answer is redundant. On weak memory model architectures, the hardware may reorder writes, so you'll need to apply the appropriate acquire/release semantics.

EDIT4: Again, this issue comes down to CLR vs CLI (ECMA), where the latter defines a very weak memory model while the former implements a strong one. There's no guarantee that every runtime will do this, so the answer still stands. However, since the vast majority of code was and still is written for the CLR, I suspect anyone creating a new runtime will take the easier path and implement a strong memory model to the detriment of performance (just my own opinion).
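For completeness, on .NET 4.5+ the same acquire/release semantics can be expressed with the `Volatile` class instead of `Thread.VolatileWrite`. This is a sketch of that variant, not code from the original post:

```csharp
using System.Threading;

public class Atomic<T> where T : struct
{
    private object m_Value = default(T); // boxed default so the getter never unboxes null

    public T Value
    {
        // Volatile.Read gives acquire semantics: the unbox cannot be
        // reordered before the read of the reference.
        get { return (T) Volatile.Read(ref m_Value); }
        // Volatile.Write gives release semantics: the box is fully
        // initialized before the reference becomes visible to readers.
        set { Volatile.Write(ref m_Value, value); }
    }
}
```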

Cousteau answered 23/9, 2015 at 0:3 Comment(8)
possible duplicate of Atomic Assignment of Reference Sized StructsChrismatory
@Chrismatory It's not a duplicate - it appears you don't understand the question at all.Cousteau
@Chrismatory I think it is not correct duplicate as this question is about unboxing of struct being atomic or not, while the suggested duplicate is about whole struct being just reference. Answer is likely the same piece of spec, but with different implementation.Eustache
True, the question is different, but the answer from the linked question lists the relevant documentation and should help answer this question. Feel free to disagree, of course :)Chrismatory
@Chrismatory Go ahead and provide the answer for this question then.Cousteau
I believe the question is exactly the same. Assuming knowledge that should be obvious to those who know what "atomic" means.Subtitle
@Subtitle So you're saying it's also non-atomic? Go ahead and write an answer.Cousteau
@Chrismatory Since you've removed your incorrect answer, would you mind removing the possible duplicate comment too?Cousteau

No, the result is not atomic. While it's true that the update to the reference is atomic, it is not synchronized. The reference can be updated before the data inside the boxed object becomes visible.

Let's take things apart. A boxed type T is basically something like this:

class BoxedT
{
    T t;
    public BoxedT(T value) { t = value; }
    public static implicit operator T(BoxedT boxed) { return boxed.t; }
}

(Not exactly, but close enough for the purpose of this discussion.)

When you write

bar = value;

this is shorthand for

bar = new BoxedT(value);

Okay, now let's take this assignment apart. There are multiple steps involved.

  1. Allocate memory for a BoxedT.
  2. Initialize the BoxedT.t member with a copy of value.
  3. Save a reference to the BoxedT in bar.

The atomicity of step 3 means that when you read from bar, you will either get the old value or the new value, and never a blend of the two. But it makes no guarantee about synchronization. In particular, step 3 may become visible to other processors before step 2.
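The three steps above can be sketched in C#-like pseudocode (`AllocateBoxedT` is an illustrative placeholder for the runtime's allocator, not a real API):

```csharp
// What "bar = value;" expands to, step by step:
BoxedT tmp = AllocateBoxedT(); // step 1: allocate memory for the box
tmp.t = value;                 // step 2: copy the struct into the box
bar = tmp;                     // step 3: publish the reference

// Without release semantics on step 3, another processor may observe
// the new 'bar' before it observes the write in step 2, and unbox a
// partially initialized (or default) T.
```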

Suppose the update of bar is visible to another processor, but the initialization of BoxedT.t is not. When that processor tries to unbox the BoxedT by reading the BoxedT.t value, it is not guaranteed to read the full value of t that was written in step 2. It might get only part of the value, with the other part containing default(T).

This is basically the same problem as the double-checked locking pattern, but worse because you have no lock at all! The solution is to update bar with release semantics, so that all previous stores are committed to memory before bar is updated. According to the C# 4 language spec, section 10.5.3, this can be done by marking bar as volatile. (It also means that all reads from bar will have acquire semantics, which may or may not be what you want.)
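Applied to the Foo&lt;T&gt; class from the question, the volatile fix looks like this (a sketch under the answer's reasoning; the default(T) initializer is my addition so the getter never unboxes null):

```csharp
public class Foo<T> where T : struct
{
    // 'volatile' gives every store release semantics and every
    // load acquire semantics (C# 4 spec, section 10.5.3).
    private volatile object bar = default(T);

    public T Bar
    {
        get { return (T) bar; }  // acquire load, then unbox
        set { bar = value; }     // box, then release store of the reference
    }
}
```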

Bashful answered 23/9, 2015 at 2:37 Comment(18)
Great answer. Exactly what I was trying to find out!Cousteau
BTW on a platform with strong memory model, marking bar as volatile shouldn't have any effect in this case as stores are always committed in order (e.g. x86/x64), correct?Cousteau
Also, wouldn't set { Thread.VolatileWrite(ref bar, value); } do the trick and yet prevent acquire semantics for get? If so wouldn't that be a better implementation than marking bar as volatile?Cousteau
@ZachSaw Even on a platform with a strong memory model, you need to do the appropriate marking to prevent the compiler from reordering writes, which it is normally permitted to do.Bashful
Ah I just assumed steps 2-3 don't involve the compiler - i.e. the JIT always generates a set of opcodes for the shorthand that does in-order writes for strong memory model platforms since anything else would be less efficient. I know this is no guarantee of course but the point I was trying to raise was that if the JIT behaves as I described (it likely does) then this pattern will appear atomic on x86/x64 platforms.Cousteau
@Zach Reordering writes is a staple of compiler optimizations for over a decade by now (probably much longer). There are good reasons to shuffle things around, say if the earlier write is independent of the later one, but does depend on a preceding read.. You want as much distance between these dependencies as possible.Ringleader
@Ringleader Who's disputing that?Cousteau
@Zach "the JIT always generates a set of opcodes for the shorthand that does in-order writes"?Ringleader
But that has no relevance to what you just said about write reordering. I only said in that particular instance (i.e. shorthand), there's no need to reorder writes and that reordering would be less efficient because of the need to issue an expensive barrier instruction.Cousteau
@Zach I thought you meant that one could remove the volatile on a platform that guarantees strong memory ordering due to that. Misunderstanding, you are right that if volatile is there the compiler will most likely issue the writes in order and avoid the additional barrier.Ringleader
@Ringleader I did mean that but with the additional condition that the compiler generates the code for the shorthand for in-order writes. And as I said I would expect it to anyway because that is the most efficient way to do it on a strong memory model platform.Cousteau
@ZachSaw As noted in the article I linked, the JIT compiler can and does reorder stores in the absence of specific directives forbidding it (such as volatile).Bashful
@RaymondChen Given it a bit more thought and yes it does make sense for the JIT to reorder stores even in the shorthand case as it would be faster if it were not meant to be used across multiple threads.Cousteau
@RaymondChen Sorry for this update but CLR specs say that "Writes cannot move past other writes from the same thread." and it contradicts what you said "the JIT compiler can and does reorder stores in the absence of specific directives forbidding it (such as volatile)." -- which one is correct?Cousteau
@ZachSaw Your source is from 2005. Mine is from 2013. Maybe the rules changed in between. ECMA-334 section 8.1 notes that for asynchronous events, "it is not guaranteed that the observable side effects are visible in the original program order." And section 15.5.4 gives an explicit example of two non-volatile writes crossing each other: "it would be permissible for the store to result to be visible to the main thread after the store to finished, and hence for the main thread to read the value 0 from the field result."Bashful
@RaymondChen I don't think the link you provided contradicts the CLR specs, it says that even on Itanium and ARM writes are never reordered. Itanium: Writes will not be reordered because they are ST.REL. ARM: Writes will not be reordered because DMB is emitted before “_boxedInt = b.”Cousteau
@ZachSaw See updated comment, which cites the CLR spec.Bashful
Hmm I think the root of all evil can be traced back to the same regret Microsoft had with the ECMA specs (CLI). CLI defines a very weak memory model, but it's widely understood in the industry that the CLR implements a strong memory model. Most software out there is designed with the assumption that they will run on CLR.Cousteau