Can subsequent writes in .NET be reordered by the runtime or the processor?

I have immutable objects whose hash code I wish to calculate lazily. I've implemented:

private bool _HasHashCode = false;
private int _HashCode;
public override int GetHashCode()
{
    if (_HasHashCode)
        return _HashCode;

    long hashCode;
    unchecked
    {
        hashCode = Digits;
        hashCode = (hashCode*397) ^ XI;
        hashCode = (hashCode*397) ^ YI;
        hashCode = (int) ( hashCode % Int32.MaxValue);
    }

    // is it possible that these two write instructions
    // get reordered on a certain .NET/CPU architecture 
    // combination:

    _HashCode = (int)hashCode;
    _HasHashCode = true;

    return _HashCode;
}

My reasoning is that the _HashCode member is 32 bits wide, so writes to it are atomic; even if the calculation runs twice due to a race on setting _HasHashCode, it doesn't matter, since the same value is calculated each time.

My worry is that the CLR might reorder the write to _HashCode and _HasHashCode. Is this a concern or can I be sure the CLR doesn't reorder writes?

Sammy answered 27/8, 2014 at 6:4 Comment(6)
My knowledge of this sort of thing is not very good, but I'd suggest you put "volatile" on the two private variables.Personalize
Just use 0 as a special value that never shows up in your hash code - then you can have just one variable... Side note: your object is not really immutable if you can update the hash code later. Computing it at the moment of freezing the object will be easier to reason about and may not cost much...Rundle
@AlexeiLevenkov: "Computing it at the moment of freezing the object will be easier to reason about and may not cost much." I'm thinking that will depend a lot on how many of these objects get created and destroyed as compared to how often one of them gets put in a hash table. If 10 million of them are created (and destroyed) without GetHashCode() getting called on any of them, then 10 million hash codes will have been computed unnecessarily. But only the OP knows the ratio of object creation to usage of GetHashCode().Personalize
Just calculate the hashcode at the moment you create the object and be done with it. Your hash algorithm isn't computationally expensive and it'll take care of any concerns.Oas
This question appears to be off-topic because it belongs to codereview.stackexchange.comJoellejoellen
@Personalize Good point... I'm sure I did not stress "may not cost much" part of my comment enough... I generally prefer correct code over fast and possibly correct one - if I need to spend 2 days trying to figure out if my code is correct it is most likely not a good investment...Rundle

There's a lazy approach here: avoid the issue and avoid the question. For example, re-ordering is only a concern if there are two "things" - one "thing" can never be out of order. You could sacrifice the sentinel value of 0 to mean "not yet calculated" - then as the last step of the calculation, avoid the sentinel:

int hash;
public override int GetHashCode()
{
    var snapshot = hash;
    if(snapshot == 0) // means: not yet calculated
    {
        // snapshot = ... your actual implementation

        if(snapshot == 0) snapshot = -124987; // avoid sentinel value
        hash = snapshot;
    }
    return snapshot;
}

Note that int reads and writes are guaranteed to be atomic, which also helps.
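Filled in with the question's hash computation, a complete sketch of this approach might look like the following (Digits, XI and YI are assumed to be readonly int fields of the otherwise immutable type; the sentinel replacement value is the one from the answer):

```csharp
using System;

// Sketch of the sentinel approach with the OP's hash computation filled in.
public sealed class Point
{
    private readonly int Digits, XI, YI;
    private int hash; // 0 means "not yet calculated"

    public Point(int digits, int xi, int yi)
    {
        Digits = digits;
        XI = xi;
        YI = yi;
    }

    public override int GetHashCode()
    {
        var snapshot = hash;
        if (snapshot == 0) // not yet calculated
        {
            long h;
            unchecked
            {
                h = Digits;
                h = (h * 397) ^ XI;
                h = (h * 397) ^ YI;
                snapshot = (int)(h % Int32.MaxValue);
            }
            if (snapshot == 0) snapshot = -124987; // avoid the sentinel value
            hash = snapshot; // single atomic int write: nothing to reorder
        }
        return snapshot;
    }
}
```

Because there is only one mutable field, there is no ordering to get wrong; at worst two racing threads both compute the (identical) hash.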

Vergne answered 27/8, 2014 at 9:51 Comment(7)
+1 That indeed simplifies the whole thing. But the question is an interesting one, because there are very few resources about how the memory model really works in newer .NET versions.Cutback
Instead of testing for zero, how about simply or'ing a one bit into the value? That reduces the hashiness by a factor of two, but does that matter? For any practical use the hash code is going to be run through a modulo operation or something to turn it into an index.Personalize
@Rennie for what purpose?Vergne
@rennie how so? Testing for zero is a single op (not counting the ldfld); testing for a specific bit is at least 3, the third of those being the same op we would have done in the first place. It is a micro-unoptimization.Vergne
No, I mean just always or in a one bit without testing or anything. But never mind, it's a dumb suggestion.Personalize
Replace "if(snapshot == 0) snapshot = -124987;" with "snapshot |= 1;".Personalize
@Rennie ah, right; got it - yes, that could work - probably not important compared to the other work that went on in computing the hash: if that was cheap, we wouldn't be caching it.Vergne
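The variant suggested in the comments drops the sentinel check in favour of unconditionally setting the low bit. A sketch (again with Digits, XI and YI as assumed fields standing in for the question's):

```csharp
using System;

// Comment-thread variant: force the cached value to be nonzero by always
// or-ing in the low bit, instead of testing for the sentinel.
public sealed class Point2
{
    private readonly int Digits, XI, YI;
    private int hash; // 0 still means "not yet calculated"

    public Point2(int digits, int xi, int yi)
    {
        Digits = digits;
        XI = xi;
        YI = yi;
    }

    public override int GetHashCode()
    {
        var snapshot = hash;
        if (snapshot == 0)
        {
            long h;
            unchecked
            {
                h = Digits;
                h = (h * 397) ^ XI;
                h = (h * 397) ^ YI;
                snapshot = (int)(h % Int32.MaxValue);
            }
            snapshot |= 1; // always odd, hence never 0; halves the hash space
            hash = snapshot;
        }
        return snapshot;
    }
}
```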

No it is NOT threadsafe, because of the concern you mentioned: writes can be reordered by the JIT compiler.

This is confirmed in this MSDN article about the CLR memory model (in the very first couple of paragraphs). (Also see part two of the article.)

The solution is not to use volatile. Rather, you should use Thread.MemoryBarrier() as follows:

_HashCode = (int)hashCode;
Thread.MemoryBarrier(); // Prevents reordering of the statements before and after.
_HasHashCode = true;

A MemoryBarrier has precisely the semantics you need for this code.
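As a sketch (not part of the original answer): on .NET 4.5 and later, the same ordering can be expressed more narrowly with Volatile.Write and Volatile.Read, which give the flag a release store and an acquire load rather than a full fence. Digits, XI and YI stand in for the question's fields:

```csharp
using System;
using System.Threading;

// Sketch: the OP's pattern with Volatile.Read/Write instead of a full
// Thread.MemoryBarrier().
public sealed class LazyHash
{
    private readonly int Digits, XI, YI;
    private bool _HasHashCode;
    private int _HashCode;

    public LazyHash(int digits, int xi, int yi)
    {
        Digits = digits;
        XI = xi;
        YI = yi;
    }

    public override int GetHashCode()
    {
        // Acquire read: the read of _HashCode below cannot move before it.
        if (Volatile.Read(ref _HasHashCode))
            return _HashCode;

        long hashCode;
        unchecked
        {
            hashCode = Digits;
            hashCode = (hashCode * 397) ^ XI;
            hashCode = (hashCode * 397) ^ YI;
            hashCode = (int)(hashCode % Int32.MaxValue);
        }

        _HashCode = (int)hashCode;
        // Release write: the store to _HashCode above cannot move past it.
        Volatile.Write(ref _HasHashCode, true);
        return _HashCode;
    }
}
```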

However, note that according to Microsoft:

MemoryBarrier is required only on multiprocessor systems with weak memory ordering (for example, a system employing multiple Intel Itanium processors).

Also, I'm not totally convinced that doing it like this would be any faster than computing the hash code in the constructor (and thereby removing all the logic from the GetHashCode() implementation).

I would certainly try some careful timings with both approaches to be sure.

Isleana answered 27/8, 2014 at 7:45 Comment(9)
But, the object is immutable, so I don't see it should make any difference.Cutback
@Groo If the writes were reordered, it would be possible that one thread could have set _HasHashCode to true before it updated _HashCode. Meanwhile another thread could read _HasHashCode and see that it is true, and proceed to use _HashCode before the first thread had updated it. This would result in the second thread using an incorrect hash code (which would have the default value of 0).Isleana
But I am pretty sure that stores are never reordered with other stores (Intel® 64 and IA-32 Architectures Software Developer’s Manual, chapter 8.2.3.2 "Neither Loads Nor Stores Are Reordered with Like Operations".Cutback
@Groo The MS article I linked explicitly gives an example that shows writes can be reordered. However, the processors on which this can cause problems are limited - most modern ones won't give problems. Microsoft say "MemoryBarrier is required only on multiprocessor systems with weak memory ordering (for example, a system employing multiple Intel Itanium processors)."Isleana
@Groo Igor Ostrovsky (the author) is a software developer at Microsoft's Parallel Computing Platform team. I guess he might know about this kind of stuff... However, there is further discussion in this blog which might be relevant.Isleana
Thanks for the insight. It's new information for me (at least in the Intel CPU range), so I'll have something to think about now. :)Cutback
@Groo There is further detailed information in the second part of the MSDN article I linked which might also be relevant.Isleana
It's also strange that the author of that article says that using volatile is enough to prevent reordering, in contrast to your answer.Cutback
@Groo MemoryBarrier is better than volatile in this example because it introduces less overhead and it more precisely targets the required behaviour in just the area of code that it matters. (E.g see this blog by Brad Abrams)Isleana

Edit: @Groo got my attention focused on the reordering of instructions, either by the underlying framework (the CLR could do that) or by the processor. I believed that lock blocks prevent this, and according to this they do prevent instruction reordering. Another source is this one, which states "Monitor.Enter and Monitor.Exit both generate full fences".

It's not thread-safe; but first, my proposition:

private bool _HasHashCode = false;
private int _HashCode;
private readonly object _lock = new object();

public override int GetHashCode()
{
    if (_HasHashCode)
        return _HashCode;

    lock (_lock)
    {
        if (_HasHashCode)
            return _HashCode;

        long hashCode;
        unchecked
        {
            hashCode = Digits;
            hashCode = (hashCode*397) ^ XI;
            hashCode = (hashCode*397) ^ YI;
            hashCode = (int) (hashCode%Int32.MaxValue);
        }

        _HashCode = (int) hashCode;
        _HasHashCode = true;
        return _HashCode;
    }
}

One problem in parallel/async programming that I encounter most often is "Is that job already done?". This code takes care of that. The lock statement is pretty fast, and it is only hit a couple of times (and the hash code never gets re-calculated!). The hash code is calculated only inside the first lock. Callers that arrive afterwards (even if you are calling this very fast, over and over) just see that _HasHashCode is true and return the cached value.

The good part is that, apart from the first few callers, no latecomer ever hits the lock! So the lock block is entered only a couple of times (at most).
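An alternative that keeps the same "compute once, under a lock, only for the first callers" behaviour without hand-rolled double-checked locking is Lazy&lt;T&gt; (a sketch, not part of the original answer; assumes .NET 4 or later, with Digits, XI and YI standing in for the question's fields):

```csharp
using System;

// Sketch: Lazy<T> encapsulates the double-checked locking pattern.
public sealed class LazyHashPoint
{
    private readonly int Digits, XI, YI;
    private readonly Lazy<int> _hash;

    public LazyHashPoint(int digits, int xi, int yi)
    {
        Digits = digits;
        XI = xi;
        YI = yi;
        // Default mode is LazyThreadSafetyMode.ExecutionAndPublication:
        // the factory delegate runs at most once, under an internal lock,
        // and its result is safely published to all threads.
        _hash = new Lazy<int>(ComputeHash);
    }

    private int ComputeHash()
    {
        long h;
        unchecked
        {
            h = Digits;
            h = (h * 397) ^ XI;
            h = (h * 397) ^ YI;
            return (int)(h % Int32.MaxValue);
        }
    }

    public override int GetHashCode()
    {
        return _hash.Value;
    }
}
```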

Note: I was hasty in answering. I should have asked: if this object is immutable, why not calculate the hash at construction time? :)

York answered 27/8, 2014 at 8:33 Comment(10)
This doesn't solve OP's problem: hash calculation is neither thread-unsafe (the object is immutable), nor slow. The question is whether writes to _HashCode and _HasHashCode can be reordered to return inconsistent results.Cutback
Reordering does not matter here, because it's just the first call (whomever it is) would calculate the hash. The rest would just consume the calculated hash.York
Unless _HasHashCode somehow gets set to true while _HashCode is still zero.Cutback
Assume we call this method 1000 times in parallel. All those calls queue behind the lock, because _HasHashCode is false at that point. After the first one is done with the lock, the remaining 999 calls come in and see that _HasHashCode is set to true, so they just return the hash. Any further calls to this method don't even hit the lock statement.York
But since the first if clause is outside the lock, if HasHashCode is true, any thread reading that value will return whatever is in _HashCode. So, if that value happens to be zero at that moment (due to reordering), you could potentially have 999 threads return zero, while the first one them finishes updating _HashCode inside the lock.Cutback
@Groo I am not here to convince you but to learn; so maybe I am just way off the main line. But at the end of the lock block, _HashCode = (int) hashCode; comes first and _HasHashCode = true; comes second (even though neither of them is volatile), so _HasHashCode would never be true before the _HashCode value is set.York
Until now, I thought that writes could never be reordered with other writes (although I was aware of other reordering possibilities). However, the OP is asking whether such reordering is possible, and @Matthew's answer and the link he provided claims that it's actually possible. While it's certainly not possible on plain x86/IA32/x64 architectures, it is possible on Intanium and various other architectures.Cutback
+1, indeed, writes cannot move after exiting a lock.Cutback
Adding a lock object is not trivial; this could have significant overheadVergne
That lock block is only entered a couple of times (at most). That should not be significant overhead.York

To add to other answers, here is a table which shows possible reorderings on different architectures:

Comparison of possible instruction reorderings on different architectures

(credit: Linux Journal, Memory Ordering in Modern Microprocessors by Paul E. McKenney)

Regarding Intel architectures and the OP's question, it shows that:

  • stores cannot be reordered with other stores on x86 (this includes IA-32 and Intel 64, Intel's implementation of x86-64, not to be confused with IA-64/Itanium),

  • but stores can be reordered with other stores on IA-64 (Itanium) processors.

On the other hand, according to this link, .NET (since 2.0) should ensure that out-of-order writes never happen (even on such architectures):

On .NET (...) this kind of code motion and processor reordering is not legal. This specific example was a primary motivation for the strengthening changes we made to the .NET Framework 2.0’s implemented memory model in the CLR. Writes always retire in-order. To forbid out-of-order writes, the CLR’s JIT compiler emits the proper instructions on appropriate architectures (i.e. in this case, ensuring all writes on IA64 are store/release).

This MSDN article also explains it:

Strong Model 2: .NET Framework 2.0

The rules for this model (introduced in .NET 2.0) are:

  • All the rules that are contained in the ECMA model, in particular the three fundamental memory model rules as well as the ECMA rules for volatile.
  • Reads and writes cannot be introduced.
  • A read can only be removed if it is adjacent to another read to the same location from the same thread. A write can only be removed if it is adjacent to another write to the same location from the same thread. Rule 5 can be used to make reads or writes adjacent before applying this rule.
  • Writes cannot move past other writes from the same thread.
  • Reads can only move earlier in time, but never past a write to the same memory location from the same thread.

Given the fact that Microsoft recently dropped support for Itanium in both Windows Server and Visual Studio, you can basically only target x86/x64 from now on, and those architectures have the stricter memory model mentioned above, which disallows out-of-order writes.

Of course, since other implementations of .NET exist (e.g. Mono), claims like these should be taken with a grain of salt.

Cutback answered 27/8, 2014 at 9:16 Comment(0)
