Variable freshness guarantee in .NET (volatile vs. volatile read)

I have read a lot of contradictory information (MSDN, SO, etc.) about volatile and VolatileRead (ReadAcquireFence).

I understand the memory-access reordering restrictions they imply - what I'm still completely confused about is the freshness guarantee, which is very important for me.

The MSDN doc for the volatile keyword mentions:

(...) This ensures that the most up-to-date value is present in the field at all times.

The MSDN doc for volatile fields mentions:

A read of a volatile field is called a volatile read. A volatile read has "acquire semantics"; that is, it is guaranteed to occur prior to any references to memory that occur after it in the instruction sequence.

The .NET code for Thread.VolatileRead is:

public static int VolatileRead(ref int address)
{
    int ret = address;
    MemoryBarrier(); // Call MemoryBarrier to ensure the proper semantic in a portable way.
    return ret;
}

According to the MSDN MemoryBarrier doc, a memory barrier prevents reordering. However, this doesn't seem to have any implications for freshness - correct?

How, then, can one get a freshness guarantee? And is there a difference between marking a field volatile and accessing it with VolatileRead and VolatileWrite semantics? I'm currently doing the latter in my performance-critical code that needs to guarantee freshness, however readers sometimes get a stale value. I'm wondering if marking the state volatile will make the situation different.
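
To make the comparison concrete, here is a minimal sketch of the two alternatives I mean (the field names and the int type are just placeholders, not my real code):

// Alternative 1: mark the field volatile; every access gets volatile semantics.
private volatile int _stateAsVolatileField;

// Alternative 2: keep the field ordinary and fence explicitly at each access site.
private int _state;

public void Write(int value)
{
    Thread.VolatileWrite(ref _state, value); // full fence, then the write
}

public int Read()
{
    return Thread.VolatileRead(ref _state);  // the read, then a full fence
}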

EDIT1:

What I'm trying to achieve: the guarantee that reader threads will get as recent a value of the shared variable (written by multiple writers) as possible - ideally no older than the cost of a context switch or other operations that may postpone the immediate write of the state.

If volatile or a higher-level construct (e.g. lock) has this guarantee (does it?), then how does it achieve it?
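
For reference, the lock-based alternative I'd be comparing against would look roughly like this (just a sketch; SharedState and the member names are placeholders):

private readonly object _gate = new object();
private SharedState _lockedState;

public void UpdateWithLock(SharedState newValue)
{
    lock (_gate)   // whatever fences the lock implies would be taken on entry...
    {
        _lockedState = newValue;
    }              // ...and on exit, when the write gets published
}

public SharedState ReadWithLock()
{
    lock (_gate)   // so does a read inside the lock body see the latest published write?
    {
        return _lockedState;
    }
}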

EDIT2:

The very condensed question should have been: how do I get a guarantee of as fresh a value as possible during reads? Ideally without locking (as exclusive access is not needed and there is potential for high contention).

From what I learned here, I'm wondering if this might be the solution (the solving(?) line is marked with a comment):

private SharedState _sharedState;
private SpinLock _spinLock = new SpinLock(false);

public void Update(SharedState newValue)
{
    bool lockTaken = false;
    _spinLock.Enter(ref lockTaken);

    _sharedState = newValue;

    if (lockTaken)
    {
        _spinLock.Exit();
    }
}

public SharedState GetFreshSharedState
{
    get
    {
        Thread.MemoryBarrier(); // <---- This is added to give readers freshness guarantee
        var value = _sharedState;
        Thread.MemoryBarrier();
        return value;
    }
}

The MemoryBarrier call was added to make sure that both reads and writes are wrapped in full fences (the same as lock code - as indicated in the 'Memory barriers and locking' section here: http://www.albahari.com/threading/part4.aspx#_The_volatile_keyword).

Does this look correct or is it flawed?

EDIT3:

Thanks to the very interesting discussions here I learned quite a few things, and I was actually able to distill the question I have about this topic into a simplified, unambiguous one. It's quite different from the original, so I posted a new one here: Memory barrier vs Interlocked impact on memory caches coherency timing

Klan asked 10/7, 2014 at 13:25 Comment(15)
You know that barriers prevent reorderings but do not understand how that implies "freshness". Begin by carefully defining what you mean by "freshness". — Jamilajamill
More generally, say what you are trying to achieve. Volatile fields are very low level tools. There is probably a higher level tool you should be using instead. — Jamilajamill
@EricLippert Thanks Eric very much for contributing. I edited my question accordingly. If a higher-level tool (lock etc.?) offers a guarantee of reading as fresh a value as possible, how does it achieve this? I do have hundreds of millions of writes per sec, therefore I want (and need) to understand exactly how to get the last written value with as little blocking as possible (which is why I originally preferred volatile reads/writes to lock, and am now investigating the differences between volatile reads/writes, the volatile construct and the lock construct - not considering the exclusive access, just memory cache coherence). — Klan
If you have hundreds of millions of writes per second then you don't have the latest value no matter what you do. Suppose you do a read. Two instructions get executed on the reading thread and the value is already out of date. — Jamilajamill
I note also that a volatile read of one variable may be reordered with respect to a volatile write of another variable; C# volatility does not imply that there is a single correct ordering of all volatile reads and writes. For an example of how that can bite you, see blog.coverity.com/2014/03/26/reordering-optimizations — Jamilajamill
To answer your final question: locks do have that guarantee, and how they make that guarantee is an implementation detail of the jitter; the technique varies from CPU to CPU. Typically they take out both read and write fences -- when control enters and leaves the lock body. — Jamilajamill
@EricLippert You are correct Eric that I won't get the most recent one at the time when I use the value. However, this investigation was started by a business case where the value read was as much as 70ms old. I'm fine with a deviation in the order of a context switch or so. Instruction reordering wouldn't harm us - we just need as recent a single value as possible and make decisions based on that. Stale decisions will harm us badly though. — Klan
@EricLippert Thanks for commenting on lock! Do I read it correctly that both the read and the write of the shared state need to be enclosed by MemoryBarriers (before and after)? But then the Thread.VolatileRead doc and the volatile doc are likely wrong, as they both declare a guarantee of reading the 'most up-to-date' value despite both of them using just half-fencing. Correct? — Klan
@EricLippert Also, if I understand it correctly, just marking the shared state volatile would not give the 'freshness' guarantee (since it is just half fencing). I publish my state within a SpinLock.Enter/SpinLock.Exit region (both are full fencing due to the usage of Interlocked), so adding one MemoryBarrier prior to the read of the shared state (in the VolatileRead code pasted in my question) should simulate the behavior that is (typically) used by the jitter to give the freshness guarantee to accesses within a lock region. Or did I miss something? — Klan
How is GetFreshSharedState called? Is it being called repeatedly in a loop? Can you post some code? It doesn't have to be complete or anything, just some bits or pieces of how it's used might be useful. — Harrow
@BrianGideon The main scenario is that about a dozen threads read (very frequently) data from the wire, and after each read they make a decision to call GetFreshSharedState. Update is called in the very same way. — Klan
I don't understand why you believe that a half-fence is insufficient to ensure whatever it is you mean by "freshness". Perhaps this will help: msdn.microsoft.com/en-us/library/windows/hardware/… — Jamilajamill
I also note that 70ms is less than five quanta on a machine with 16 ms thread quantum. How do you know that the 70 ms delay is not due to four threads each running on the processor between the read of the shared memory and the code which consumes the value? — Jamilajamill
@EricLippert This case happened on a machine (x64) with 1ms OS clock resolution, and the log file (written from a single concurrent queue, with events stamped by an atomic clock at the time of inserting them into the queue) shows quite numerous activity of other writer threads (including execution of Update) and of the questioned reader thread between the point in time when the 'stale' value was written by Update and the point in time when the reader read it with GetFreshSharedState. — Klan
@EricLippert That is what caused me to question half fencing. And then numerous examples like this one from C# 4 in a Nutshell. — Klan

I think this is a good question. But, it is also difficult to answer. I am not sure I can give you a definitive answer to your questions. It is not your fault really. It is just that the subject matter is complex and really requires knowing details that might not be feasible to enumerate. Honestly, it really seems like you have educated yourself on the subject quite well already. I have spent a lot of time studying the subject myself and I still do not fully understand everything. Nevertheless, I will still attempt some semblance of an answer here anyway.

So what does it mean for a thread to read a fresh value anyway? Does it mean the value returned by the read is guaranteed to be no older than 100ms, 50ms, or 1ms? Or does it mean the value is the absolute latest? Or does it mean that if two reads occur back-to-back then the second is guaranteed to get a newer value assuming the memory address changed after the first read? Or does it mean something else altogether?

I think you are going to have a hard time getting your readers to work correctly if you are thinking about things in terms of time intervals. Instead think of things in terms of what happens when you chain reads together. To illustrate my point consider how you would implement an interlocked-like operation using arbitrarily complex logic.

public static T InterlockedOperation<T>(ref T location, T operand, Func<T, T, T> op)
    where T : class // Interlocked.CompareExchange<T> requires a reference type
{
    T initial, computed;
    do
    {
        initial = location;               // read the current value
        computed = op(initial, operand);  // op is the specific operation supplied by the caller
    }
    while (Interlocked.CompareExchange(ref location, computed, initial) != initial);
    return computed;
}

In the code above we can create any interlocked-like operation if we exploit the fact that the second read of location via Interlocked.CompareExchange will be guaranteed to return a newer value if the memory address received a write after the first read. This is because the Interlocked.CompareExchange method generates a memory barrier. If the value has changed between reads then the code spins around the loop repeatedly until location stops changing. This pattern does not require that the code use the latest or freshest value; just a newer value. The distinction is crucial.¹
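
Purely as an illustration of that chaining (the SharedState type with a Timestamp property, the _sharedState field, and the candidate variable are hypothetical here, not taken from the question), the helper above could be used to keep only the newest of two competing values:

// Hypothetical usage: keep whichever value carries the newer timestamp.
SharedState published = InterlockedOperation(ref _sharedState, candidate,
    (current, incoming) =>
        current == null || incoming.Timestamp > current.Timestamp ? incoming : current);

Even here, the only guarantee is that the stored value was current at the instant the CompareExchange succeeded; by the time the caller inspects it, a newer write may already exist.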

A lot of lock free code I have seen works on this principle. That is, the operations are usually wrapped into loops such that the operation is continually retried until it succeeds. It does not assume that the first attempt is using the latest value. Nor does it assume every use of the value is the latest. It only assumes that the value is newer after each read.

Try to rethink how your readers should behave. Try to make them more agnostic about the age of the value. If that is simply not possible and all writes must be captured and processed then you may be forced into a more deterministic approach like placing all writes into a queue and having the readers dequeue them one-by-one. I am sure the ConcurrentQueue class would help in that situation.
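
A rough sketch of that queue-based alternative (this assumes System.Collections.Concurrent is imported; SharedState and the Process method are placeholders):

private readonly ConcurrentQueue<SharedState> _updates = new ConcurrentQueue<SharedState>();

// Every writer enqueues its update, so no write is lost.
public void Publish(SharedState newValue)
{
    _updates.Enqueue(newValue);
}

// A reader drains the queue and handles each captured write in order.
public void DrainUpdates()
{
    SharedState value;
    while (_updates.TryDequeue(out value))
    {
        Process(value); // placeholder for whatever the reader does with the value
    }
}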

If you can reduce the meaning of "fresh" to only "newer" then placing a call to Thread.MemoryBarrier after each read, using Volatile.Read, using the volatile keyword, etc. will absolutely guarantee that one read in a sequence will return a newer value than a previous read.
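
For example, a polling reader written against that weaker contract might look roughly like this (just a sketch; Volatile.Read is the .NET 4.5+ API, and the field and Process method are placeholders):

private SharedState _sharedState; // written by the writer threads

public void PollLoop()
{
    SharedState previous = null;
    while (true)
    {
        // Each volatile read observes a value at least as new as the
        // value returned by the previous volatile read of this field.
        SharedState current = Volatile.Read(ref _sharedState);
        if (!ReferenceEquals(current, previous))
        {
            Process(current); // placeholder for the reader's decision logic
            previous = current;
        }
    }
}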


¹ The ABA problem opens up a new can of worms.

Harrow answered 12/7, 2014 at 2:31 Comment(3)
Thanks Brian for spending your time on this! Just to quickly define what I mean by fresh: the last value written by any writer (irrespective of any caching etc.). The more I read about barriers the more I'm worried that they are not defined for this (they are there to prevent types of reordering) - but I might just be misled. What is a very good point is pointing to Interlocked - those operations are by definition guaranteed to get the freshest value. I'm arriving at the conclusion that my read property should simply call return Interlocked.CompareExchange(ref _sharedState, null, null); — Klan
Yes, Interlocked.CompareExchange will return the latest value. But, by the time your logic uses that value it may not be the latest anymore. That's what I meant when I suggested trying to make the readers less dependent on the value actually being the latest. — Harrow
Thanks Brian - your answer is currently the best one to my very vague question. I want to leave the discussion open for a day or two before marking. However, I created a much more specific question (see EDIT3). With respect to the timing and freshness requirement - it's mainly about statistics and 'unluckiness' - slightly slower cache updating + a few unlucky swaps (+ page misses or whatever else) and a slightly older value can suddenly, in 1 out of a gazillion cases, become an ages older value - I need to prevent this possibility as much as possible. — Klan

A memory barrier does provide this guarantee. We can derive the "freshness" property that you are looking for from the reordering properties that a barrier guarantees.

By freshness you probably mean that a read returns the value of the most recent write.

Let's say we have these operations, each on a different thread:

x = 1
x = 2
print(x)

How could we possibly print a value other than 2? Without volatile the read can move one slot upwards and return 1. Volatile prevents reorderings, though. The write cannot move backwards in time.

In short, volatile guarantees you to see the most recent value.

Strictly speaking I'd need to differentiate between volatile and a memory barrier here. The latter one is a stronger guarantee. I have simplified this discussion because volatile is implemented using memory barriers, at least on x86/x64.

Organo answered 10/7, 2014 at 14:10 Comment(9)
Thanks for contributing. If 1 is written by thread A at 10:01PM and 2 is written by thread B at 10:10PM - are you guaranteed that a read performed by thread C at 10:20PM will get the value 2? No. I don't see this guarantee in any spec - I only see the guarantee that you will see a correct snapshot of memory in time. — Klan
See Jon Skeet's comment on this topic and Raymond Chen's comment on this topic. — Klan
On the other hand, the Thread.VolatileRead MSDN doc mentions "Reads the value of a field. The value is the latest written by any processor in a computer, regardless of the number of processors or the state of processor cache." However, I'm occasionally observing contradictory behavior in our toolset. And I can find plenty of contradictory info on the web (like the two comments linked above). Confused :| — Klan
@Klan indeed reads can reorder above writes according to albahari.com/threading/part4.aspx section "The volatile keyword". This answer is wrong, will delete it shortly. The answer you are looking for is: No, you are not guaranteed the latest value. — Organo
This topic is very confusing. It might be beneficial for other readers if you'd leave your answer (which describes what is usually thought of as the expected behavior) and just edit it to indicate it is not correct. — Klan
@Klan I agree but I don't care enough to rewrite it... Post your own answer and I'll upvote it. — Organo
From what I saw happening in our toolset I was anticipating this :| That was the reason for starting this question. It should rather have been stated: "How do I get the freshness guarantee?" — Klan
@Klan right. Ask a new question in addition to this one. — Organo
Btw, I'm not really convinced that the delay of 70ms you are seeing is because of this reordering. x86 tends to be very memory coherent. Probably your reader read a very current value and then was descheduled for 70ms by the OS. Or a few page faults happened. There is nothing you can do to always prevent that. — Organo
