C# volatile variable: Memory fences VS. caching
So I researched the topic for quite some time now, and I think I understand the most important concepts like the release and acquire memory fences.

However, I haven't found a satisfactory explanation for the relation between volatile and the caching of the main memory.

So, I understand that every read and write to/from a volatile field enforces strict ordering of the read as well as the write operations that precede and follow it (read-acquire and write-release). But that only guarantees the ordering of the operations. It doesn't say anything about the time these changes are visible to other threads/processors. In particular, this depends on the time the cache is flushed (if at all). I remember having read a comment from Eric Lippert saying something along the lines of "the presence of volatile fields automatically disables cache optimizations". But I'm not sure what exactly this means. Does it mean caching is completely disabled for the whole program just because we have a single volatile field somewhere? If not, what is the granularity the cache is disabled for?

Also, I read something about strong and weak volatile semantics and that C# follows the strong semantics where every write will always go straight to main memory no matter if it's a volatile field or not. I am very confused about all of this.

Maddux answered 22/6, 2017 at 7:25 Comment(22)
I didn't say that; in fact you are more likely to have read a comment from me saying that any effect on caches caused by volatile is an implementation detail, not a guarantee. The C# specification says what you can expect from volatile; any behaviours beyond the specified ones are implementation details that you cannot rely on.Foveola
The C# specification also notes that there is explicitly not guaranteed to be a globally consistent observable order of reads and writes. For example, two threads can disagree as to whether a volatile read of one variable happened before or after a volatile write to another.Foveola
You are right to be very confused by it. I'm very confused by it too. That's why I never use volatile.Foveola
This is an excellent question; it's baffled me for a while too. My understanding is that memory fences (including the half fences provided by volatile) need to be respected by the memory subsystem; otherwise, there would be little point to them. Thus, an acquire implies that cache is invalidated, whilst a release implies that cache is flushed. But I've seen many insist this isn't the case.Kalong
@Kalong Exactly. So the specification is incomplete without also making a statement about the caching mechanism, right?Maddux
What exactly do you mean by "incomplete"? The C# specification does not say anything about the implementation details of the processors it runs on; how could it? It just says that a conforming implementation must have certain minimal observable behaviours regarding the ordering of certain effects, like volatile writes, thread starts, exceptions, and so on, in multithreaded programs.Foveola
@EricLippert: I see. It is agnostic to the caching, therefore the problem that memory effects are never observed by (other) threads is not one that the spec is "aware" of and for which it needs to provide an answer?Maddux
@MightyNicM: It depends on what you mean by "other" threads. If a thread does a volatile write of a guard variable, and another thread subsequently does a volatile read of the said guard, then it should be guaranteed that the second thread observes any new values written after the guard. For weak architectures that do not provide these guarantees implicitly, .NET generates explicit processor instructions to that effect. The architecture is required to honor the semantics behind these instructions, but not bound to any mechanism for doing so.Kalong
If, by "other threads", you're referring to threads other than those performing the volatile memory access, then there is no guarantee that the effects will be observed. An architecture could, hypothetically, only maintain cache coherence on a pairwise level between threads at synchronization points. Other threads might get updated values too, but this is a side-effect, not a guaranteed behaviour.Kalong
(For what I mean by "guard variable", see the "Publication via Volatile Field" section in C# - The C# Memory Model in Theory and Practice. Perhaps counterintuitively, the volatile keyword only guarantees that the latest value of data written before a volatile write will be observed by another thread after it performs a volatile read. It does not provide guarantees on the value of the volatile variable itself.)Kalong
@Kalong Didn't you mean that any new value written BEFORE the guard is read by the other thread? Also, I don't see the specs saying anything about the freshness of values. It could be that the "reader thread" might always read old values. The only guarantee as I understand it is that if the reader sees the volatile write, it will also see at least all the writes that happened before it in program order of the writer thread.Maddux
The notion that there is one canonical "fresh" value of a variable is predicated on the mental model that variables are things that have canonical values whose mutations are observably consistently ordered in time. What we're trying to tell you here is that model is simply false.Foveola
Once you've understood the article linked to by Douglas, you can test whether you've internalized the rules by trying to solve the puzzle that I pose here: web.archive.org/web/20160729162225/http://blog.coverity.com/… Even when every variable access is volatile, and some are under locks and there are only a few threads, and you're running on a strong x86 architecture, you can still end up in unexpected situations.Foveola
@MightyNicM: My comment was unclear. I meant that the second thread, when performing reads after the guard (half-fence), would observe any values that had been written by the other thread before the guard.Kalong
@MightyNicM: "The only guarantee as I understand it is, that if the reader sees the volatile write, it will also see at least all the writes that happened before it in program order of the writer thread." – That's exactly what baffles me about the semantics. Without a guarantee of freshness, threads could just ignore memory barriers and keep reading their original stale values indefinitely, as long as all their values are stale.Kalong
My assumption is that memory fences do need to be enforced at the moment in time when they are encountered. If thread A encounters a memory barrier at 1s, and thread B encounters a memory barrier at 2s, then anything written by thread A before 1s must be observed by thread B in reads after 2s.Kalong
@Douglas: I cannot figure out what you mean by your assumption; would you care to apply your set of assumptions to the puzzle I pose in the link above? Under your model, is it ever legal for s and t to both be true? Since it is possible, if your model predicts that it is not, then your model is wrong.Foveola
So by the way, my original question is still not answered imo. ;) So I read the specs, and they say nothing about whether or not a volatile write will EVER be observed by another thread (volatile read or not). Is that correct or not?
I had a comment here but I deleted it because it was misleading; I'll expand on it in an answer.Foveola
@EricLippert: Yes, it would be legal for both variables in your example to be true, since no fence or half-fence is generated between the volatile write and the volatile read.Kalong
To exemplify my assumption: Suppose that guard is volatile and data isn't. Thread A writes to data and then to guard. One second later, thread B reads guard and then data. Acquire–release semantics only guarantee that thread B cannot observe the new value of guard and the old value of data. Rather, it may observe new–new, old–old, or old–new.Kalong
However, shouldn't the semantics also forbid thread B from observing the old value of data at all (irrespective of which value of guard is observed), given that the half-fence generated by the volatile read of guard occurs chronologically after the half-fence generated by the volatile write of guard?Kalong

I'll address the last question first. Microsoft's .NET implementation has release semantics on writes¹. That is a property of the implementation, not of C# per se, so the same program, no matter the language, may have weak non-volatile writes under a different implementation.

The visibility of side-effects is a matter of multiple threads. Forget about CPUs, cores and caches. Imagine, instead, that each thread has a snapshot of what is on the heap, and that some sort of synchronization is required to communicate side-effects between threads.

So, what does C# say? The C# language specification (newer draft) says fundamentally the same as the Common Language Infrastructure standard (CLI; ECMA-335 and ISO/IEC 23271) with some differences. I'll talk about them later on.

So, what does the CLI say? That only volatile operations are visible side-effects.

Note that it also says that non-volatile operations on the heap are side-effects as well, just not guaranteed to be visible. Just as important², it doesn't state that they're guaranteed not to be visible either.

What exactly happens on volatile operations? A volatile read has acquire semantics: it precedes any following memory reference. A volatile write has release semantics: it follows any preceding memory reference.

Acquiring a lock performs a volatile read, and releasing a lock performs a volatile write.

Interlocked operations have acquire and release semantics.
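
To make this concrete, here is a minimal sketch (Volatile, Interlocked and lock are the standard System.Threading mechanisms; the comments merely restate the semantics above, they don't add guarantees):

using System.Threading;

public class VolatileOperations
{
    private volatile bool flag;      // reads acquire, writes release
    private bool plain;
    private int counter;
    private readonly object gate = new object();

    public void Examples()
    {
        bool f = flag;                        // volatile read: acquire semantics
        flag = true;                          // volatile write: release semantics

        bool p = Volatile.Read(ref plain);    // acquire semantics without a volatile field
        Volatile.Write(ref plain, true);      // release semantics without a volatile field

        lock (gate)                           // acquiring the lock: volatile read semantics
        {
            counter = f || p ? 1 : 0;
        }                                     // releasing the lock: volatile write semantics

        Interlocked.Increment(ref counter);   // acquire and release semantics
    }
}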

There's another important term to learn, which is atomicity.

Reads and writes, volatile or not, are guaranteed to be atomic for primitive values up to 32 bits on 32-bit architectures and up to 64 bits on 64-bit architectures. They're also guaranteed to be atomic for references. For other types, such as large structs, operations are not atomic; they may require multiple, independent memory accesses.

However, even with volatile semantics, read-modify-write operations, such as v += 1 or the equivalent ++v (or v++, in terms of side-effects), are not atomic.

Interlocked operations guarantee atomicity for certain operations, typically addition, subtraction and compare-and-swap (CAS), i.e. write some value if and only if the current value is still some expected value. .NET also has an atomic Interlocked.Read(ref long) method for 64-bit integers, which works even on 32-bit architectures.
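
A minimal sketch of these atomicity guarantees (Interlocked.Increment, Interlocked.Read and Interlocked.CompareExchange are the standard System.Threading methods):

using System.Threading;

public class AtomicityExamples
{
    private int counter;    // 32-bit: reads and writes are atomic everywhere
    private long big;       // 64-bit: reads and writes may tear on 32-bit architectures

    public void NotAtomic()
    {
        counter += 1;       // read, add, write: another thread may interleave between steps
    }

    public void Atomic()
    {
        Interlocked.Increment(ref counter);          // atomic read-modify-write
        long snapshot = Interlocked.Read(ref big);   // atomic 64-bit read, even on 32-bit
        Interlocked.CompareExchange(ref counter, 1, 0); // write 1 only if counter is still 0
    }
}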

I'll keep referring to acquire semantics as volatile reads and release semantics as volatile writes, and either or both as volatile operations.

What does this all mean in terms of order?

That a volatile read is a point before which no memory references may cross, and a volatile write is a point after which no memory references may cross, both at the language level and at the machine level.

That non-volatile operations may cross to after following volatile reads if there are no volatile writes in between, and cross to before preceding volatile writes if there are no volatile reads in between.

That volatile operations within a thread are sequential and may not be reordered.

That volatile operations in a thread are made visible to all other threads in the same order. However, there is no total order of volatile operations from all threads, i.e. if one thread performs V1 and then V2, and another thread performs V3 and then V4, then any order that has V1 before V2 and V3 before V4 can be observed by any thread. In this case, it can be any of the following:

  • V1 V2 V3 V4

  • V1 V3 V2 V4

  • V1 V3 V4 V2

  • V3 V1 V2 V4

  • V3 V1 V4 V2

  • V3 V4 V1 V2

That is, any of the possible orders of observed side-effects is valid for any thread for a single execution. There is no requirement of a total ordering, such that all threads observe only one of the possible orders for a single execution.
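
A sketch of such a program (the threads are hypothetical; whether two observers actually disagree depends on the hardware and the JIT, so treat this as an illustration of what the specification permits rather than a reliable repro):

using System;

public class NoTotalOrder
{
    private volatile int x, y;

    public void ThreadOne() { x = 1; x = 2; }   // V1, then V2
    public void ThreadTwo() { y = 1; y = 2; }   // V3, then V4

    public void Observer()
    {
        // Every observer sees V1 before V2 and V3 before V4, but two observers
        // running concurrently may see the pairs interleaved differently,
        // e.g. V1 V3 V2 V4 for one and V3 V4 V1 V2 for the other.
        Console.WriteLine($"x={x} y={y}");
    }
}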

How are things synchronized?

Essentially, it boils down to this: a synchronization point is where you have a volatile read that happens after a volatile write.

In practice, you must detect whether a volatile read in one thread happened after a volatile write in another thread³. Here's a basic example:

public class InefficientEvent
{
    private volatile bool signalled = false;

    public void Signal()
    {
        signalled = true;   // volatile write (release)
    }

    public void InefficientWait()
    {
        while (!signalled)  // volatile read (acquire) on each iteration
        {
        }
    }
}

Though generally inefficient, you can run two different threads, such that one calls InefficientWait() and the other calls Signal(), and the side-effects of the latter when it returns from Signal() become visible to the former when it returns from InefficientWait().
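
For instance, a driver for the class above might look like this (a sketch; the payload variable stands in for any non-volatile state published by Signal()):

using System;
using System.Threading;

public static class InefficientEventDemo
{
    public static void Main()
    {
        var evt = new InefficientEvent();
        int payload = 0;   // deliberately not volatile

        var waiter = new Thread(() =>
        {
            evt.InefficientWait();          // returns only after Signal()
            Console.WriteLine(payload);     // guaranteed to print 42: the write to
                                            // payload precedes the volatile write
        });                                 // inside Signal(), which precedes the
                                            // volatile read that observed true
        waiter.Start();

        payload = 42;   // non-volatile write...
        evt.Signal();   // ...published by the volatile write to signalled

        waiter.Join();
    }
}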

Volatile accesses are not as generally useful as interlocked accesses, which are not as generally useful as synchronization primitives. My advice is that you should develop code safely first, using synchronization primitives (locks, semaphores, mutexes, events, etc.) as needed, and if you find reasons to improve performance based on actual data (e.g. profiling), then and only then see if you can improve.

If you ever reach high contention for fast locks (used only for a few reads and writes without blocking), then depending on the amount of contention, switching to interlocked operations may either improve or degrade performance. Especially so when you have to resort to compare-and-swap cycles, such as:

// field and GetNewValue are placeholders for your shared variable and
// your pure transformation of it.
var currentValue = Volatile.Read(ref field);
var newValue = GetNewValue(currentValue);
var oldValue = currentValue;
var spinWait = new SpinWait();
// Retry until no other thread has changed field between our read and our CAS.
while ((currentValue = Interlocked.CompareExchange(ref field, newValue, oldValue)) != oldValue)
{
    spinWait.SpinOnce();                  // back off before retrying
    newValue = GetNewValue(currentValue);
    oldValue = currentValue;
}

Meaning, you have to profile that solution as well and compare it with the current state. And be aware of the A-B-A problem.

There's also SpinLock, which you must really profile against monitor-based locks: although a SpinLock may make the current thread yield, it doesn't put the current thread to sleep, akin to the shown usage of SpinWait.

Switching to volatile operations is like playing with fire. You must make sure through analytical proof that your code is correct, otherwise you may get burned when you least expect it.

Usually, the best approach for optimization in the case of high contention is to avoid contention. For instance, to perform a transformation on a big list in parallel, it's often better to divide and delegate the problem to multiple work items that generate results which are merged in a final step, rather than having multiple threads locking the list for updates. This has a memory cost, so it depends on the length of the data set.
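
As a sketch of that divide-and-merge approach (Transform is a placeholder for the real per-item work):

using System.Linq;
using System.Threading.Tasks;

public static class ContentionFree
{
    // Each task transforms its own slice into a private buffer, so no shared
    // list needs locking; the buffers are merged once, in a final step.
    public static int[] TransformAll(int[] input, int workers)
    {
        var tasks = Enumerable.Range(0, workers).Select(w => Task.Run(() =>
        {
            int from = w * input.Length / workers;
            int to = (w + 1) * input.Length / workers;
            var buffer = new int[to - from];
            for (int i = from; i < to; i++)
                buffer[i - from] = Transform(input[i]);   // no contention here
            return buffer;
        })).ToArray();

        Task.WaitAll(tasks);
        return tasks.SelectMany(t => t.Result).ToArray(); // merge step
    }

    private static int Transform(int x) => x * x;         // placeholder for real work
}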


What are the differences between the C# specification and the CLI specification regarding volatile operations?

C# specifies side-effects, not mentioning their inter-thread visibility, as being a read or write of a volatile field, a write to a non-volatile variable, a write to an external resource, and the throwing of an exception.

C# specifies critical execution points at which these side-effects are preserved between threads: references to volatile fields, lock statements, and thread creation and termination.

If we take critical execution points as points where side-effects become visible, this adds to the CLI specification that thread creation and termination are visible side-effects: new Thread(...).Start() has release semantics on the current thread and acquire semantics at the start of the new thread; exiting a thread has release semantics on the current thread; and thread.Join() has acquire semantics on the waiting thread.
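
A sketch of what those thread-related critical execution points buy you (no volatile fields involved):

using System;
using System.Threading;

public static class ThreadPublication
{
    public static void Main()
    {
        int data = 42;              // written before the thread starts...
        var t = new Thread(() =>
        {
            // ...so the new thread must observe data == 42 (thread start
            // has release/acquire semantics).
            data = 43;              // written before the thread ends...
        });
        t.Start();
        t.Join();                   // acquire on the waiting thread
        Console.WriteLine(data);    // ...so this must print 43
    }
}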

C# doesn't mention volatile operations in general, such as those performed by classes in System.Threading, only those performed through fields declared as volatile and through the lock statement. I believe this omission is not intentional.

C# states that captured variables can be simultaneously exposed to multiple threads. The CLI doesn't mention this, because closures are a language construct.


1.

There are a few places where Microsoft (ex-)employees and MVPs state that writes have release semantics.

In my code, I ignore this implementation detail. I assume non-volatile writes are not guaranteed to become visible.


2.

There is a common misconception that you're allowed to introduce reads in C# and/or the CLI.

However, that is true only for local arguments and variables.

For static and instance fields, or arrays, or anything on the heap, you cannot sanely introduce reads, as such an introduction may break the order of execution as seen from the current thread of execution, whether from legitimate changes in other threads or from changes through reflection.

That is, you can't turn this:

object local = field;
if (local != null)
{
    // code that reads local
}

into this:

if (field != null)
{
    // code that replaces reads on local with reads on field
}

if you can ever tell the difference. Specifically, if another thread writes null to field between the two reads, the transformed version throws a NullReferenceException when accessing field's members, whereas the original, which only reads local, cannot.

In the case of C#'s captured variables, they're equivalent to instance fields.
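
A sketch of why (the generated class and names below are illustrative; the compiler's actual names are internal):

using System;
using System.Threading;

public static class CaptureDemo
{
    public static void Main()
    {
        bool done = false;                          // a local, but captured below
        var t = new Thread(() => { done = true; });
        t.Start();
        t.Join();                                   // thread end + Join publish the write
        Console.WriteLine(done);                    // prints True

        // Roughly what the compiler generates for the capture:
        //     class DisplayClass { public bool done; }
        //     var closure = new DisplayClass { done = false };
        //     var t = new Thread(() => { closure.done = true; });
        // The captured local became an instance field on the heap, so the rules
        // above apply: reads on it cannot be sanely introduced.
    }
}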

It's important to note that the CLI standard:

  • says that non-volatile accesses are not guaranteed to be visible

  • doesn't say that non-volatile accesses are guaranteed to not be visible

  • says that volatile accesses affect the visibility of non-volatile accesses

But you can turn this:

object local2 = local1;
if (local2 != null)
{
    // code that reads local2 on the assumption it's not null
}

into this:

if (local1 != null)
{
    // code that replaces reads on local2 with reads on local1,
    // as long as local1 and local2 have the same value
}

You can turn this:

var local = field;
local?.Method()

into this:

var local = field;
var _temp = local;
(_temp != null) ? _temp.Method() : null

or this:

var local = field;
(local != null) ? local.Method() : null

because you can't ever tell the difference. But again, you cannot turn it into this:

(field != null) ? field.Method() : null

I believe it was prudent of both specifications to state that an optimizing compiler may reorder reads and writes as long as a single thread of execution observes them as written, instead of allowing it to introduce and eliminate them altogether in general.

Note that read elimination may be performed by either the C# compiler or the JIT compiler: multiple reads of the same non-volatile field, separated by instructions that don't write to that field and don't perform volatile operations (or their equivalent), may be collapsed into a single read. It's as if a thread never synchronizes with other threads, so it keeps observing the same value:

public class Worker
{
    private bool working = false;
    private bool stop = false;

    public void Start()
    {
        if (!working)
        {
            new Thread(Work).Start();
            working = true;
        }
    }

    public void Work()
    {
        while (!stop)
        {
            // TODO: actual work without volatile operations
        }
    }

    public void Stop()
    {
        stop = true;
    }
}

There's no guarantee that Stop() will stop the worker. Microsoft's .NET implementation guarantees that stop = true; is a visible side-effect, but it doesn't guarantee that the read on stop inside Work() is not elided to this:

    public void Work()
    {
        bool localStop = stop;
        while (!localStop)
        {
            // TODO: actual work without volatile operations
        }
    }

That comment says quite a lot. To perform this optimization, the compiler must prove that there are no volatile operations whatsoever, either directly in the block or indirectly anywhere in the call tree of the methods and properties it invokes.

For this specific case, one correct implementation is to declare stop as volatile. But there are more options (a sketch of the first one follows this list):

  • using the equivalent Volatile.Read and Volatile.Write

  • using Interlocked.CompareExchange

  • using a lock statement around all accesses to stop

  • using something equivalent to a lock, such as a Mutex, or a Semaphore or SemaphoreSlim if you don't want the lock to have thread-affinity (i.e. so you can release it on a different thread than the one that acquired it)

  • using a ManualResetEvent or ManualResetEventSlim instead of stop, in which case you can make Work() sleep with a timeout while waiting for a stop signal before the next iteration
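
A sketch of the Volatile.Read/Volatile.Write option (the same Worker, with stop accessed only through the Volatile class):

using System.Threading;

public class VolatileWorker
{
    private bool working = false;
    private bool stop = false;       // accessed only via Volatile.Read/Volatile.Write

    public void Start()
    {
        if (!working)
        {
            new Thread(Work).Start();
            working = true;
        }
    }

    public void Work()
    {
        // The acquire read cannot be elided or hoisted out of the loop.
        while (!Volatile.Read(ref stop))
        {
            // TODO: actual work
        }
    }

    public void Stop()
    {
        Volatile.Write(ref stop, true);   // release: made visible to the reader
    }
}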


3.

One significant difference of .NET's volatile synchronization compared to Java's volatile synchronization is that Java requires you to use the same volatile location, whereas .NET only requires that an acquire (volatile read) happens after a release (volatile write). So, in principle you can synchronize in .NET with the following code, but you can't synchronize with the equivalent code in Java:

using System;
using System.Threading;

public class SurrealVolatileSynchronizer
{
    public volatile bool v1 = false;
    public volatile bool v2 = false;
    public int state = 0;

    public void DoWork1(object b)
    {
        var barrier = (Barrier)b;
        barrier.SignalAndWait();
        Thread.Sleep(100);
        state = 1;          // non-volatile write
        v1 = true;          // volatile write (release)
    }

    public void DoWork2(object b)
    {
        var barrier = (Barrier)b;
        barrier.SignalAndWait();
        Thread.Sleep(200);
        bool currentV2 = v2;                 // volatile read (acquire) of a different field
        Console.WriteLine("{0}", state);     // prints 1 if the acquire ran after the release
    }

    public static void Main(string[] args)
    {
        var synchronizer = new SurrealVolatileSynchronizer();
        var thread1 = new Thread(synchronizer.DoWork1);
        var thread2 = new Thread(synchronizer.DoWork2);
        var barrier = new Barrier(3);
        thread1.Start(barrier);
        thread2.Start(barrier);
        barrier.SignalAndWait();
        thread1.Join();
        thread2.Join();
    }
}

This surreal example expects threads and Thread.Sleep(int) to take an exact amount of time. If this is so, it synchronizes correctly, because DoWork2 performs a volatile read (acquire) after DoWork1 performs a volatile write (release).

In Java, even with such surreal expectations fulfilled, this would not guarantee synchronization. In DoWork2, you'd have to read from the same volatile field you wrote to in DoWork1.

Discovert answered 11/10, 2017 at 18:45 Comment(0)

I read the specs, and they say nothing about whether or not a volatile write will EVER be observed by another thread (volatile read or not). Is that correct or not?

Let me rephrase the question:

Is it correct that the specification says nothing on this matter?

No. The specification is very clear on this matter.

Is a volatile write guaranteed to be observed on another thread?

Yes, if the other thread has a critical execution point. A special side effect is guaranteed to be observed to be ordered with respect to a critical execution point.

A volatile write is a special side effect, and a number of things are critical execution points, including starting and stopping threads. See the spec for a list of such.

Suppose for example thread Alpha sets volatile int field v to one and starts thread Bravo, which reads v, and then joins Bravo. (That is, blocks on Bravo completing.)

At this point we have a special side effect -- the write -- a critical execution point -- the thread start -- and a second special side effect -- a volatile read. Therefore Bravo is required to read one from v. (Assuming no other thread has written it in the meanwhile of course.)

Bravo now increments v to two and ends. That's a special side effect -- a write -- and a critical execution point -- the end of a thread.

When thread Alpha now resumes and does a volatile read of v it is required that it reads two. (Assuming no other thread has written to it in the meanwhile of course.)

The ordering of the side effect of Bravo's write and Bravo's termination must be preserved; plainly Alpha does not run again until after Bravo's termination, and so it is required to observe the write.
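
The same scenario as code (a sketch; as above, it assumes no other thread writes to v in the meanwhile):

using System;
using System.Threading;

public class AlphaBravo
{
    private volatile int v;

    public void RunOnAlpha()
    {
        v = 1;                      // special side effect: volatile write
        var bravo = new Thread(() =>
        {
            int seen = v;           // must read 1: the thread start ordered Alpha's write first
            v = seen + 1;           // special side effect: volatile write of 2
        });
        bravo.Start();              // critical execution point
        bravo.Join();               // critical execution point: Bravo's termination
        Console.WriteLine(v);       // must print 2
    }
}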

Foveola answered 29/6, 2017 at 18:5 Comment(7)
Thanks for answering, especially with an example. However, having read the section about the "Execution Order" in the specs, I'm even more confused. Honestly, I think the specs are unclear and not precise enough on this matter. What exactly is preserved, and from whose perspective? Is it about the ordering of special side effects w.r.t. the critical execution points? Or about the preservation of the special side effects themselves? What does "preserve" even mean in this context? I wish this were more formal! ;)Maddux
I am still going to accept your answer, because I think it gives me enough pointers for further searching!Maddux
Ok, now I am able to formulate a more mapped-out question: Suppose two threads A and B were already started and are running now. Both keep executing some loop, and are not stopped. A writes to some volatile field v, and B occasionally reads v, basing some logic on the read value. Now: What part of the specs guarantee that B ever reads the values written by A?Maddux
@MightyNicM: Well, what in your scenario guarantees that thread B runs at all? The spec says nothing about thread scheduling algorithms. The scheduler could be starving B and spending all its time in A if it so chooses.Foveola
I explicitly wrote "suppose that". So assume every thread makes progress eventually. What then? Will B eventually read the value written by A?Maddux
@MightyNicM: Suppose we have volatile variables x and y, both zero. Thread B is in a tight loop waiting for x to be 1, after which it reads y. Thread A at some point sets y to 1 and at some later point, x to 1. The guarantee made by volatile is that if B breaks out of the loop, then y is observed to be 1. But there is no guarantee that B will observe the change made by A within 1 nanosecond, 1 microsecond, 1 millisecond, ... of A writing to x. In practice will B see it eventually? Sure. But the time elapsed is 100% an implementation detail.Foveola
@MightyNicM: Now, you started by asking about cache implementation details. For an interesting take on those, see Joe's article on volatile and freshness, particularly the final paragraph. joeduffyblog.com/2008/06/13/… (In Joe's example m_state is a field which is 1 if a resource is in use, and 0 otherwise; the interlocked operations are essentially spin locks that produce memory barriers.)Foveola

Yes, volatile is about fences, and fences are about ordering. So "when?" is not in scope; it is actually an implementation detail of all the layers (compiler, JIT, CPU, etc.) combined, though every implementation should have a decent, practical answer to the question.

Steamship answered 22/6, 2017 at 10:14 Comment(7)
So this means that volatile cannot be used to implement a synchronization mechanism, because its specification doesn't give any guarantees regarding visibility?Maddux
It is a guarantee that .NET gives you when you use volatile. volatile itself defines only ordering.Steamship
So ordering guarantees are pretty useless when another thread is not guaranteed to ever see any of these operations' effects, right? ;)Maddux
Ordering guarantees guarantee the ordering in which another thread will see the changes :) That's one of the two reasons why we have volatiles. Another one is atomicity. Delivering the changes is the responsibility of other parts of the platform.Steamship
Ok, provided that the other thread sees them EVENTUALLY, which is not stated in the specs. Maybe it is assumed, I don't know. How do volatiles and atomicity relate? What exactly is atomic w.r.t. volatiles?Maddux
@MightyNicM: The relationship between atomicity and volatility in C# is that only variables which were already guaranteed to be read and written atomically are allowed to be declared volatile. You can declare a volatile int variable, because ints are guaranteed to be atomic read and write. But doubles are not, and there is no volatile double in C#.Foveola
Note that atomicity only applies to torn reads and writes. Atomic test and set is not guaranteed for volatile references. Atomic increment is not guaranteed for volatile ints. And so on.Foveola
