Why not volatile on System.Double and System.Int64?

5

48

A question like mine has been asked, but mine is a bit different. The question is, "Why is the volatile keyword not allowed in C# on types System.Double and System.Int64, etc.?"

At first blush, I answered my colleague, "Well, on a 32-bit machine, those types take at least two ticks to even enter the processor, and the .NET Framework has the intention of abstracting away processor-specific details like that." To which he responded, "It's not abstracting anything if it's preventing you from using a feature because of a processor-specific problem!"

He's implying that a processor-specific detail should not show up to a person using a framework that "abstracts" details like that away from the programmer. So, the framework (or C#) should abstract those away and do what it needs to do to offer the same guarantees for System.Double, etc. (whether that's a semaphore, a memory barrier, or whatever). I argued that the framework shouldn't add the overhead of a semaphore to volatile: a semaphore isn't necessary for the 32-bit types, so the programmer doesn't expect that overhead from the keyword. The greater overhead for the 64-bit types might come as a surprise, so it is better for the .NET Framework to just not allow it, and make you do your own locking on larger types if the overhead is acceptable.

That led us to investigate what the volatile keyword is all about (see this page). That page states, in its notes:

In C#, using the volatile modifier on a field guarantees that all access to that field uses VolatileRead or VolatileWrite.

Hmmm... VolatileRead and VolatileWrite both support our 64-bit types! My question, then, is:

"Why is the volatile keyword not allowed in C# on types System.Double and System.Int64, etc.?"
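For readers who have not hit the restriction themselves, here is a minimal sketch of what the compiler accepts and rejects. The class and member names are hypothetical and exist only for illustration; the commented-out lines are the declarations the question is about.

```csharp
public class VolatileFieldKinds
{
    // Accepted: 32-bit (or smaller) primitives and reference types.
    private volatile int _counter;
    private volatile bool _stopRequested;

    // Rejected: uncommenting either line produces compiler error CS0677
    // (a volatile field cannot be of this type).
    // private volatile long _ticks;
    // private volatile double _average;

    public void RequestStop() => _stopRequested = true;
    public bool StopRequested => _stopRequested;

    public void SetCounter(int value) => _counter = value;
    public int Counter => _counter;
}
```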

Pericardium answered 18/1, 2011 at 17:27 Comment(2)

Please note that Microsoft has corrected the page, which no longer says "In C#, using the volatile modifier on a field guarantees that all access to that field uses VolatileRead or VolatileWrite." – Kellsie 7/7, 2011 at 11:25

It is the other way around. By imposing this restriction, they can guarantee volatile behavior on any processor. – Penultimate 31/8, 2023 at 13:24

6

Not really an answer to your question, but...

I'm pretty sure that the MSDN documentation you've referenced is incorrect when it states that "using the volatile modifier on a field guarantees that all access to that field uses VolatileRead or VolatileWrite".

Directly reading or writing to a volatile field only generates a half-fence (an acquire-fence when reading and a release-fence when writing).

The VolatileRead and VolatileWrite methods use MemoryBarrier internally, which generates a full-fence.

Joe Duffy knows a thing or two about concurrent programming; this is what he has to say about volatile:

(As an aside, many people wonder about the difference between loads and stores of variables marked as volatile and calls to Thread.VolatileRead and Thread.VolatileWrite. The difference is that the former APIs are implemented stronger than the jitted code: they achieve acquire/release semantics by emitting full fences on the right side. The APIs are more expensive to call too, but at least allow you to decide on a callsite-by-callsite basis which individual loads and stores need the MM guarantees.)
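To make the distinction concrete, here is a rough sketch of the two styles. The helper methods only approximate the "plain access plus MemoryBarrier" shape described above; they are hypothetical names, not the framework's actual source. Note that the plain 64-bit load and store inside them are still not guaranteed to be atomic on 32-bit hardware.

```csharp
using System.Threading;

public class FenceSketch
{
    private volatile int _flag; // volatile field: acquire on read, release on write
    private long _big;          // 64-bit field: cannot be declared volatile

    // Volatile field access: the half-fence behaviour comes from the keyword.
    public void SetFlag(int value) => _flag = value;
    public int GetFlag() => _flag;

    // Hypothetical helper: plain store preceded by a full fence.
    public void SetBig(long value)
    {
        Thread.MemoryBarrier(); // full fence before the store
        _big = value;           // plain store; may tear on 32-bit hardware
    }

    // Hypothetical helper: plain load followed by a full fence.
    public long GetBig()
    {
        long value = _big;      // plain load; may tear on 32-bit hardware
        Thread.MemoryBarrier(); // full fence after the load
        return value;
    }
}
```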

Observer answered 18/1, 2011 at 17:56 Comment(11)

i heard another opinion: igoro.com/archive/volatile-keyword-in-c-memory-model-explained scroll down to table at section "Memory model and .NET operations" – Eminence 18/1, 2011 at 18:5

strange, "they actually generate full fences", albahari.com/threading/part4.aspx – Eminence 18/1, 2011 at 18:7

@Andrey: Both VolatileRead and VolatileWrite call MemoryBarrier internally. So I don't see how they can possibly have weaker semantics than MemoryBarrier alone, which is what that table suggests. – Observer 18/1, 2011 at 18:12

this might be mistake, i agree. – Eminence 18/1, 2011 at 18:13

My original post does certainly depend on the correctness of MSDN. I'll mark your answer as correct unless we get some other explanation. – Pericardium 19/1, 2011 at 19:16

@LukeH: Why would you assume that "Directly reading or writing to a volatile field only generates a half-fence (an acquire-fence when reading and a release-fence when writing)"? AFAIK, volatile does NOT generate even a half-fence on x86/x64. It only generates them on IA64, where it is required (IA64 does not have a strong memory model). What it does on x86/x64 is avoid moving reads/writes before or after the volatile read/write, hence allowing the x86/x64 memory model to guarantee in-order retirement of writes. – Kellsie 7/7, 2011 at 11:30

@Zach: But isn't that just because the x86 memory-model itself guarantees that reads have acquire semantics and writes have release semantics regardless? Perhaps I should have rephrased it as "Directly reading or writing to a volatile field only generates a half-fence (an acquire-fence when reading and a release-fence when writing) and only if doing so is required to enforce the CLR memory-model on the current hardware." Does that make any sense, or am I missing the point? – Observer 7/7, 2011 at 11:54

Yes it only generates them when it's required - otherwise you're implying that regardless of the need, a half fence is always required on volatile reads/writes (which could be done on x86/x64 too but it's not). – Kellsie 7/7, 2011 at 11:55

Actually, x86/x64 guarantees that stores are in order, so it's not exactly implicit acquire/release semantics. – Kellsie 7/7, 2011 at 11:57

@Zach: Yep, I think the linked Joe Duffy article explicitly calls out that the volatile aspect of a write is just a nop as far as the x86 jitter is concerned. (I can't confirm because his site is down at the moment.) – Observer 7/7, 2011 at 12:1

Don't know who Joe Duffy is but I worked on Northwood / Prescott CPUs when I was in Intel. I came across a Microsoft blog about guesstimating the in-order writes of x86 CPUs. I can say with 100% certainty that they're part of the design and is extensively tested. It would be a CPU recall otherwise (for x86/x64). Don't know and don't care about IA64 (don't think Intel cares much about that either). – Kellsie 7/7, 2011 at 12:5

18

He's implying that a processor-specific detail should not show up to a person using a framework that "abstracts" details like that away from the programmer.

If you are using low-lock techniques like volatile fields, explicit memory barriers, and the like, then you are entirely in the world of processor-specific details. You need to understand at a deep level precisely what the processor is and is not allowed to do as far as reordering, consistency, and so on, in order to write correct, portable, robust programs that use low-lock techniques.

The point of this feature is to say "I am abandoning the convenient abstractions guaranteed by single-threaded programming and embracing the performance gains possible by having a deep implementation-specific knowledge of my processor." You should expect fewer abstractions at your disposal when you start using low-lock techniques, not more.

You're going "down to the metal" for a reason, presumably; the price you pay is having to deal with the quirks of said metal.

Beelzebub answered 18/1, 2011 at 18:24 Comment(7)

could you please explain why volatile is not allowed for long and double? theoretically it is possible to do some tricks to make them volatile. – Eminence 18/1, 2011 at 18:29

@Andrey: Sure, the CLR permits us to make volatile, non-atomic access to fields of type double (I think... haven't tried it). The C# designers decided to not allow that; all volatile accesses are also atomic in C#. If you are the kind of person who needs volatile but non-atomic access to variables containing large structures then C# might not be the best language for you. If you want to use C# for that, you can always use VolatileRead and VolatileWrite; note that doing so in practice causes a full memory barrier, though that is not a guarantee of the APIs and it could change in the future. – Beelzebub 18/1, 2011 at 18:49

Thanks for your comment. You mentioned, "...having a deep implementation-specific knowledge of my processor." right after saying, "to write ... portable ... programs ...". So, are we interested in "my" processor, or portability (roughly all processors, or defensive programming)? I certainly agree that the more "to the metal" one gets (multi-threading being the topic), the more "implementation". I would think, however, that a keyword such as volatile could offer an implementation-agnostic solution, something like, "this keyword wraps VolatileRead and VolatileWrite". What do you think? – Pericardium 18/1, 2011 at 20:19

i agree that it is confusing to have volatile non-atomic read/write. but in java long is volatile and atomic. on 32 bit systems it is achieved by using XMM registers, which are 128 bits wide and allow atomic loading from memory. i think CLR should implement something like that. – Eminence 18/1, 2011 at 20:22

@Limited Atonement well i agree with Eric. volatile is pretty low level stuff. use Interlocked and locks. Low level stuff can't be portable. about VolatileRead, read LukeH's answer. They are pretty expensive and they have a different effect than the volatile keyword (they flush all caches). – Eminence 18/1, 2011 at 20:25

As mentioned by Andrey, this is supported in Java: "Writes and reads of volatile long and double values are always atomic. Writes to and reads of references are always atomic, regardless of whether they are implemented as 32 or 64 bit values." docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.7 Nothing magic about it. – Amalburga 21/12, 2022 at 14:6

12 yrs and 15 .net versions later - still cant get long to be volatile.. you can now buy a 32 bit intel only in antiques shop and yet.. – Sikora 2/12, 2023 at 15:2

14

Yes. The reason is that on a 32-bit machine you can't even read a double or long in one operation. I agree that it is a poor abstraction. I have a feeling the reason was that reading them atomically requires effort, and doing so implicitly would be too smart for the compiler. So they let you choose the best solution: locking, Interlocked, etc.

The interesting thing is that they can actually be read atomically on 32-bit machines using MMX registers. This is what the Java JIT compiler does. And they can be read atomically on a 64-bit machine. So I think it is a serious flaw in the design.
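As an illustration of the "choose your own solution" route, here is a minimal sketch of an atomically readable and writable double built on Interlocked. The AtomicDouble type is hypothetical; it stores the value's 64-bit pattern in a long so that the long overloads of Interlocked can be used.

```csharp
using System;
using System.Threading;

public class AtomicDouble
{
    // The double's IEEE 754 bit pattern, kept in a long so that
    // Interlocked.Read and Interlocked.Exchange apply to it.
    private long _bits;

    public double Value
    {
        // Interlocked.Read returns the 64-bit value as one atomic operation,
        // including on 32-bit platforms.
        get => BitConverter.Int64BitsToDouble(Interlocked.Read(ref _bits));

        // Interlocked.Exchange stores the new 64-bit value atomically.
        set => Interlocked.Exchange(ref _bits, BitConverter.DoubleToInt64Bits(value));
    }
}
```

A plain lock around an ordinary double field would serve the same purpose; the Interlocked version simply avoids taking a lock for each access.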

Eminence answered 18/1, 2011 at 17:35 Comment(5)

The point would be if they can be atomically read/written on any implementation of the CLI. – Newhouse 18/1, 2011 at 17:42

"I have a feeling that reason was that reading them atomically requires effort and it would be too smart for compiler." But it's not smarter than for the other types. Either way, just use System.Threading.Thread.VolatileRead for System.Double and System.Int32. It doesn't take more "smarts" or anything else to just do what you're doing to the other variables. – Pericardium 18/1, 2011 at 17:45

@Henk All implementations of CLI have System.Threading.Thread.VolatileRead right? Or maybe I don't get your meaning. – Pericardium 18/1, 2011 at 17:47

@Limited Atonement calling VolatileRead on a simple-looking statement is what i called too smart. You can argue on that but it is the decision of the compiler team. I wish Eric Lippert could shed some light. – Eminence 18/1, 2011 at 17:59

@Henk Holterman it is actually a rather weak point. Java allows volatile on long and double and it has many more implementations than CLI. – Eminence 18/1, 2011 at 18:0

4

It's a simple explanation of legacy. If you read this article - http://msdn.microsoft.com/en-au/magazine/cc163715.aspx, you'll find that the only implementation of the .NET Framework 1.x runtime was on x86 machines, so it makes sense for Microsoft to implement it against the x86 memory model. x64 and IA64 were added later. So the base memory model was always one of x86.

Could it have been implemented for x86? I'm actually not sure it can be fully implemented - a ref of a double returned from native code could be aligned to 4 bytes instead of 8. In which case, all your guarantees of atomic reads/writes no longer hold true.

Kellsie answered 7/7, 2011 at 11:14 Comment(0)

4

Starting from .NET Framework 4.5, it is possible to perform a volatile read or write on long or double variables by using the Volatile.Read and Volatile.Write methods. These methods perform atomic reads and writes on long/double variables, as is evident from their implementation:

private struct VolatileIntPtr { public volatile IntPtr Value; }

[Intrinsic]
[NonVersionable]
public static long Read(ref long location) =>
#if TARGET_64BIT
    (long)Unsafe.As<long, VolatileIntPtr>(ref location).Value;
#else
    // On 32-bit machines, we use Interlocked,
    // since an ordinary volatile read would not be atomic.
    Interlocked.CompareExchange(ref location, 0, 0);
#endif

Their atomicity is documented too:

The Volatile class also provides read and write operations for some 64-bit types such as Int64 and Double. Volatile reads and writes on such 64-bit memory are atomic even on 32-bit processors, unlike regular reads and writes.

Using these two methods is not as convenient as the volatile keyword, though. You have to remember to wrap every read of the field in Volatile.Read and every write in Volatile.Write; the compiler will not flag an access that bypasses them.
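One way to reduce that risk, sketched here under the assumption that the field can be kept private, is to route all access through a property. The ProgressState class and its members are hypothetical names used only for this example.

```csharp
using System.Threading;

public class ProgressState
{
    // Plain 64-bit fields: these cannot carry the volatile modifier.
    private long _bytesDone;
    private double _fraction;

    // Callers only ever see the properties, so every access
    // goes through Volatile.Read or Volatile.Write.
    public long BytesDone
    {
        get => Volatile.Read(ref _bytesDone);
        set => Volatile.Write(ref _bytesDone, value);
    }

    public double Fraction
    {
        get => Volatile.Read(ref _fraction);
        set => Volatile.Write(ref _fraction, value);
    }
}
```

Code inside the class can still touch the fields directly, so this narrows the problem rather than eliminating it.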

Epiblast answered 3/4, 2022 at 10:0 Comment(0)
