Atomic load in C with MSVC

TL;DR: I need the Microsoft C (not C++) equivalent of C11's atomic_load. Anyone know what the right function is?

I have some pretty standard code which uses atomics. Something like

do {
  bar = atomic_load(&foo);
  baz = some_stuff(bar);
} while (!atomic_compare_exchange_weak(&foo, &bar, baz));

I'm trying to figure out how to handle it with MSVC. The CAS is easy enough (InterlockedCompareExchange), but atomic_load is proving more troublesome.

Maybe I'm missing something, but the Synchronization Functions list on MSDN doesn't seem to have anything for a simple load. The only thing I can think of would be something like InterlockedOr(object, 0), which would generate a store for every load (not to mention a fence)…

As long as the variable is volatile I think it would be safe to just read the value, but if I do that Visual Studio's code analysis feature emits a bunch of C28112 warnings ("A variable (foo) which is accessed via an Interlocked function must always be accessed via an Interlocked function.").

If a simple read is really the right way to go I think I could silence those with something like

#define atomic_load(object) \
  __pragma(warning(push)) \
  __pragma(warning(disable:28112)) \
  (*(object)) \
  __pragma(warning(pop))

But the analyzer's insistence that I should always be using the Interlocked* functions leads me to believe there must be a better way. If that's the case, what is it?

I think ignoring the analyzer is acceptable here, given that the documentation says simple reads of properly aligned, register-width variables are atomic (32-bit on 32-bit systems, 64-bit on 64-bit systems). The warning documentation itself basically admits it is being overly cautious and may fire even when the access is safe.

That said, if you want to shut it up, you can always use an idempotent Interlocked operation to get the desired behavior. For example, you could just define:

#define atomic_load(object) InterlockedOr((object), 0)

Since a bitwise OR with 0 never changes the value, and InterlockedOr always returns the original value, the end result is to read the original value while atomically writing back the same value.

If you were simulating atomic_load_explicit with memory_order_relaxed you might get better performance by using InterlockedOrNoFence to avoid memory barriers, but for simulating the default (sequentially consistent) atomic_load you'd want to stick with InterlockedOr.
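As a rough sketch of the two variants just described, the wrappers below pair a fenced load with a relaxed one. The names my_atomic_long, my_atomic_load and my_atomic_load_relaxed are made up for illustration, and the block assumes InterlockedOrNoFence is available in the Windows SDK you build against. The non-MSVC branch uses the GCC/Clang __atomic builtins and is only there so the same code can be compiled and tried elsewhere.

```c
#if defined(_MSC_VER)
#include <windows.h>

/* Hypothetical type name; Interlocked* on LONG needs a 32-bit aligned variable. */
typedef volatile LONG my_atomic_long;

/* Emulates the default (sequentially consistent) atomic_load:
   OR with 0 leaves the value unchanged and returns the old value. */
static LONG my_atomic_load(my_atomic_long *object)
{
    return InterlockedOr(object, 0);
}

/* Emulates atomic_load_explicit(..., memory_order_relaxed).
   Assumes InterlockedOrNoFence exists in the targeted SDK. */
static LONG my_atomic_load_relaxed(my_atomic_long *object)
{
    return InterlockedOrNoFence(object, 0);
}

#else /* Fallback for trying the idea with GCC or Clang. */

typedef volatile long my_atomic_long;

static long my_atomic_load(my_atomic_long *object)
{
    return __atomic_fetch_or(object, 0L, __ATOMIC_SEQ_CST);
}

static long my_atomic_load_relaxed(my_atomic_long *object)
{
    return __atomic_fetch_or(object, 0L, __ATOMIC_RELAXED);
}

#endif
```

Using functions rather than macros keeps the argument from being evaluated more than once, at the cost of fixing the operand type to LONG.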

InterlockedOr was chosen mostly arbitrarily (on the theory that it might be slightly faster in hardware than an operation with carry, like addition or subtraction), but InterlockedXor with 0 should behave the same way, as would several other operations, as long as they are done with their identity value.

You could also use InterlockedCompareExchange in a similar manner; testing would be needed to determine which was faster:

#define atomic_load(object) InterlockedCompareExchange((object), 0, 0)

Here, if the value is already 0 it gets replaced with 0 (a no-op), and otherwise nothing is written at all. Either way, all you're really using it for is the return value, which is the original value from before the no-op exchange.
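For what it's worth, here is a minimal sketch of how the loop from the question might look with InterlockedCompareExchange doing both jobs. The helpers load_word, cas_word and update_foo, and the body of some_stuff, are placeholders I made up. The non-MSVC branch uses the GCC/Clang __atomic builtins purely so the sketch can be compiled outside Visual Studio.

```c
#if defined(_MSC_VER)
#include <windows.h>
typedef LONG word_t;

/* Returns the value that was in *p before the call; stores desired
   only if that value equalled expected. */
static word_t cas_word(volatile word_t *p, word_t desired, word_t expected)
{
    return InterlockedCompareExchange(p, desired, expected);
}
#else
typedef long word_t;

static word_t cas_word(volatile word_t *p, word_t desired, word_t expected)
{
    /* On failure the builtin overwrites expected with the current value;
       on success expected already equals it. Either way it holds the
       prior value, matching InterlockedCompareExchange's return. */
    __atomic_compare_exchange_n(p, &expected, desired, 0,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;
}
#endif

/* The no-op exchange from above: swap 0 for 0 and keep the old value. */
static word_t load_word(volatile word_t *p)
{
    return cas_word(p, 0, 0);
}

/* Placeholder for the real computation. */
static word_t some_stuff(word_t x)
{
    return x + 1;
}

/* Retry until the CAS succeeds; returns the value that was stored. */
static word_t update_foo(volatile word_t *foo)
{
    word_t bar = load_word(foo);
    for (;;) {
        word_t baz = some_stuff(bar);
        word_t seen = cas_word(foo, baz, bar);
        if (seen == bar)
            return baz;   /* nobody changed foo in between */
        bar = seen;       /* reuse the value the failed CAS observed */
    }
}
```

Because a failed compare-exchange already reports the current value, the explicit load is only needed once, before the loop starts.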
