As is well known, `std::atomic` and `volatile` are different things.
There are 2 main differences:

1. Two optimizations are allowed for `std::atomic<int> a;` but not for `volatile int a;`:
   - fused operations: `a = 1; a = 2;` can be replaced by the compiler with `a = 2;`
   - constant propagation: `a = 1; local = a;` can be replaced by the compiler with `a = 1; local = 1;`
2. Reordering of ordinary reads/writes across atomic/volatile operations:
   - for `volatile int a;` volatile reads/writes can't be reordered relative to each other, but nearby ordinary reads/writes can still be reordered around volatile reads/writes.
   - for `std::atomic<int> a;` reordering of nearby ordinary reads/writes is restricted according to the memory order used for the atomic operation, e.g. `a.load(std::memory_order_...);`
I.e. `volatile` doesn't introduce memory fences, but `std::atomic` can.
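The two optimizations listed above can be illustrated with a minimal sketch. The names `atomic_a`, `volatile_a`, `store_twice_atomic` and `store_twice_volatile` are hypothetical, and whether a particular compiler actually performs the merge on the atomic is up to that compiler; the point is only what it is permitted to do.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> atomic_a{0};
volatile int volatile_a = 0;

// For the atomic, the compiler is permitted to fuse the two stores into a
// single `atomic_a = 2` and to propagate the constant 2 into `local`.
int store_twice_atomic() {
    atomic_a = 1;
    atomic_a = 2;
    int local = atomic_a;
    return local;
}

// For the volatile, every access is an observable side effect: both stores
// and the final read must be emitted, in this order.
int store_twice_volatile() {
    volatile_a = 1;
    volatile_a = 2;
    int local = volatile_a;
    return local;
}
```

Comparing the generated assembly of the two functions at a high optimization level shows whether a given compiler makes use of this freedom.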
As is well described in the article:
- Herb Sutter, January 08, 2009 - part 1: http://www.drdobbs.com/parallel/volatile-vs-volatile/212701484
- Herb Sutter, January 08, 2009 - part 2: http://www.drdobbs.com/parallel/volatile-vs-volatile/212701484?pgno=2
For example, `std::atomic` should be used for concurrent multi-threaded programs (CPU-Core <-> CPU-Core), but `volatile` should be used for access to memory-mapped regions on devices (CPU-Core <-> Device).
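For the CPU-Core <-> CPU-Core case, a minimal sketch of release/acquire message passing between two threads might look like this. The names `payload`, `ready` and `produce_and_consume` are hypothetical and chosen only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // ordinary (non-atomic) data
std::atomic<bool> ready{false};  // flag shared between the two threads

int produce_and_consume() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);

    std::thread producer([] {
        payload = 42;                                  // ordinary write
        ready.store(true, std::memory_order_release);  // publish the write
    });

    // The acquire load pairs with the release store above, so once the flag
    // is seen as true, the ordinary write to payload is visible as well.
    while (!ready.load(std::memory_order_acquire)) { }
    int seen = payload;

    producer.join();
    return seen;
}
```

With a plain `volatile bool` flag instead of the atomic, the program would contain a data race on `payload` and the standard would give no such visibility guarantee.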
But sometimes a variable needs both: it has unusual semantics and also needs any or all of the atomicity and/or ordering guarantees needed for lock-free coding. In that case `volatile std::atomic<>` is required, for several reasons:
- ordering: to prevent reordering of ordinary reads/writes, for example for reads from CPU RAM to which the data has been written by the device's DMA controller

For example:

    char cpu_ram_data_written_by_device[1024];
    device_dma_will_write_here( cpu_ram_data_written_by_device );
    // physically mapped to device register
    volatile bool *device_ready = get_pointer_device_ready_flag();
    //... somewhere much later
    while(!*device_ready); // spin-lock on the flag's value (here should be memory fence!!!)
    for(auto &i : cpu_ram_data_written_by_device) std::cout << i;
- spilling: the CPU writes to CPU RAM and then the device's DMA controller reads from this memory: https://en.wikipedia.org/wiki/Register_allocation#Spilling

For example:

    char cpu_ram_data_will_read_by_device[1024];
    device_dma_will_read_it( cpu_ram_data_will_read_by_device );
    // physically mapped to device register
    volatile bool *data_ready = get_pointer_data_ready_flag();
    //... somewhere much later
    for(auto &i : cpu_ram_data_will_read_by_device) i = 10;
    *data_ready = true; // spilling cpu_ram_data_will_read_by_device to RAM, should be memory fence
- atomic: to guarantee that the volatile operation is atomic, i.e. that it consists of a single operation instead of several, e.g. one 8-byte operation instead of two 4-byte operations
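The ordering and spilling cases can be sketched with explicit fences around the volatile flag accesses. Everything here is a stand-in: `dma_buffer`, `fake_device_ready`, `fake_data_ready` and `simulate_device_write` are hypothetical names that replace a real DMA buffer and real memory-mapped registers. Note also that the standard defines `std::atomic_thread_fence` only in terms of atomic operations between threads; what it does relative to a volatile access and a DMA device is platform-specific, and real drivers typically rely on barriers documented by the platform.

```cpp
#include <atomic>
#include <cassert>

// Stand-ins for real hardware (assumption for illustration only).
char dma_buffer[1024];
volatile bool fake_device_ready = false;
volatile bool fake_data_ready = false;

// Hypothetical helper playing the role of the device finishing a DMA write.
void simulate_device_write(char value) {
    for (char &c : dma_buffer) c = value;
    fake_device_ready = true;
}

// Ordering: wait for the flag, then fence, then read the buffer, so that the
// ordinary reads are not moved before the volatile read of the flag.
long read_after_device(volatile bool *device_ready) {
    while (!*device_ready) { }  // dereference: spin on the flag's value
    std::atomic_thread_fence(std::memory_order_acquire);
    long sum = 0;
    for (char c : dma_buffer) sum += c;
    return sum;
}

// Spilling: fill the buffer, then fence, then set the flag, so that the
// ordinary writes are not moved after the volatile write of the flag.
void publish_to_device(volatile bool *data_ready) {
    for (char &c : dma_buffer) c = 10;
    std::atomic_thread_fence(std::memory_order_release);
    *data_ready = true;
}
```

A call sequence such as `simulate_device_write(1); read_after_device(&fake_device_ready);` exercises the consumer side without any real device.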
On this point, Herb Sutter wrote about `volatile atomic<T>` (January 08, 2009): http://www.drdobbs.com/parallel/volatile-vs-volatile/212701484?pgno=2
Finally, to express a variable that both has unusual semantics and has any or all of the atomicity and/or ordering guarantees needed for lock-free coding, only the ISO C++0x draft Standard provides a direct way to spell it: volatile atomic.
But do the modern standards, C++11 (not the C++0x draft), C++14, and C++17, guarantee that `volatile atomic<T>` has both semantics (volatile + atomic)?

Does `volatile atomic<T>` provide the most stringent guarantees of both volatile and atomic?
- As in `volatile`: avoids fused operations and constant propagation, as described at the beginning of the question
- As in `std::atomic`: introduces memory fences to provide ordering, spilling, and atomicity.
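As a starting point for experiments, here is a minimal sketch showing that the combination at least compiles: `std::atomic` declares volatile-qualified overloads of its member functions, so loads and stores with an explicit memory order work on a `volatile std::atomic<T>` object. The names `shared_word`, `publish` and `observe` are hypothetical. The sketch does not by itself demonstrate that a compiler refrains from fusing operations on such an object; that is exactly what the question asks.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical shared 8-byte word, declared both volatile and atomic.
volatile std::atomic<std::uint64_t> shared_word{0};

// C++17: true only if every object of this type is lock-free on this target,
// i.e. an 8-byte access is never split behind a lock.
constexpr bool word_is_always_lock_free =
    std::atomic<std::uint64_t>::is_always_lock_free;

void publish(std::uint64_t value) {
    // Selects the volatile-qualified overload of store().
    shared_word.store(value, std::memory_order_release);
}

std::uint64_t observe() {
    // Selects the volatile-qualified overload of load().
    return shared_word.load(std::memory_order_acquire);
}
```

Adding `static_assert(word_is_always_lock_free, "...")` turns the atomicity requirement into a compile-time check on targets where it matters.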
And can we `reinterpret_cast` from `volatile int *ptr;` to `volatile std::atomic<int>*`?
`volatile atomic<T>` over `atomic<volatile T>` and why would you want to do the `reinterpret_cast`? It will probably work, but not guaranteed. – Ceroplastics

`std::atomic<volatile T>` because a volatile type is not trivially copyable. – Motherhood

`std::atomic<volatile T>`. – Allanadale

`volatile int *ptr;` and I want to use code `while(ptr->load(std::memory_order_acquire) == 0);` instead of `while(*ptr == 0); std::atomic_thread_fence(std::memory_order_acquire);` – Allanadale

`atomic` doesn't guarantee that. It could very well take a lock and then do two 4-byte writes. `ATOMIC_LONG_LOCK_FREE` could be `0` to say "never lock-free". – Despoil

`ATOMIC_LONG_LOCK_FREE == 2` i.e. always lock-free for x86_64 (gcc, clang, icc), ARM64, PowerPC, MIPS, MIPS64. But `ATOMIC_LONG_LOCK_FREE == 1` for ARM, MSP430, i.e. sometimes lock-free, e.g. lock-free if only aligned memory accesses are naturally atomic on a given architecture. The value of this constant for different architectures and compilers can be seen on the assembly code: godbolt.org/g/fScV2J – Allanadale

`std::atomic` for concurrency, but you also need `volatile` so that it remains accessible to the debugger if that global isn't actually used elsewhere. – Bohemian