I have a piece of CUDA code in which threads perform atomic operations on shared memory. Since the result of an atomic operation is visible to the other threads of the block immediately anyway, I thought it might be good to instruct the compiler to treat the shared memory as volatile.
So I changed
__global__ void CoalescedAtomicOnSharedMem(int* data, uint nElem)
{
    __shared__ int smem_data[BLOCK_SIZE];
    uint tid = (blockIdx.x * blockDim.x) + threadIdx.x;
    for (uint i = tid; i < nElem; i += blockDim.x * gridDim.x) {
        atomicAdd(smem_data + threadIdx.x, 6);
    }
}
to
__global__ void volShared_CoalescedAtomicOnSharedMem(int* data, uint nElem)
{
    volatile __shared__ int smem_data[BLOCK_SIZE];
    uint tid = (blockIdx.x * blockDim.x) + threadIdx.x;
    for (uint i = tid; i < nElem; i += blockDim.x * gridDim.x) {
        atomicAdd(smem_data + threadIdx.x, 6);
    }
}
With this change, the following compile-time error occurs:
error: no instance of overloaded function "atomicAdd" matches the argument list
argument types are: (volatile int *, int)
Why isn't a volatile address supported as an argument for atomic operations? Is it because the compiler already treats the shared memory as volatile as soon as it identifies that there are going to be atomic operations on it?
Comments:

Why do you want the shared memory to be volatile? – Boaten

I cast away volatile from the shared memory, like atomicAdd( (int*)(smem_data+threadIdx.x), 6);, and I observe no difference in performance compared to the version not having the volatile keyword for shared memory. – Juanajuanita

atomicAdd should have volatile overloads. It's simply an oversight. IIRC I submitted a bug report a while back to correct this. You can coerce it to work with a const_cast. – Hygro
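For reference, the const_cast workaround mentioned in the comments could look like the sketch below. This is only an illustration of the cast (const_cast can strip the volatile qualifier as well as const), not a verified benchmark; BLOCK_SIZE is assumed to be defined elsewhere, as in the original kernels.

__global__ void volShared_CoalescedAtomicOnSharedMem(int* data, uint nElem)
{
    volatile __shared__ int smem_data[BLOCK_SIZE];
    uint tid = (blockIdx.x * blockDim.x) + threadIdx.x;
    for (uint i = tid; i < nElem; i += blockDim.x * gridDim.x) {
        // const_cast<int*> removes the volatile qualifier so the
        // non-volatile atomicAdd(int*, int) overload matches.
        atomicAdd(const_cast<int*>(smem_data + threadIdx.x), 6);
    }
}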