I have a piece of CUDA code in which threads perform atomic operations on shared memory. Since the result of an atomic operation is visible to the other threads of the block immediately anyway, I thought it might be good to instruct the compiler to treat the shared memory as volatile.
So I changed
__global__ void CoalescedAtomicOnSharedMem(int* data, uint nElem)
{
    __shared__ int smem_data[BLOCK_SIZE];
    uint tid = (blockIdx.x * blockDim.x) + threadIdx.x;
    for (uint i = tid; i < nElem; i += blockDim.x * gridDim.x) {
        atomicAdd(smem_data + threadIdx.x, 6);
    }
}
to
__global__ void volShared_CoalescedAtomicOnSharedMem(int* data, uint nElem)
{
    volatile __shared__ int smem_data[BLOCK_SIZE];
    uint tid = (blockIdx.x * blockDim.x) + threadIdx.x;
    for (uint i = tid; i < nElem; i += blockDim.x * gridDim.x) {
        atomicAdd(smem_data + threadIdx.x, 6);
    }
}
With this change, the following compile-time error occurs:
error: no instance of overloaded function "atomicAdd" matches the argument list
argument types are: (volatile int *, int)
Why isn't a volatile address supported as an argument for atomic operations? Is it because the compiler already treats the shared memory as volatile as soon as it identifies that there are going to be atomic operations on it?
Comments:

Why do you want the shared memory to be volatile? – Boaten

I cast away volatile from the shared memory, like atomicAdd( (int*)(smem_data+threadIdx.x), 6);, and I observe no difference in performance compared to the version not having the volatile keyword for shared memory. – Juanajuanita

atomicAdd should have volatile overloads. It's simply an oversight. IIRC I submitted a bug report a while back to correct this. You can coerce it to work with a const_cast. – Hygro
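For reference, the const_cast workaround mentioned in the comments could look like the sketch below. This is only an illustration of the cast (const_cast can strip the volatile qualifier as well as const), not a verified benchmark; BLOCK_SIZE is assumed to be defined elsewhere, as in the original kernels.

__global__ void volShared_CoalescedAtomicOnSharedMem(int* data, uint nElem)
{
    volatile __shared__ int smem_data[BLOCK_SIZE];
    uint tid = (blockIdx.x * blockDim.x) + threadIdx.x;
    for (uint i = tid; i < nElem; i += blockDim.x * gridDim.x) {
        // const_cast<int*> removes the volatile qualifier so the
        // non-volatile atomicAdd(int*, int) overload matches.
        atomicAdd(const_cast<int*>(smem_data + threadIdx.x), 6);
    }
}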