In CUDA programming guide it is stated that atomic operations on mapped pinned host memory "are not atomic from the point of view of the host or other devices." What I get from this sentence is that if the host memory region is accessed only by one GPU, it is fine to do atomic on the mapped pinned host memory (even from within multiple simultaneous kernels).
On the other hand, in the book the CUDA Handbook by Nicholas Wilt at page 128 it is stated that:
Do not try to use atomics on mapped pinned host memory, either for the host (locked compare-exchange) or the device (
atomicAdd()
). On the CPU side, the facilities to enforce mutual exclusion for locked operations are not visible to peripherals on the PCI express bus. Conversely, on the GPU side, atomic operations only work on local device memory locations because they are implemented using the GPU's local memory controller.
Is is safe to do atomic from inside a CUDA kernel on mapped pinned host memory? Can we rely on PCI-e bus to keep the atomicity of atomics' read-modify-write?