Atomic access to shared memory

Asked 6/1, 2012 at 14:35 Answered 27/6, 2019 at 8:45

c++ linux shared-memory memory-barriers stdatomic

I have shared memory between multiple processes that interpret the memory in a certain way. Example:

struct DataBlock {
    int counter;
    double value1;
    double ...
};

What I want is for the counter to be updated/incremented atomically, and for a memory release to happen on that address. If I weren't using shared memory, it would be something like:

std::atomic<int> counter;
// release operation on the affected memory location, making the write visible to other threads
std::atomic_store_explicit(&counter, newvalue, std::memory_order_release);

How do I achieve this for an arbitrary memory location (interpreted as the DataBlock counter above)? I can guarantee the address is aligned as required by the architecture (x86 Linux).

Make the update atomic: how? (i.e. atomicupdate(addr, newvalue))
Memory syncing for multicore (i.e. memorysync(addr)): the only way I can see is using std::atomic_thread_fence(std::memory_order_release), but this will "establish memory synchronization ordering of ALL atomic and relaxed atomic stores". That's overkill for me; I just want the counter location to be synchronized. Appreciate any thoughts.

Crowell asked 6/1, 2012 at 14:35 Comment(6)

I'm just speculating, but I'm under the impression that the C++ programming model has no notion of "processes" and the memory model has no notion of "shared memory", so I doubt that the standard itself will make any guarantees. Shared memory is very much a platform-dependent feature, so consult your platform's documentation. – Individuation 6/1, 2012 at 14:59

Can you put an atomic<int> in your DataBlock? That should work as long as atomic<int> is lock-free (the standard explicitly mentions memory shared between processes as a use case for those). And no, you can't just get an atomic for a random address (see #8749538). @Kerrek SB: actually that scenario is mentioned in [atomics.lockfree] in the final draft. – Brinton 6/1, 2012 at 15:22

@Grizzly: You mean non-normative note 29.4/3? Very interesting, I didn't know that. – Individuation 6/1, 2012 at 15:26

Why would the cache coherence be any different if the memory is shared? What I need is a way to get the memory synced across cores for a particular address. If C++ doesn't support it, does anyone know what asm instructions I can use? I read that on x86 the update would be atomic anyway, so I guess that's resolved. – Crowell 6/1, 2012 at 15:45

@Crowell: That's both OS- and architecture-specific. – Botfly 6/1, 2012 at 16:9

std::atomic works within one process between threads, not between processes. One process doesn't care about std::atomic usage in another process. – Spiffing 22/4, 2016 at 12:42

I can't answer with authority here, but I can give related information that might help.

1. Mutexes can be created in shared memory and/or created to be cross-process. Pthreads has a special creation flag; I can't remember whether that uses shared memory or whether you then share a handle. The Linux "futex" can use shared memory directly (note that the user address may differ, but the underlying real address should be the same).
2. Hardware atomics work on memory, not on process variables. That is, your chip won't care which programs are modifying the variables, so the lowest-level atomics will naturally be cross-process. The same applies to fences.
3. C++11 fails to specify cross-process atomics. However, if they are lock-free (check the flag), it is hard to see how a compiler could implement them such that cross-process use wouldn't work. But you'd be placing a lot of faith in your toolchain and final platform.
4. CPU dependency guarantees also track real memory addresses, so as long as your program would be correct in a threaded form, it should also be correct in its multi-process form (with respect to visibility).
5. Kerrek is correct: the abstract machine doesn't really mention multiple processes. However, its synchronization details are written in such a way that they apply equally to inter-process and multi-threaded use. This relates to #3: it'd be hard for a compiler to screw this up.

Short answer: there is no standards-compliant way to do this. However, leaning on the way the standard defines multi-threading, there are a lot of assumptions you can make for a quality compiler.

The biggest question is whether an atomic can simply be allocated in shared memory (placement new) and work. Obviously this would only work if it is a true hardware atomic. My guess, however, is that with a quality compiler/library the C++ atomics should work fine in shared memory.
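To make the placement-new idea concrete, here is a minimal sketch, assuming Linux, an anonymous MAP_SHARED mapping inherited across fork(), and a DataBlock whose counter is declared as std::atomic<int>. The helper name count_across_processes is made up for illustration, and the standard does not promise any of this; it just leans on lock-free atomics being plain hardware operations.

```cpp
#include <atomic>
#include <new>
#include <vector>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Layout from the question, with the counter declared as an atomic (assumption).
struct DataBlock {
    std::atomic<int> counter;
    double value1;
};

// A lock-based atomic would keep its lock in per-process state, so refuse to build.
static_assert(ATOMIC_INT_LOCK_FREE == 2, "atomic<int> must always be lock-free");

// Hypothetical helper: fork `nprocs` children that each increment the shared
// counter `iters` times. Returns the final count, or -1 on failure.
int count_across_processes(int nprocs, int iters) {
    void* mem = mmap(nullptr, sizeof(DataBlock), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;

    // Construct the block inside the shared pages before any child exists.
    DataBlock* block = new (mem) DataBlock();
    block->counter.store(0, std::memory_order_relaxed);

    std::vector<pid_t> children;
    bool ok = true;
    for (int i = 0; i < nprocs; ++i) {
        pid_t pid = fork();
        if (pid < 0) { ok = false; break; }
        if (pid == 0) {
            // Child: atomic read-modify-write on the shared counter.
            for (int j = 0; j < iters; ++j)
                block->counter.fetch_add(1, std::memory_order_release);
            _exit(0);
        }
        children.push_back(pid);
    }
    for (pid_t pid : children) waitpid(pid, nullptr, 0);

    int total = block->counter.load(std::memory_order_acquire);
    munmap(mem, sizeof(DataBlock));
    return ok ? total : -1;
}
```

If the increments were not atomic across processes, the total would usually come out short of nprocs * iters, so this doubles as a crude way of "verifying behaviour" on a given toolchain.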

Have fun verifying behaviour. :)

Facture answered 6/1, 2012 at 17:30 Comment(2)

ISO C++ does specify that lock-free atomics should be address-free. (See another answer on this question for the quote). This is the case for normal CPUs where atomicity is based on physical address, not virtual. – Proclivity 18/12, 2019 at 17:42

std::atomic will definitely not work if it's not lock-free. Current implementations use a hash table of locks, so different processes would use different hash tables (see "Where is the lock for a std::atomic?"). So use static_assert(std::atomic<T>::is_always_lock_free, "atomic<T> not lock free, can't work in shared mem") if you have C++17 to check this cheaply at compile time. – Proclivity 18/12, 2019 at 17:44

Since you're on Linux, you can use the gcc atomic built-in __sync_fetch_and_add() on the address of counter. According to the gcc documentation on atomic built-ins, this implements a full memory fence rather than just a release operation. But since you actually want a read-modify-write operation rather than simply a load (incrementing a counter means reading, then modifying, and finally writing back the value), the full memory fence is going to be a better choice to enforce the correct memory ordering for this operation.
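A rough sketch of what that looks like on a plain int at some address, assuming GCC or Clang. The wrapper names are made up for illustration. The __atomic variants are not mentioned in the answer above; they are the newer GCC built-ins that take an explicit memory order, shown here in case a release-only operation is really what is wanted.

```cpp
// Full-barrier increment via the legacy __sync built-in.
// Returns the value the counter held before the addition.
inline int fetch_add_full_barrier(int* addr, int delta) {
    return __sync_fetch_and_add(addr, delta);
}

// Same read-modify-write with the newer __atomic built-in and release ordering.
inline int fetch_add_release(int* addr, int delta) {
    return __atomic_fetch_add(addr, delta, __ATOMIC_RELEASE);
}

// Plain release store of a new value, the closest match to the
// store with std::memory_order_release shown in the question.
inline void store_release(int* addr, int value) {
    __atomic_store_n(addr, value, __ATOMIC_RELEASE);
}

// Matching acquire load for readers in other threads or processes.
inline int load_acquire(const int* addr) {
    return __atomic_load_n(addr, __ATOMIC_ACQUIRE);
}
```

With a DataBlock* block pointing into the shared mapping, the call would be something like fetch_add_full_barrier(&block->counter, 1). The address only needs to be a naturally aligned int; nothing has to be declared std::atomic.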

Somato answered 6/1, 2012 at 19:1 Comment(2)

__sync_fetch_and_add() can be used, but I think __sync_sub_and_fetch() is more suitable. You decrement the reference count, and release if it went down to zero. It's guaranteed that if two threads decrement simultaneously from 2, only one will have 0 returned (and will release). – Omeromero 6/1, 2012 at 20:30

That's definitely a good suggestion as well ... one atomic built-in would be used for incrementing during copies, etc., and another for decreasing the value when the reference is no longer being used. – Somato 6/1, 2012 at 21:15
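The reference-counting pattern from these two comments could be sketched like this, assuming GCC or Clang and a plain int refcount living wherever the object lives. The function names acquire_ref and release_ref are hypothetical.

```cpp
// Take a reference: atomic increment with a full barrier.
inline void acquire_ref(int* refcount) {
    __sync_fetch_and_add(refcount, 1);
}

// Drop a reference. __sync_sub_and_fetch returns the NEW value, so exactly
// one caller observes 0 and becomes responsible for releasing the object.
inline bool release_ref(int* refcount) {
    return __sync_sub_and_fetch(refcount, 1) == 0;
}
```

The caller would do something like if (release_ref(&obj->refcount)) destroy(obj); where destroy stands in for whatever cleanup applies.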

I am looking at the standard draft N4820 [atomics.lockfree], and it says:

4 [Note: Operations that are lock-free should also be address-free. That is, atomic operations on the same memory location via two different addresses will communicate atomically. The implementation should not depend on any per-process state. This restriction enables communication by memory that is mapped into a process more than once and by memory that is shared between two processes. — end note]

So if you are targeting address-free operation, the prerequisite is lock-free, and this can be checked with std::atomic<T>::is_lock_free() (or is_always_lock_free in C++17).
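For what it's worth, the lock-free check might look like the sketch below. The helper name usable_in_shared_memory is made up; the ATOMIC_INT_LOCK_FREE macro and is_lock_free() are standard since C++11, and is_always_lock_free needs C++17, so it is guarded.

```cpp
#include <atomic>

// Compile-time check available since C++11: 2 means "always lock-free".
static_assert(ATOMIC_INT_LOCK_FREE == 2, "atomic<int> is not always lock-free");

#if __cplusplus >= 201703L
// C++17 spelling of the same check, usable for any T.
static_assert(std::atomic<int>::is_always_lock_free,
              "atomic<int> not lock free, can't work in shared mem");
#endif

// Hypothetical runtime helper: true if std::atomic<T> is lock-free on this
// platform, which is the prerequisite for being address-free.
template <typename T>
bool usable_in_shared_memory() {
    std::atomic<T> probe{};
    return probe.is_lock_free();
}
```

The compile-time form is the safer of the two, since a build that would silently fall back to per-process locks never gets produced.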

However, I am not sure how the atomic object should be created. Is it good enough to place the object in shared memory? I have not found any specification of this usage, although I do see such code on GitHub.

(Editor's note: you cast a pointer to shared memory, e.g.
auto p = static_cast<std::atomic<int>*>(ptr);
Same as accessing shared memory as any other type.)
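As a sketch of the "mapped into a process more than once" case from the note in the standard, the snippet below maps one file twice and casts each mapping to std::atomic<int>*. It assumes Linux, a writable /tmp, and a lock-free atomic<int>; the function name is hypothetical. Whether the cast formally starts the object's lifetime is not spelled out by the standard, so treat this as something that works on mainstream implementations and not as guaranteed behaviour.

```cpp
#include <atomic>
#include <cstdlib>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical demo: store through one mapping, load through another mapping
// of the same file. Returns the value seen via the second mapping, or -1.
int store_via_one_mapping_load_via_other(int value) {
    const size_t len = sizeof(std::atomic<int>);

    char path[] = "/tmp/atomic-shm-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);  // the open descriptor and the mappings keep the file alive
    if (ftruncate(fd, len) != 0) { close(fd); return -1; }

    // Two independent mappings of the same page, at different virtual addresses.
    void* a = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void* b = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED) munmap(a, len);
        if (b != MAP_FAILED) munmap(b, len);
        return -1;
    }

    // Cast the raw mappings, as in the editor's note above.
    auto* writer = static_cast<std::atomic<int>*>(a);
    auto* reader = static_cast<std::atomic<int>*>(b);

    writer->store(value, std::memory_order_release);
    int seen = reader->load(std::memory_order_acquire);

    bool distinct = (a != b);
    munmap(a, len);
    munmap(b, len);
    return distinct ? seen : -1;
}
```

The same cast applies when the two mappings live in different processes, for example after shm_open() plus mmap() in each of them.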

Songer answered 27/6, 2019 at 8:45 Comment(0)
