The first implementation is correct, while the second one is most likely incorrect. "Most likely" because, theoretically, there could be some synchronization code that is not shown here. But even then it would be bad design, since it relies on this external synchronization instead of encapsulating it - which is highly unusual and confusing.
In general acquire/release always come in pairs. An acquire-fence by itself does not achieve anything - it requires a release-operation/release-fence to synchronize-with in order to establish a happens-before relation.
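To see the pairing in action, here is a minimal hand-over sketch (the names payload, ready and run_handover are invented for illustration): the acquire-load only establishes a happens-before relation because it reads the value written by the release-store.

```cpp
#include <atomic>
#include <thread>

// Hypothetical minimal example: an acquire-load is only useful because it
// pairs with the release-store from which it takes its value.
inline int run_handover() {
    int payload = 0;                 // non-atomic data to hand over
    std::atomic<bool> ready{false};

    std::thread producer([&] {
        payload = 42;                                  // A: plain write
        ready.store(true, std::memory_order_release);  // B: release-store
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) // C: acquire-load
            ;  // spin until C reads the value written by B
        // C synchronizes-with B, therefore A happens-before any later
        // read of payload - no data race.
    });
    producer.join();
    consumer.join();
    return payload;
}
```

Without the release/acquire pair (e.g. with two relaxed accesses) the read of payload would be a data race, even though the spin loop still terminates.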
To understand why and where we need acquire/release semantics, we first need to understand our requirements: which operations do we need to order?
A shared pointer allows multiple references to a shared object to exist, potentially held by different threads (otherwise it would not make sense to make the ref counting thread-safe). All these threads may modify the object. Obviously these modifications must be synchronized somehow, but we don't care about the details of how here. However, destruction cannot be covered by that same synchronization method, since it is handled by the shared pointer itself. So what we have to order is this: all modifications (by all threads) must happen-before destruction of the object. This can be achieved by using acquire/release operations on the ref count. The one thread that performs the last decrement, causing the ref count to drop to zero, has to synchronize-with all previous decrements. So that last thread has to use acquire, while all other threads have to use release. Since we do not know beforehand whether we will be the last thread, the simplest solution is to use memory_order_acq_rel for all decrements. Alternatively, one can make this more explicit:
    if (m_ref.fetch_sub(1, std::memory_order_release) == 1) {
        m_ref.load(std::memory_order_acquire); // synchronize-with all previous fetch_sub operations
        delete this;
    }
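Embedded in a class, this could look roughly as follows. This is a hedged sketch; the names ref_counted, add_ref, release and m_ref are assumptions, not taken from any particular library.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical intrusively ref-counted base class (all names invented).
class ref_counted {
public:
    void add_ref() {
        // Incrementing only requires atomicity, not ordering: the new owner
        // must already have obtained the pointer via some other mechanism.
        m_ref.fetch_add(1, std::memory_order_relaxed);
    }

    void release() {
        // The release-decrement makes this thread's prior writes part of a
        // release sequence; the acquire-load in the zero case then
        // synchronizes-with the decrements of all other former owners.
        if (m_ref.fetch_sub(1, std::memory_order_release) == 1) {
            m_ref.load(std::memory_order_acquire);
            delete this;
        }
    }

protected:
    virtual ~ref_counted() = default;  // destroyed only via release()

private:
    std::atomic<std::size_t> m_ref{1};  // the creator holds the first reference
};
```
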
One often overlooked but nevertheless important detail is that this only works because we have a release sequence.
A release sequence headed by a release operation A on an atomic object M is a maximal contiguous sub-sequence of side effects in the modification order of M, where the first operation is A, and every subsequent operation is an atomic read-modify-write operation.
Usually an acquire-load synchronizes-with the release-store from which it takes its value, i.e., this is a synchronization between two threads. In this case, however, we need to synchronize with all threads that at some point held a reference.
Consider the following modification order of our ref count:
T1: store(1)
T2: fetch_add(1, relaxed)
T1: fetch_sub(1, release) <- start of release sequence 1
T3: fetch_add(1, relaxed)
T2: fetch_sub(1, release) <- start of release sequence 2
T3: fetch_sub(1, release) <- start of release sequence 3
Thread T3 performs the last decrement that causes the ref count to drop to zero.
Since all modifications except the initial store are done via read-modify-write operations, every release operation is the head of a release sequence that extends to the end of the modification order. In this example we have three active release sequences. When thread T3 then performs the acquire-load, it synchronizes-with the heads of all active release sequences, i.e., with all threads that previously performed a release-fetch_sub.
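The same argument can be exercised in code. In this hedged sketch (all names invented), three owners each write to their own slot before dropping their reference; whichever thread performs the final decrement runs the destructor and, thanks to the release sequences, is guaranteed to observe all three writes:

```cpp
#include <atomic>
#include <thread>

// Hypothetical demo of the release-sequence argument.
struct shared_data {
    int slot[3] = {0, 0, 0};        // one slot per owning thread
    std::atomic<int> ref{1};
    std::atomic<int>* sum_out;      // where the destructor publishes its result

    explicit shared_data(std::atomic<int>* out) : sum_out(out) {}
    ~shared_data() {
        // All slot writes happen-before this point, whichever thread runs it.
        sum_out->store(slot[0] + slot[1] + slot[2]);
    }
    void add_ref() { ref.fetch_add(1, std::memory_order_relaxed); }
    void release() {
        if (ref.fetch_sub(1, std::memory_order_release) == 1) {
            ref.load(std::memory_order_acquire);  // sync with all decrements
            delete this;
        }
    }
};

inline int run_demo() {
    std::atomic<int> sum{0};
    auto* d = new shared_data(&sum);
    d->add_ref();
    d->add_ref();  // three owners: the main thread plays T1, plus T2 and T3
    std::thread t2([d] { d->slot[1] = 2; d->release(); });
    std::thread t3([d] { d->slot[2] = 3; d->release(); });
    d->slot[0] = 1;
    d->release();  // whichever thread decrements last runs the destructor
    t2.join();
    t3.join();
    return sum.load();  // 1 + 2 + 3, regardless of which thread destroyed
}
```
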
I am not a fan of reasoning about reorderings or other potential compiler optimizations. The standard does not say anything about such optimizations; instead it defines an abstract machine with a defined (observable) behavior. With respect to concurrent execution, the formal concept that defines this behavior is the happens-before relation. Virtually all optimizations the compiler may perform are based on what is commonly referred to as the "as-if rule", which basically says that the compiler is free to perform all kinds of optimizations or code transformations, as long as it does not change the observable behavior.
In other words, if you correctly identify your required happens-before relations and ensure that they are established correctly, then you don't need to care about compiler optimizations at all.
If thinking about these optimizations helps with your mental model, that is fine. Just be aware that ultimately what counts is that the necessary happens-before relations are in place, because this is also the model that the compiler will use to analyze the code and decide which optimizations it may perform.
In the comments it is mentioned that the second implementation is used in OpenSSL, but there we find the following comment:
/*
* Changes to shared structure other than reference counter have to be
* serialized. And any kind of serialization implies a release fence. This
* means that by the time reference counter is decremented all other
* changes are visible on all processors. Hence decrement itself can be
* relaxed. In case it hits zero, object will be destructed. Since it's
* last use of the object, destructor programmer might reason that access
* to mutable members doesn't have to be serialized anymore, which would
* otherwise imply an acquire fence. Hence conditional acquire fence...
*/
A shared_ptr allows multiple threads to keep a reference to a shared resource, but modifications of that resource must still be synchronized. My understanding of this comment is that they rely on the release semantics of this synchronization, so that the acquire fence can synchronize with it. I find this rather suspect, TBH. I haven't looked at the rest of the code, so it might actually be correct, but it is at least highly unusual!
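For comparison, the fence-based variant that is well-defined without any external synchronization combines a release decrement with an acquire fence (this is the pattern shown, for example, in the Boost.Atomic documentation's reference-counting example). The sketch below uses invented names:

```cpp
#include <atomic>

// Sketch of the fence-based variant (all names invented). Unlike the
// OpenSSL snippet above, the decrement itself is a release operation, so
// the acquire fence has release operations to synchronize with - there is
// no reliance on external serialization.
class fence_ref_counted {
public:
    void add_ref() { m_ref.fetch_add(1, std::memory_order_relaxed); }
    void release() {
        if (m_ref.fetch_sub(1, std::memory_order_release) == 1) {
            // Fence-atomic synchronization: our fetch_sub read a value from
            // the release sequences headed by the other threads' release-
            // decrements, so this fence synchronizes-with all of them.
            std::atomic_thread_fence(std::memory_order_acquire);
            delete this;
        }
    }
protected:
    virtual ~fence_ref_counted() = default;
private:
    std::atomic<int> m_ref{1};  // the creator holds the first reference
};
```
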
Comments:

shared_ptr's reference is increased on copy. As a result, shared_ptr logic cannot be used unless I use tricks like keeping a shared pointer to self in the class, which I hate. So I need to add a small ref-counting mechanism. – Headman

… weak_ptr before, but I could not find a way to resolve my problem with it here. The only approach I found was to keep a copy of the shared_ptr inside my class to prevent it from being deleted, and when I need to delete, set this copy to nullptr. But this is not an approach that I like. That's why I decided to add ref-counting to my class. – Headman

The release method is doing a decrement with memory_order_release, and then there is a std::atomic_thread_fence(std::memory_order_acquire); if the retain count reaches 0. I am unsure it will work with memory_order_relaxed in release; at least my conclusion when I wrote that part was no. – Weis

… CRYPTO_REF_UP and CRYPTO_REF_DOWN here: github.com/openssl/openssl/blob/master/include/internal/… – Headman

… m_ref.load(acquire) to create an artificial atomic operation. – Outgo

… m_ref.load(acquire), isn't it? Because if the result is not used, it may be a candidate for optimization… – Headman

… shared_ptr implementations (I don't remember precisely which ones now). And I saw that memory_order_release followed by memory_order_acquire was a common pattern. I wish I had the definitive answer to that question. – Weis

… operator* and operator->? – Swob

if (refcnt == 0) blah; if (refcnt > 0) return; bloh is really ugly. I am not saying stick to SESE (single entry single exit), but I had to read it back multiple times! – Swob

The if (refcnt == 0) fence could just be an unconditional fence after the early-out if () return, or it could be an if/else. It's just weird that both the fence and delete this run in the same case (assuming refcnt can't ever be negative, in which case you'd delete without having done a fence), but are on opposite sides of something else and are controlled in different ways. – Pergrim