How can memory_order_relaxed work for incrementing atomic reference counts in smart pointers?
Consider the following code snippet taken from Herb Sutter's talk on atomics:

The smart_ptr class contains a pimpl object called control_block_ptr containing the reference count refs.

// Thread A:
// smart_ptr copy ctor
smart_ptr(const smart_ptr& other) {
  ...
  control_block_ptr = other.control_block_ptr; // other is a reference, not a pointer
  control_block_ptr->refs.fetch_add(1, memory_order_relaxed);
  ...
}

// Thread D:
// smart_ptr destructor
~smart_ptr() {
  if (control_block_ptr->refs.fetch_sub(1, memory_order_acq_rel) == 1) {
    delete control_block_ptr;
  }
}

Herb Sutter says the increment of refs in Thread A can use memory_order_relaxed because "nobody does anything based on the action". Now as I understand memory_order_relaxed, if refs equals N at some point and two threads A and B execute the following code:

control_block_ptr->refs.fetch_add(1, memory_order_relaxed);

then it may happen that both threads see the value of refs to be N and both write N+1 back to it. That will clearly not work and memory_order_acq_rel should be used just as with the destructor. Where am I going wrong?

EDIT1: Consider the following code.

std::atomic<int> refs{N}; // at time t0

// [Thread 1]
refs.fetch_add(1, std::memory_order_relaxed); // at time t1

// [Thread 2]
int n = refs.load(std::memory_order_relaxed); // starting at time t2 > t1
refs.fetch_add(1, std::memory_order_relaxed);
n = refs.load(std::memory_order_relaxed);

What is the value of refs observed by Thread 2 before the call to fetch_add? Could it be either N or N+1? What is the value of refs observed by Thread 2 after the call to fetch_add? Must it be at least N+2?

[Talk URL: C++ & Beyond 2012 - http://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-2-of-2 (@ 1:20:00)]

Rosenberg answered 24/12, 2014 at 3:42 Comment(11)
fetch_add is an atomic operation. Using memory_order_relaxed doesn't change that. – Radiochemical
So the only reason we need memory_order_acq_rel in the destructor is so that when delete control_block_ptr; is run, it does not run before refs is set to 0? – Rosenberg
Yes, basically. With memory_order_relaxed, it would be legal for the compiler to start deleting the object before even checking the result of the fetch_sub, as long as that had no effect on the visible behavior of the current thread, which would create a data race. For example, it would be legal to do some of the operations of the destructor for the control block and then undo them if it found that the ref count wasn't zero. – Radiochemical
So the fact that it is a fetch_add means I am guaranteed to fetch the latest value, irrespective of the tag (memory_order_relaxed). If one thread issued a fetch_add (read-modify-write) with relaxed ordering and another thread issued a load (read) with relaxed ordering, could the load return a value older than the one stored by the fetch_add? – Rosenberg
It is misleading to think of the fetch_add as giving you the latest value. Being atomic, the fetch_add will do the read/modify/write in such a way that you can never observe them as three separate operations; however, that is independent of when they are done. – Radiochemical
@VaughnCato And "ordering" is about when the atomic operation happens relative to other operations. – Cathodoluminescence
@VaughnCato: In other words, fetch_add would need to indicate what the value was when the add was performed, but the compiler could perform the add at any time after it became inevitable? – Eula
@VaughnCato "start deleting the object before even checking the result of" Doing the "then" before the "if"? That's a patently absurd claim. – Pete
@Eula "any time after it became inevitable" That's, I believe, the way Java describes MT semantics and the way C++ probably wants to, as it's the only way to approach the intent. – Pete
@Pete At the level of the abstract machine, operations are performed in the order that they appear in the code, but due to the as-if rule there is no requirement that the generated assembly do operations in the same order as they appear in the code. That even includes doing some of the "then" work before checking the "if", as long as there is no effect on the observable behavior in the absence of a data race. This is irrelevant technically, but it can help provide a mental model of how memory_order_relaxed can give counterintuitive behavior. – Radiochemical
@VaughnCato I have no idea what you are trying to say. What is this "abstract machine", and what is its relation to the hardware? Are you saying that the abstract machine is only sequential? Then what are atomics for? – Pete

The Boost.Atomic library, which emulates std::atomic, provides a similar reference-counting example and explanation, and it may help your understanding.

Increasing the reference counter can always be done with memory_order_relaxed: New references to an object can only be formed from an existing reference, and passing an existing reference from one thread to another must already provide any required synchronization.

It is important to enforce any possible access to the object in one thread (through an existing reference) to happen before deleting the object in a different thread. This is achieved by a "release" operation after dropping a reference (any access to the object through this reference must obviously have happened before), and an "acquire" operation before deleting the object.

It would be possible to use memory_order_acq_rel for the fetch_sub operation, but this results in unneeded "acquire" operations when the reference counter has not yet reached zero and may impose a performance penalty.

Saveall answered 31/12, 2014 at 5:5 Comment(0)

From C++ reference on std::memory_order:

memory_order_relaxed: Relaxed operation: there are no synchronization or ordering constraints imposed on other reads or writes, only this operation's atomicity is guaranteed

There is also an example below on that page.

So basically, std::atomic::fetch_add() is still atomic even with std::memory_order_relaxed; therefore, concurrent refs.fetch_add(1, std::memory_order_relaxed) calls from two different threads will always increment refs by 2 in total. The memory order only controls how other non-atomic or std::memory_order_relaxed atomic operations may be reordered around the current atomic operation.

Premolar answered 2/7, 2017 at 4:36 Comment(7)
Operations aren't really "reordered" as if C++ statements were swapped; it's just that the side effects may or may not be guaranteed to be visible. – Pete
@curiousguy Actually, compliant C++ compilers are allowed to reorder things. See for instance preshing.com/20120625/memory-ordering-at-compile-time, where some examples of where this can happen are given. – Henigman
@Henigman No. Nothing really "allows" that. There is no official reference for it, just traditions and expectations. – Pete
@curiousguy Doesn't the "as if" rule allow for reorderings? That code transformations are allowed as long as some constraints are met? This is not rhetorical; I really thought so, but I'm not sure. – Henigman
The so-called, nonexistent "as if" rule "allows" these when they don't matter. – Pete
@curiousguy I think I've got your point. You motivated me to search online and ask a question: https://mcmap.net/q/583689/-where-are-the-statement-of-or-the-foundations-for-the-quot-as-if-quot-rule-in-the-c-standard/1254880 . It seems that there is indeed a statement of the "as if" rule, but it's buried in a footnote in the current draft of the standard. See this answer to my question: https://mcmap.net/q/583689/-where-are-the-statement-of-or-the-foundations-for-the-quot-as-if-quot-rule-in-the-c-standard – Henigman
@Henigman Observable behavior is not a footnote; only the term "as-if rule" and the explanation that follows are. But the term "reordering" is indeed misleading, because it implies changing the order of operations on the abstract machine. Rather, the implementation, say when generating machine code, can output the concrete machine operations in any order that conforms to the program's specified observable behavior on the abstract machine, even to the point of omitting many non-volatile and non-atomic loads and stores completely. – Averil

As this is rather confusing (at least to me), I'm going to partially address one point:

(...) then it may happen that both threads see the value of refs to be N and both write N+1 back to it (...)

According to @AnthonyWilliams in this answer, the sentence above seems to be wrong, as:

The only way to guarantee you have the "latest" value is to use a read-modify-write operation such as exchange(), compare_exchange_strong() or fetch_add(). Read-modify-write operations have an additional constraint that they always operate on the "latest" value, so a sequence of ai.fetch_add(1) operations by a series of threads will return a sequence of values with no duplicates or gaps. In the absence of additional constraints, there's still no guarantee which threads will see which values though.

So, by this argument from authority, I'd say it's impossible for both threads to see the value go from N to N+1.

Henigman answered 8/12, 2016 at 17:25 Comment(3)
"Latest" just means somewhere in the consistent history of the object, but in practice the only somewhere you can add to history is now. That doesn't mean much, as there is no global "now" that you can use to define "latest". It's all very confusing intuitively, and formally it is even less clear (as the formalism doesn't formalize much). – Pete
@curiousguy I agree with you that this concept of "latest" is rather problematic in this context. But there are some proper use cases for the term nevertheless. For instance, when memory_order_seq_cst is given to atomic operations in C++, those operations are required to have a single total order across all threads, which can then be regarded as the foundation for defining "latest". Cf. en.cppreference.com/w/cpp/atomic/… – Henigman
Please explain how you make sense of C and C++ if sequential behavior is not guaranteed. It's obviously the most basic axiom required to have a semantics. The whole thing would not exist without sequential execution. Not even Java (a simpler case) came up with a satisfactory non-sequential MT semantics. And they tried hard! C and C++ never tried. – Pete

© 2022 - 2024 — McMap. All rights reserved.