Confusion about implementation error within shared_ptr destructor

I have just seen Herb Sutter's talk: C++ and Beyond 2012: Herb Sutter - atomic<> Weapons, 2 of 2

He shows a bug in an implementation of the std::shared_ptr destructor:

if( control_block_ptr->refs.fetch_sub(1, memory_order_relaxed ) == 0 )
    delete control_block_ptr; // B

He says that, due to memory_order_relaxed, the delete can be placed before the fetch_sub.

At 1:25:18 - Release doesn't keep line B below, where it should be

How is that possible? There is a happens-before / sequenced-before relationship, because both are in a single thread. I might be wrong, but isn't there also a carries-a-dependency-to relationship between the fetch_sub and the delete?

If he is right, which clauses of the ISO standard support that?

Sapless answered 14/2, 2013 at 17:54 Comment(0)

Imagine code that releases a shared pointer:

auto tmp = &(the_ptr->a);
*tmp = 10;
the_ptr.dec_ref();

If dec_ref() doesn't have "release" semantics, it's perfectly fine for the compiler (or CPU) to move things from before dec_ref() to after it, for example:

auto tmp = &(the_ptr->a);
the_ptr.dec_ref();
*tmp = 10;

And this is not safe, since dec_ref() can also be called from another thread at the same time and delete the object. So it must have "release" semantics for the things before dec_ref() to stay there.

Let's now imagine that the object's destructor looks like this:

~object() {
    auto xxx = a;
    printf("%i\n", xxx);
}

We will also modify the example a bit and have two threads:

// thread 1
auto tmp = &(the_ptr->a);
*tmp = 10;
the_ptr.dec_ref();

// thread 2
the_ptr.dec_ref();

Then, the "aggregated" code will look like:

// thread 1
auto tmp = &(the_ptr->a);
*tmp = 10;
{ // the_ptr.dec_ref();
    if (0 == atomic_sub(...)) {
        { //~object()
            auto xxx = a;
            printf("%i\n", xxx);
        }
    }
}

// thread 2
{ // the_ptr.dec_ref();
    if (0 == atomic_sub(...)) {
        { //~object()
            auto xxx = a;
            printf("%i\n", xxx);
        }
    }
}

However, if we only have "release" semantics for atomic_sub(), this code can be optimized this way:

// thread 2
auto xxx = the_ptr->a; // "auto xxx = a;" from destructor moved here
{ // the_ptr.dec_ref();
    if (0 == atomic_sub(...)) {
        { //~object()
            printf("%i\n", xxx);
        }
    }
}

But that way the destructor will not always print the last value of "a" (this code is no longer race free). That's why we also need acquire semantics for atomic_sub (or, strictly speaking, an acquire barrier when the counter becomes 0 after the decrement).
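For concreteness, here is a minimal sketch of the decrement path this answer argues for (control_block, object and dec_ref are made-up names for the example, not any real shared_ptr implementation): every decrement is a release, and only the thread that drops the counter to zero pays for an acquire fence before the delete.

#include <atomic>
#include <cstdio>

struct object {
    int a = 0;
    ~object() { std::printf("%i\n", a); } // reads 'a', like the destructor above
};

struct control_block {
    std::atomic<int> refs{1};
    object* obj = nullptr;
};

void dec_ref(control_block* cb)
{
    // Release: writes made before dec_ref() (e.g. "*tmp = 10") cannot sink below it.
    // fetch_sub returns the previous value, so the last owner sees 1.
    if (cb->refs.fetch_sub(1, std::memory_order_release) == 1) {
        // Acquire: the deleting thread must see every other thread's writes to the object.
        std::atomic_thread_fence(std::memory_order_acquire);
        delete cb->obj;
        delete cb;
    }
}

Using memory_order_acq_rel on the fetch_sub itself would also be correct; the fence variant just keeps the acquire cost off the non-final decrements.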

Dusty answered 10/2, 2015 at 1:7 Comment(8)
Very nice example. This is only a problem for objects with nontrivial dtors that have to work with mutable state of the object, right? So a relaxed atomic is totally safe with regard to the memory deallocation side effect of delete? – Tuneless
In other words, "release" semantics are enough for objects with trivial destructors. "delete" itself obviously can't be reordered as a whole operation; only its "read" parts can be moved "up" out of it. The "write" parts can't be moved "up" because speculative writes are not permitted and that "if" condition has to be checked first. – Dusty
I get that release is sufficient for trivial dtors, but is relaxed also sufficient? – Tuneless
No. The first part of the example explains why we need release. – Dusty
Not sure I buy that. Memory ordering doesn't matter for single-threaded programs; it only matters when multiple threads are working with the same memory location. The first part of your answer explains that *tmp could refer to an object that has just been deleted, which I don't think is true given that everything is happening in the same thread. – Tuneless
"And this is not safe, since dec_ref() can also be called from another thread at the same time and delete the object." – Dusty
*tmp = 10 happens-before the_ptr.dec_ref() by sequencing. If another thread deletes the object, then the_ptr.dec_ref() becomes visible everywhere before the other thread's dec_ref(). The other thread's dec_ref() happens-before the delete by sequencing. The spec does appear to declare the interaction of delete and *tmp = 10 a data race, because "becomes visible everywhere before" is not included in the definition of "inter-thread happens-before". Are you certain that this is an accurate reading of the spec? – Tuneless
Why would *tmp = 10 happen-before the_ptr.dec_ref() if relaxed is used? With relaxed, *tmp = 10 can become visible to other threads after those threads have already observed the results of the_ptr.dec_ref(). – Dusty

It looks like he is talking about synchronization of actions on the shared object itself, which are not shown in his code blocks (and as a result it is confusing).

That's why he put acq_rel there: all actions on the object should happen before its destruction, all in order.

But I'm still not sure why he talks about swapping delete with fetch_sub.

Sapless answered 14/2, 2013 at 19:23 Comment(3)
Yes, the main reason an acquire-release operation is needed is for inter-thread synchronization; he only mentions the code motion in passing and doesn't explain it. – Ermine
"needed is for inter-thread synchronization" - it is needed for code not shown on the slide... – Sapless
Yes, only part of it is shown: the control block might contain a custom deleter, and the deleter might get invoked in a different thread, so the control block must not get destroyed until the deleter has finished destroying the object (see the sketch below). – Ermine
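To make that comment concrete, here is a rough sketch of such a control block (ctrl_block and release are made-up names for illustration, not any particular standard library's layout): the stored custom deleter is run by whichever thread drops the count to zero, and only after it returns may the block itself be freed.

#include <atomic>
#include <functional>

// Hypothetical control block layout, for illustration only.
struct ctrl_block {
    std::atomic<long> refs{1};
    void* obj = nullptr;
    std::function<void(void*)> deleter;  // custom deleter is stored in the control block

    void release() {
        if (refs.fetch_sub(1, std::memory_order_release) == 1) {
            std::atomic_thread_fence(std::memory_order_acquire);
            deleter(obj);   // may run in a different thread than the one that created obj
            delete this;    // the block must stay alive until the deleter has finished
        }
    }
};

The acquire/release placement is the same pattern as in the answer above; the point here is only the lifetime: delete this cannot run until deleter(obj) has returned.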

This is a late reply.

Let's start out with this simple type:

#include <iostream>

struct foo
{
    ~foo() { std::cout << value; }
    int value;
};

And we'll use this type in a shared_ptr, as follows:

#include <future>
#include <memory>

void runs_in_separate_thread(std::shared_ptr<foo> my_ptr)
{
    my_ptr->value = 5;
    my_ptr.reset();
}

int main()
{
    std::shared_ptr<foo> my_ptr(new foo);
    // Keep the returned future alive: a discarded std::async future blocks
    // right here in its destructor, which would serialize the two reset()
    // calls instead of letting them run concurrently.
    auto task = std::async(std::launch::async, runs_in_separate_thread, my_ptr);
    my_ptr.reset();
}

Two threads will be running in parallel, both sharing ownership of a foo object.

With a correct shared_ptr implementation (that is, one with memory_order_acq_rel), this program has defined behavior. The only value that this program will print is 5.

With an incorrect implementation (using memory_order_relaxed) there are no such guarantees. The behavior is undefined because a data race on foo::value is introduced. The trouble occurs only when the destructor ends up being called in the main thread: with a relaxed memory order, the write to foo::value in the other thread may not propagate to the destructor running in the main thread. A value other than 5 could be printed.

So what's a data race? Well, check out the definition and pay attention to the last bullet point:

When an evaluation of an expression writes to a memory location and another evaluation reads or modifies the same memory location, the expressions are said to conflict. A program that has two conflicting evaluations has a data race unless either

  • both conflicting evaluations are atomic operations (see std::atomic)
  • one of the conflicting evaluations happens-before another (see std::memory_order)

In our program, one thread will write to foo::value and one thread will read from foo::value. These are supposed to be sequential; the write to foo::value should always happen before the read. Intuitively, it makes sense that they would be, as the destructor is supposed to be the last thing that happens to an object.

memory_order_relaxed does not offer such ordering guarantees though and so memory_order_acq_rel is required.
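As a rough illustration of where that acq_rel sits (ref_counted below is a made-up toy holder, not the real shared_ptr machinery, and foo is repeated from above only to keep the snippet self-contained): the decrement in reset() is the operation that needs memory_order_acq_rel, and swapping it for memory_order_relaxed reintroduces exactly the data race on value described above.

#include <atomic>
#include <future>
#include <iostream>

struct foo {
    ~foo() { std::cout << value << '\n'; }
    int value = 0;
};

// Toy ref-counted holder, made up for this illustration.
struct ref_counted {
    foo* p;
    std::atomic<int>* refs;

    void reset() {
        // acq_rel: the release half publishes this thread's writes to *p,
        // the acquire half makes the last owner see every other owner's writes.
        if (p && refs->fetch_sub(1, std::memory_order_acq_rel) == 1) {
            delete p;       // the destructor reliably prints 5
            delete refs;
        }
        p = nullptr;
        refs = nullptr;
        // With std::memory_order_relaxed instead, the destructor's read of
        // 'value' would race with the other thread's write to it.
    }
};

int main() {
    ref_counted a{new foo, new std::atomic<int>(2)};  // two owners from the start

    auto f = std::async(std::launch::async, [b = a]() mutable {
        b.p->value = 5;
        b.reset();
    });
    a.reset();  // may run concurrently with the reset() in the other thread
}

The structure mirrors the shared_ptr program above: whichever reset() runs last sees the other owner's write to value, because the acquire half of its fetch_sub synchronizes with the release half of the earlier one.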

Salientian answered 7/7, 2016 at 6:57 Comment(0)

In the talk Herb shows memory_order_release, not memory_order_relaxed, but relaxed would have even more problems.

Unless delete control_block_ptr accesses control_block_ptr->refs (which it probably doesn't), the atomic operation does not carry a dependency to the delete. The delete operation might not touch any memory in the control block at all; it might just return that pointer to the free-store allocator.

But I'm not sure if Herb is talking about the compiler moving the delete before the atomic operation, or just referring to when the side effects become visible to other threads.

Ermine answered 14/2, 2013 at 18:53 Comment(10)
" talking about the compiler moving the delete before the atomic operation" - 1:23:34: "code stay below and above" ;;; "or just referring to when the side effects become visible to other threads." - which side-effects? read-modify-write everytime see last value in modification orderSapless
"but relaxed would have even more problems." - which problems?Sapless
"which problems?" A relaxed operation is not a synchronization operation at all.Ermine
"which side-effects? read-modify-write everytime see last value in modification order" but as I said in my answer, the atomic operation and the delete do not access the same memory location, they are accessing distinct objects.Ermine
"A relaxed operation is not a synchronization operation at all." - and which synchronization you need here?Sapless
"hey are accessing distinct objects", maybe yes, there is no carry-a-dependency-to, but there is still happens-beforeSapless
There is a happens before relationship, but the standard says the observable results of the abstract machine are the operations on volatiles and atomic operations, since delete control_block_ptr is neither, in the absence of a memory barrier to prevent code motion the compiler can move it before the decrement, and you couldn't observe the difference... except that here it would cause a bug, but the compiler doesn't know that. A release operation does not prevent later operations being moved to before the release operation.Ermine
Well, lets make fetch-sub even non-atomic, normal decrement, lets consider single thread. How compiler can move delete unconditionaly before if? How it can "delete in advance"? What it will do, if "if" test would fail? Will it resurect object?Sapless
But fetch_sub, being an RMW operation, always loads the last value in the modification order. Also, within a single thread happens-before is basically guaranteed by sequenced-before. Furthemore memory_order_release guarantees that no memory operation can move below it, so all memory operations must be completed before the fetch_sub within a single thread . So the delete can only run when previous threads already reached and completed their fetch_sub, which implies that they also completed their memory operations. Am I missing something?Litchi
Since fetch_sub always loads the last value in the modification order, delete can only run when other threads finished all access (which is guaranteed by their fetch_sub with memory_order_release).Litchi
