Confusion about implementation error within shared_ptr destructor

I have just seen Herb Sutter's talk: C++ and Beyond 2012: Herb Sutter - atomic<> Weapons, 2 of 2

He shows a bug in an implementation of the std::shared_ptr destructor:

if( control_block_ptr->refs.fetch_sub(1, memory_order_relaxed ) == 0 )
    delete control_block_ptr; // B

He says that, due to memory_order_relaxed, the delete can be placed before the fetch_sub.

At 1:25:18 - Release doesn't keep line B below, where it should be

How is that possible? There is a happens-before / sequenced-before relationship, because both are in a single thread. I might be wrong, but isn't there also a carries-a-dependency-to relationship between the fetch_sub and the delete?

If he is right, which clauses of the ISO standard support that?

Sapless answered 14/2, 2013 at 17:54 Comment(0)

Imagine code that releases a shared pointer:

auto tmp = &(the_ptr->a);
*tmp = 10;
the_ptr.dec_ref();

If dec_ref() doesn't have "release" semantics, it's perfectly fine for the compiler (or CPU) to move things from before dec_ref() to after it, for example:

auto tmp = &(the_ptr->a);
the_ptr.dec_ref();
*tmp = 10;

And this is not safe, since dec_ref() can also be called from another thread at the same time and delete the object. So it must have "release" semantics for the things before dec_ref() to stay there.

Let's now imagine that the object's destructor looks like this:

~object() {
    auto xxx = a;
    printf("%i\n", xxx);
}

We will also modify the example a bit and have two threads:

// thread 1
auto tmp = &(the_ptr->a);
*tmp = 10;
the_ptr.dec_ref();

// thread 2
the_ptr.dec_ref();

Then, the "aggregated" code will look like:

// thread 1
auto tmp = &(the_ptr->a);
*tmp = 10;
{ // the_ptr.dec_ref();
    if (0 == atomic_sub(...)) {
        { //~object()
            auto xxx = a;
            printf("%i\n", xxx);
        }
    }
}

// thread 2
{ // the_ptr.dec_ref();
    if (0 == atomic_sub(...)) {
        { //~object()
            auto xxx = a;
            printf("%i\n", xxx);
        }
    }
}

However, if we only have "release" semantics for atomic_sub(), this code can be optimized this way:

// thread 2
auto xxx = the_ptr->a; // "auto xxx = a;" from destructor moved here
{ // the_ptr.dec_ref();
    if (0 == atomic_sub(...)) {
        { //~object()
            printf("%i\n", xxx);
        }
    }
}

But that way the destructor will not always print the last value of "a" (this code is no longer race free). That's why we also need acquire semantics for atomic_sub (or, strictly speaking, an acquire barrier when the counter becomes 0 after the decrement).
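For concreteness, here is a minimal sketch of the decrement path this answer argues for (control_block, object and dec_ref are made-up names for the example, not any real shared_ptr implementation): every decrement is a release, and only the thread that drops the counter to zero pays for an acquire fence before the delete.

#include <atomic>
#include <cstdio>

struct object {
    int a = 0;
    ~object() { std::printf("%i\n", a); } // reads 'a', like the destructor above
};

struct control_block {
    std::atomic<int> refs{1};
    object* obj = nullptr;
};

void dec_ref(control_block* cb)
{
    // Release: writes made before dec_ref() (e.g. "*tmp = 10") cannot sink below it.
    // fetch_sub returns the previous value, so the last owner sees 1.
    if (cb->refs.fetch_sub(1, std::memory_order_release) == 1) {
        // Acquire: the deleting thread must see every other thread's writes to the object.
        std::atomic_thread_fence(std::memory_order_acquire);
        delete cb->obj;
        delete cb;
    }
}

Using memory_order_acq_rel on the fetch_sub itself would also be correct; the fence variant just keeps the acquire cost off the non-final decrements.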

Dusty answered 10/2, 2015 at 1:7 Comment(8)
Very nice example. This is only a problem for objects with nontrivial dtors that have to work with mutable state of the object, right? So a relaxed atomic is totally safe with regard to the memory deallocation side effect of delete? – Tuneless
In other words, "release" semantics are enough for objects with trivial destructors. "delete" itself obviously can't be reordered as a whole operation; only its "read" parts can be moved "up" out of it. The "write" parts can't be moved "up" because speculative writes are not permitted and that "if" condition has to be checked first. – Dusty
I get that release is sufficient for trivial dtors, but is relaxed also sufficient? – Tuneless
No. The first part of the example explains why we need release. – Dusty
Not sure I buy that. Memory ordering doesn't matter for single-threaded programs; it only matters when multiple threads are working with the same memory location. The first part of your answer explains that *tmp could refer to an object that has just been deleted, which I don't think is true given that everything is happening in the same thread. – Tuneless
"And this is not safe, since dec_ref() can also be called from another thread at the same time and delete the object." – Dusty
*tmp = 10 happens-before the_ptr.dec_ref() by sequencing. If another thread deletes the object, then the_ptr.dec_ref() becomes visible everywhere before the other thread's dec_ref(). The other thread's dec_ref() happens-before the delete by sequencing. The spec does appear to declare the interaction of delete and *tmp = 10 a data race, because "becomes visible everywhere before" is not included in the definition of "inter-thread happens-before". Are you certain that this is an accurate reading of the spec? – Tuneless
Why would *tmp = 10 happen-before the_ptr.dec_ref() if relaxed is used? With relaxed, *tmp = 10 can become visible to other threads after those threads have already observed the results of the_ptr.dec_ref(). – Dusty

It looks like he is talking about synchronization of actions on the shared object itself, which are not shown in his code blocks (and as a result it is confusing).

That's why he put acq_rel there: all actions on the object should happen before its destruction, all in order.

But I'm still not sure why he talks about swapping delete with fetch_sub.

Sapless answered 14/2, 2013 at 19:23 Comment(3)
Yes, the main reason an acquire-release operation is needed is for inter-thread synchronization; he only mentions the code motion in passing and doesn't explain it. – Ermine
"needed is for inter-thread synchronization" - it is needed for code not shown on the slide... – Sapless
Yes, only part of it is shown: the control block might contain a custom deleter, and the deleter might get invoked in a different thread, so the control block must not get destroyed until the deleter has finished destroying the object (see the sketch below). – Ermine
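To make that comment concrete, here is a rough sketch of such a control block (ctrl_block and release are made-up names for illustration, not any particular standard library's layout): the stored custom deleter is run by whichever thread drops the count to zero, and only after it returns may the block itself be freed.

#include <atomic>
#include <functional>

// Hypothetical control block layout, for illustration only.
struct ctrl_block {
    std::atomic<long> refs{1};
    void* obj = nullptr;
    std::function<void(void*)> deleter;  // custom deleter is stored in the control block

    void release() {
        if (refs.fetch_sub(1, std::memory_order_release) == 1) {
            std::atomic_thread_fence(std::memory_order_acquire);
            deleter(obj);   // may run in a different thread than the one that created obj
            delete this;    // the block must stay alive until the deleter has finished
        }
    }
};

The acquire/release placement is the same pattern as in the answer above; the point here is only the lifetime: delete this cannot run until deleter(obj) has returned.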

This is a late reply.

Let's start out with this simple type:

#include <iostream>

struct foo
{
    ~foo() { std::cout << value; }
    int value;
};

And we'll use this type in a shared_ptr, as follows:

#include <future>
#include <memory>

void runs_in_separate_thread(std::shared_ptr<foo> my_ptr)
{
    my_ptr->value = 5;
    my_ptr.reset();
}

int main()
{
    std::shared_ptr<foo> my_ptr(new foo);
    // Keep the returned future alive: a discarded std::async future blocks
    // right here in its destructor, which would serialize the two reset()
    // calls instead of letting them run concurrently.
    auto task = std::async(std::launch::async, runs_in_separate_thread, my_ptr);
    my_ptr.reset();
}

Two threads will be running in parallel, both sharing ownership of a foo object.

With a correct shared_ptr implementation (that is, one with memory_order_acq_rel), this program has defined behavior. The only value that this program will print is 5.

With an incorrect implementation (using memory_order_relaxed) there are no such guarantees. The behavior is undefined because a data race on foo::value is introduced. The trouble occurs only when the destructor ends up being called in the main thread: with a relaxed memory order, the write to foo::value in the other thread may not propagate to the destructor running in the main thread. A value other than 5 could be printed.

So what's a data race? Well, check out the definition and pay attention to the last bullet point:

When an evaluation of an expression writes to a memory location and another evaluation reads or modifies the same memory location, the expressions are said to conflict. A program that has two conflicting evaluations has a data race unless either

  • both conflicting evaluations are atomic operations (see std::atomic)
  • one of the conflicting evaluations happens-before another (see std::memory_order)

In our program, one thread will write to foo::value and one thread will read from foo::value. These are supposed to be sequential; the write to foo::value should always happen before the read. Intuitively, it makes sense that they would be, as the destructor is supposed to be the last thing that happens to an object.

memory_order_relaxed does not offer such ordering guarantees though and so memory_order_acq_rel is required.
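As a rough illustration of where that acq_rel sits (ref_counted below is a made-up toy holder, not the real shared_ptr machinery, and foo is repeated from above only to keep the snippet self-contained): the decrement in reset() is the operation that needs memory_order_acq_rel, and swapping it for memory_order_relaxed reintroduces exactly the data race on value described above.

#include <atomic>
#include <future>
#include <iostream>

struct foo {
    ~foo() { std::cout << value << '\n'; }
    int value = 0;
};

// Toy ref-counted holder, made up for this illustration.
struct ref_counted {
    foo* p;
    std::atomic<int>* refs;

    void reset() {
        // acq_rel: the release half publishes this thread's writes to *p,
        // the acquire half makes the last owner see every other owner's writes.
        if (p && refs->fetch_sub(1, std::memory_order_acq_rel) == 1) {
            delete p;       // the destructor reliably prints 5
            delete refs;
        }
        p = nullptr;
        refs = nullptr;
        // With std::memory_order_relaxed instead, the destructor's read of
        // 'value' would race with the other thread's write to it.
    }
};

int main() {
    ref_counted a{new foo, new std::atomic<int>(2)};  // two owners from the start

    auto f = std::async(std::launch::async, [b = a]() mutable {
        b.p->value = 5;
        b.reset();
    });
    a.reset();  // may run concurrently with the reset() in the other thread
}

The structure mirrors the shared_ptr program above: whichever reset() runs last sees the other owner's write to value, because the acquire half of its fetch_sub synchronizes with the release half of the earlier one.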

Salientian answered 7/7, 2016 at 6:57 Comment(0)

In the talk Herb shows memory_order_release, not memory_order_relaxed, but relaxed would have even more problems.

Unless delete control_block_ptr accesses control_block_ptr->refs (which it probably doesn't), the atomic operation does not carry a dependency to the delete. The delete operation might not touch any memory in the control block at all; it might just return that pointer to the free-store allocator.

But I'm not sure if Herb is talking about the compiler moving the delete before the atomic operation, or just referring to when the side effects become visible to other threads.

Ermine answered 14/2, 2013 at 18:53 Comment(10)
" talking about the compiler moving the delete before the atomic operation" - 1:23:34: "code stay below and above" ;;; "or just referring to when the side effects become visible to other threads." - which side-effects? read-modify-write everytime see last value in modification orderSapless
"but relaxed would have even more problems." - which problems?Sapless
"which problems?" A relaxed operation is not a synchronization operation at all.Ermine
"which side-effects? read-modify-write everytime see last value in modification order" but as I said in my answer, the atomic operation and the delete do not access the same memory location, they are accessing distinct objects.Ermine
"A relaxed operation is not a synchronization operation at all." - and which synchronization you need here?Sapless
"hey are accessing distinct objects", maybe yes, there is no carry-a-dependency-to, but there is still happens-beforeSapless
There is a happens before relationship, but the standard says the observable results of the abstract machine are the operations on volatiles and atomic operations, since delete control_block_ptr is neither, in the absence of a memory barrier to prevent code motion the compiler can move it before the decrement, and you couldn't observe the difference... except that here it would cause a bug, but the compiler doesn't know that. A release operation does not prevent later operations being moved to before the release operation.Ermine
Well, lets make fetch-sub even non-atomic, normal decrement, lets consider single thread. How compiler can move delete unconditionaly before if? How it can "delete in advance"? What it will do, if "if" test would fail? Will it resurect object?Sapless
But fetch_sub, being an RMW operation, always loads the last value in the modification order. Also, within a single thread happens-before is basically guaranteed by sequenced-before. Furthemore memory_order_release guarantees that no memory operation can move below it, so all memory operations must be completed before the fetch_sub within a single thread . So the delete can only run when previous threads already reached and completed their fetch_sub, which implies that they also completed their memory operations. Am I missing something?Litchi
Since fetch_sub always loads the last value in the modification order, delete can only run when other threads finished all access (which is guaranteed by their fetch_sub with memory_order_release).Litchi
