Regarding shared_ptr reference count block

C

3

5

I had 2 questions regarding the std::shared_ptr control block:

(1) Regarding size: How can I programmatically find the exact size of the control block for a std::shared_ptr?

(2) Regarding logic: Additionally, boost::shared_ptr mentions that it is completely lock-free with respect to changes in the control block. (Starting with Boost release 1.33.0, shared_ptr uses a lock-free implementation on most common platforms.) I don't think std::shared_ptr follows the same approach. Is this planned for any future C++ version? Doesn't this also mean that boost::shared_ptr is a better idea for multithreaded cases?

Con answered 26/11, 2018 at 3:45 Comment(10)

I'd be happy to revise the question if there's something wrong or unclear about it (to the person who downvoted it). – Con 26/11, 2018 at 4:3

std::shared_ptr is a template and, since templates must be visible to the compiler, nothing stops you from investigating your compiler's implementation of std::shared_ptr and figuring this out. – Panacea 26/11, 2018 at 4:7

True, but I'm hoping to do so programmatically if possible. – Con 26/11, 2018 at 4:8

You can look at your compiler's implementation of std::shared_ptr. Other than that, I think std::shared_ptr just uses atomics, so it is "lock free" in that regard. Note that ideally you don't share mutable data across threads at all if you can help it, and in that case you'll probably want a simpler ref-counted pointer that doesn't require atomics or locks. – Twinscrew 26/11, 2018 at 4:9

Thanks, yes, that's true about generally not needing atomics in the non-multithreaded case. But two points here: (1) Using atomics doesn't technically qualify as "lock-free", right? (2) No implementation provides simple ref-counted non-atomic ops, right? – Con 26/11, 2018 at 4:19

@Con Where are these atomic ops not lock free? – Hyacinthe 26/11, 2018 at 13:37

@Hyacinthe - I should be more careful with my wording. What I meant was that C++ atomic types are not necessarily lock-free (they may or may not be, which can be checked with en.cppreference.com/w/cpp/atomic/atomic_is_lock_free). Since std::shared_ptr uses these atomic types, there is no guarantee of it being lock-free. – Con 26/11, 2018 at 14:47

@Con There is no guarantee that std::shared_ptr is implemented in terms of standard atomic types either. Either way, why do you care? – Hyacinthe 26/11, 2018 at 18:11

Lock-free implementations of concurrent primitives are generally only faster than those using locks in highly-contended settings. Otherwise, they're paying a price in extra operations or synchronization on the uncontended fast path. – Klausenburg 29/11, 2018 at 2:16

@PhilMiller: taking a lock costs at least an atomic RMW (with at least acq/rel ordering) even in the uncontended case. Releasing a spinlock is usually cheap, just an atomic store, but a mutex that can fall back to an OS-assisted sleep often also costs an atomic read-modify-write. Taking/releasing a lock is usually only a win if a lockless version would take multiple atomic operations. Lockless is often very good in low-contention situations. – Safko 22/12, 2018 at 21:28

B

5

(1) Regarding size: How can I programmatically find the exact size of the control block for a std::shared_ptr?

There is no way. It's not directly accessible.

(2) Regarding logic: Additionally, boost::shared_ptr mentions that it is completely lock-free with respect to changes in the control block. (Starting with Boost release 1.33.0, shared_ptr uses a lock-free implementation on most common platforms.) I don't think std::shared_ptr follows the same approach. Is this planned for any future C++ version? Doesn't this also mean that boost::shared_ptr is a better idea for multithreaded cases?

Absolutely not. Lock-free implementations are not always better than implementations that use locks. Having an additional constraint, at best, doesn't make the implementation worse but it cannot possibly make the implementation better.

Consider two equally competent programmers each doing their best to implement shared_ptr. One must produce a lock-free implementation. The other is completely free to use their best judgment. There is simply no way the one that must produce a lock-free implementation can produce a better implementation, all other things being equal. At best, a lock-free implementation is the best choice and they'll both produce one. At worst, a lock-free implementation has huge disadvantages on this platform and one implementer must use it anyway. Yuck.

Badger answered 26/11, 2018 at 4:26 Comment(3)

Do note that the Boost quote is not "completely lock-free"; it's merely lock-free "on most common platforms". – Whelan 26/11, 2018 at 4:30

@NicolBolas Yeah. It's probably just a statement about what the implementers judged to be best. It's hard to imagine how you would ever need or want a lock on any modern platform -- there's no case where any thread would ever need to wait for any other thread. – Badger 26/11, 2018 at 4:34

C++11 only requires lock-free std::atomic_flag, which is enough to build a lock from, but not enough for lockless ref-counting. Putting a lock-free std::shared_ptr requirement / guarantee in the standard would theoretically restrict which platforms could support a conforming C++11 implementation. I think that's the reason, not that locking might be actually better on a normal platform where lock-free was possible. – Safko 26/11, 2018 at 20:57

L

5

The control block is not exposed. In the implementations I have read, it is dynamic in size so that it can store the deleter contiguously (and/or, in the case of make_shared, the object itself).

In general it contains at least 3 pointer-size fields: the weak count, the strong count, and a deleter invoker.

At least one implementation relies on RTTI; others do not.

Operations on the count use atomic operations in the implementations I have read; note that C++ does not require all atomic operations to be lock-free (I believe a platform that doesn't have pointer-size lock-free operations can be a conforming C++ platform).

The counts are kept consistent with each other and with themselves, but no attempt is made to keep them consistent with the state of the managed object. This is why using raw shared_ptrs as copy-on-write pImpls may be error-prone on some platforms.

Lineman answered 26/11, 2018 at 4:23 Comment(1)

Yup, only std::atomic_flag is guaranteed lock-free. Good implementations on targets that can do it will use lock-free atomics for shared_ptr and atomic<T> up to a width the HW can support, of course. – Safko 26/11, 2018 at 21:0

B

1

(1) Of course it is best to check the implementation; however, you can still make some checks from your program.

The control block is allocated dynamically, so to determine its size you can replace the global operator new.

You can then also check whether std::make_shared gives you some optimization of the control block size. In a proper implementation I would expect this to make two allocations (the A object and the control block):

std::shared_ptr<A> i(new A());

However, this will make only one allocation (the A object is then initialized with placement new):

auto a = std::make_shared<A>();

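Consider the following example:

#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <memory>
#include <new>

// Replace the global operator new to log every allocation request.
// printf is used here because std::cout may itself allocate.
void * operator new(std::size_t size)
{
    std::printf("Requested allocation: %zu\n", size);
    void * p = std::malloc(size);
    if (!p) throw std::bad_alloc();
    return p;
}

// Matching replacement, so that memory obtained with malloc is released with free.
void operator delete(void * p) noexcept
{
    std::free(p);
}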

class A {};

class B
{
    int a[8];
};

int main()
{
  std::cout << "Sizeof int: " << sizeof(int) << ", A(empty): " << sizeof(A) << ", B(8 ints): " << sizeof(B) << std::endl;
  {
      std::cout << "Just new:" << std::endl;
      std::cout << "- int:" << std::endl;
      std::shared_ptr<int> i(new int());
      std::cout << "- A(empty):" << std::endl;
      std::shared_ptr<A> a(new A());
      std::cout << "- B(8 ints):" << std::endl;
      std::shared_ptr<B> b(new B());
  }
  {
      std::cout << "Make shared:" << std::endl;
      std::cout << "- int:" << std::endl;
      auto i = std::make_shared<int>();
      std::cout << "- A(empty):" << std::endl;
      auto a = std::make_shared<A>();
      std::cout << "- B(8 ints):" << std::endl;
      auto b = std::make_shared<B>();
  }
}

The output I received (of course it is specific to the hardware architecture and the compiler):

Sizeof int: 4, A(empty): 1, B(8 ints): 32
Just new:
- int:
Requested allocation: 4
Requested allocation: 24

The first allocation is for the int (4 bytes), the next one is for the control block (24 bytes).

- A(empty):
Requested allocation: 1
Requested allocation: 24
- B(8 ints):
Requested allocation: 32
Requested allocation: 24

It looks like the control block is (most probably) 24 bytes.

Here is why you should use make_shared:

Make shared:
- int:
Requested allocation: 24

Only one allocation: int + control block = 24 bytes, less than the 28 bytes (4 + 24) requested before.

- A(empty):
Requested allocation: 24
- B(8 ints):
Requested allocation: 48

Here one could expect 56 (32 + 24), but it looks like the implementation is optimized. If you use make_shared, a pointer to the actual object is not needed in the control block, and its size is only 16 bytes.

Another possible indicator (strictly speaking, this measures the weak_ptr stored inside enable_shared_from_this rather than the control block itself) is:

std::cout<< sizeof(std::enable_shared_from_this<int>);

In my case:

So I would say that the size of the control block in my case is 16 to 24 bytes, depending on how it was created.

Botulism answered 22/12, 2018 at 21:4 Comment(0)
