Is make_shared really more efficient than new?

I was experimenting with shared_ptr and make_shared from C++11 and programmed a little toy example to see what is actually happening when calling make_shared. As infrastructure I was using llvm/clang 3.0 along with the llvm std c++ library within XCode4.

#include <iostream>
#include <memory>
#include <string>

using namespace std;

class Object
{
public:
    Object(const string& str)
    {
        cout << "Constructor " << str << endl;
    }

    Object()
    {
        cout << "Default constructor" << endl;

    }

    ~Object()
    {
        cout << "Destructor" << endl;
    }

    Object(const Object& rhs)
    {
        cout << "Copy constructor..." << endl;
    }
};

void make_shared_example()
{
    cout << "Create smart_ptr using make_shared..." << endl;
    auto ptr_res1 = make_shared<Object>("make_shared");
    cout << "Create smart_ptr using make_shared: done." << endl;

    cout << "Create smart_ptr using new..." << endl;
    shared_ptr<Object> ptr_res2(new Object("new"));
    cout << "Create smart_ptr using new: done." << endl;
}

Now have a look at the output, please:

Create smart_ptr using make_shared...
Constructor make_shared
Copy constructor...
Copy constructor...
Destructor
Destructor
Create smart_ptr using make_shared: done.
Create smart_ptr using new...
Constructor new
Create smart_ptr using new: done.
Destructor
Destructor

It appears that make_shared is calling the copy constructor two times. If I allocate memory for an Object using a regular new this does not happen, only one Object is constructed.

What I am wondering about is the following. I have heard that make_shared is supposed to be more efficient than using new (1, 2). One reason is that make_shared allocates the reference count together with the object to be managed in the same block of memory. OK, I get the point. This is of course more efficient than two separate allocations.

On the other hand, I don't understand why this has to come at the cost of two calls to the copy constructor of Object. Because of this I am not convinced that make_shared is more efficient than allocation using new in every case. Am I wrong here? OK, one could implement a move constructor for Object, but I am still not sure whether this is more efficient than just allocating Object through new, at least not in every case. It would be true if copying Object is less expensive than allocating memory for a reference counter. But the shared_ptr-internal reference counter could be implemented using a couple of primitive data types, right?

Can you help and explain why make_shared is the way to go in terms of efficiency, despite the outlined copy overhead?

Traceytrachea answered 15/2, 2012 at 22:12 Comment(11)
What do you think auto means?Kannry
What compile options did you use? Are all those copy constructor calls still there when you enable optimization?Hsinking
Before you test the speed of a C++11 function, you should probably implement a move constructor in your class and make that fully C++11ish.Chavarria
That code doesn't match the output. The code you show leaks.Hsinking
To echo @peachykeen, by adding a copy constructor you inhibited the automatic generation of a move constructor. If you want a proper test, you should add a move constructor and move-assignment operator yourself.Ahoufe
I don't believe your output. With GCC 4.6.2, I get this much saner output.Kannry
What difference would that make, @Ildjarn? Nothing in this code should be triggering a move or a copy of any Object instance. The code simply doesn't match the reported output.Nidifugous
@Rob : "Should" aside, without a move constructor there may be copies that occur that would have otherwise been moves, so counting copies without a move constructor is pointless.Ahoufe
and what @Ahoufe said is particularly important when you're using C++11 STL functions, as they may be optimized when the appropriate C++11 features/methods/ctors are present (my original point). Testing half C++11 and half C++03 is going to break the optimizations in both.Chavarria
@peachykeen @lidjam I think what @Rob is saying is that only the pointer (smart or otherwise) needs to be copy/move constructed in the code provided, not the object. The compiler doesn't ever need to consider whether copy elision or move semantics should apply to the instance of Object. This can be shown by making the copy/move constructors private. Bottom line is that the code doesn't match the output. @Traceytrachea Either you've discovered a massive library or compiler defect or you're leaving something out.Grog
I've edited the question to modify the code in the original block. On Stack Overflow, rather than making a new section with edits it's generally accepted to just change the post; if people want to see how the post has been edited, they can always see the revision history. Separate EDIT blocks make the post harder to read, in general; in this specific case, it's not much, but just editing the original code block to be fixed doesn't hurt anything and makes things easier to read and grok as you go instead of having to go back and reread with the edit in mind.Bresnahan
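To illustrate the point raised in the comments about move constructors: a user-declared copy constructor suppresses the implicitly generated move constructor, so a requested move quietly becomes a copy. The sketch below is only an illustration; Tracked and TrackedWithMove are hypothetical types, not part of the question's code.

```cpp
#include <utility>

// Hypothetical type mirroring the question's Object: the user-declared copy
// constructor means no move constructor is implicitly generated.
struct Tracked {
    static int copies;
    Tracked() {}
    Tracked(const Tracked&) { ++copies; }
};
int Tracked::copies = 0;

// Same idea, but with an explicit move constructor added.
struct TrackedWithMove {
    static int copies;
    static int moves;
    TrackedWithMove() {}
    TrackedWithMove(const TrackedWithMove&) { ++copies; }
    TrackedWithMove(TrackedWithMove&&) noexcept { ++moves; }
};
int TrackedWithMove::copies = 0;
int TrackedWithMove::moves = 0;

// Counts copies made when a move is requested on a type without a move constructor.
int copies_when_move_requested() {
    Tracked::copies = 0;
    Tracked source;
    Tracked target(std::move(source));  // falls back to the copy constructor
    (void)target;
    return Tracked::copies;
}

// Counts moves made when a move is requested on a type with a move constructor.
int moves_when_move_requested() {
    TrackedWithMove::copies = 0;
    TrackedWithMove::moves = 0;
    TrackedWithMove source;
    TrackedWithMove target(std::move(source));  // uses the move constructor
    (void)target;
    return TrackedWithMove::moves;
}
```

Calling copies_when_move_requested() shows the fallback to copying, while moves_when_move_requested() shows the move constructor being picked once it exists. None of this changes the fact that make_shared itself should not need either constructor.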
C
43

As infrastructure I was using llvm/clang 3.0 along with the llvm std c++ library within XCode4.

Well that appears to be your problem. The C++11 standard states the following requirements for make_shared<T> (and allocate_shared<T>), in section 20.7.2.2.6:

Requires: The expression ::new (pv) T(std::forward<Args>(args)...), where pv has type void* and points to storage suitable to hold an object of type T, shall be well formed. A shall be an allocator (17.6.3.5). The copy constructor and destructor of A shall not throw exceptions.

T is not required to be copy-constructible. Indeed, T isn't even required to be constructible with non-placement new. It is only required to be constructible in place. This means that the only thing that make_shared<T> can do with T is new it in place.
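As a rough illustration of that requirement, here is a sketch with a hypothetical type, Pinned, whose copy and move operations are all deleted. On a conforming library this should compile, since make_shared only needs to construct the object in place.

```cpp
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical type that can be neither copied nor moved.
class Pinned {
public:
    explicit Pinned(std::string name) : name_(std::move(name)) {}

    Pinned(const Pinned&) = delete;
    Pinned& operator=(const Pinned&) = delete;
    Pinned(Pinned&&) = delete;
    Pinned& operator=(Pinned&&) = delete;

    const std::string& name() const { return name_; }

private:
    std::string name_;
};

static_assert(!std::is_copy_constructible<Pinned>::value,
              "Pinned must not be copy constructible");
static_assert(!std::is_move_constructible<Pinned>::value,
              "Pinned must not be move constructible");

// make_shared forwards the argument to the constructor and builds the object
// directly in the storage it allocated; no copy or move of Pinned is involved.
std::shared_ptr<Pinned> make_pinned() {
    return std::make_shared<Pinned>("in place");
}
```

If a library did try to copy or move the object inside make_shared, this would fail to compile, which is one quick way to check an implementation.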

So the results you get are not consistent with the standard. LLVM's libc++ is broken in this regard. File a bug report.

For reference, here's what happened when I took your code into VC2010:

Create smart_ptr using make_shared...
Constructor make_shared
Create smart_ptr using make_shared: done.
Create smart_ptr using new...
Constructor new
Create smart_ptr using new: done.
Destructor
Destructor

I also ported it to Boost's original shared_ptr and make_shared, and I got the same thing as VC2010.

I'd suggest filing a bug report, as libc++'s behavior is broken.

Centigram answered 15/2, 2012 at 23:54 Comment(5)
"The results you get are entirely consistent with what the C++ standard allows." I don't see anything in the code that should cause the instance of Object to be copy/move constructed (regardless of whether or not the compiler opts to elide such a construction.)Grog
@AndrewDurward: Actually, you're right and wrong. The standard's requirements on make_shared<T> do not state that T must be copy constructible. Therefore, make_shared<T> cannot call the copy constructor. You're wrong in that, if the standard did allow T to be copy constructible, an implementation of make_shared<T> could call it.Centigram
@NicolBolas: Thanks for the bug report against libc++. I agree with your analysis. This has been fixed in the libc++ public svn trunk and the copy constructor is no longer called.Snashall
I have copy constructor implicitly deleted because I defined a user-declared move constructor. Now clang complains about make_shared call to implicitly-deleted copy constructor. So if make_shared does not need the copy constructor, is this a bug?Finality
I was passing a temporary to make_shared like std::make_shared(M(..)). I changed it to std::make_shared(std::move(M(..))) and it is good now.Finality
K
34

You have to compare these two versions:

std::shared_ptr<Object> p1 = std::make_shared<Object>("foo");
std::shared_ptr<Object> p2(new Object("foo"));

In your code, the second variable is just a naked pointer, not a shared pointer at all.


Now on to the meat. make_shared is (in practice) more efficient, because it allocates the reference control block together with the actual object in one single dynamic allocation. By contrast, the constructor for shared_ptr that takes a naked object pointer must allocate another dynamic variable for the reference count. The trade-off is that make_shared (or its cousin allocate_shared) does not allow you to specify a custom deleter, since the allocation is performed by the allocator.

(This does not affect the construction of the object itself. From Object's perspective there is no difference between the two versions. What's more efficient is the shared pointer itself, not the managed object.)
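To make the custom-deleter trade-off concrete, here is a minimal sketch. Resource, g_custom_deletes and make_with_deleter are hypothetical names used only for illustration. The shared_ptr constructor accepts a deleter alongside the raw pointer; make_shared has no parameter for one.

```cpp
#include <memory>

// Hypothetical managed type.
struct Resource {
    int id;
};

// Hypothetical counter so the effect of the deleter can be observed.
static int g_custom_deletes = 0;

// The constructor taking a raw pointer also accepts a deleter. The control
// block is allocated separately from the Resource in this form.
std::shared_ptr<Resource> make_with_deleter(int id) {
    return std::shared_ptr<Resource>(new Resource{id}, [](Resource* r) {
        ++g_custom_deletes;  // runs when the last owner goes away
        delete r;
    });
}
```

So if you need a custom deleter you pay for the second allocation; if the default delete is fine, make_shared is the usual choice.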

Kannry answered 15/2, 2012 at 22:17 Comment(7)
"make_shared is (in practice) more efficient, because it allocates the reference control block together with the actual object in one single dynamic allocation". I think that is only the case for VS2012, they do this optimization, but linux std-libs don't do that optimization (yet?).Hercules
@Ela782: Yes, GCC has done so for some time. This is explicitly recommended by the standard in 20.7.2.2.6/6.Kannry
So this efficiency only happens in the initialization of shared_ptr? Is there any efficiency gain in later use of a ptr created by make_shared?Intoxication
@bigxiao: There shouldn't be. A good implementation will store the actual pointer to the object at the beginning of the shared_ptr regardless of how the shared_ptr was created, so that dereferencing never requires more computation than it would for a raw pointer. The final deallocation differs, of course (especially in the presence of weak pointers).Kannry
@KerrekSB make_shared stores the object and the reference counts together (in the same control block), so maybe there are fewer cache misses.Intoxication
@bigxiao: Sure, but the most important operation of a smart pointer, i.e. dereferencing, doesn't require access of the control block.Kannry
"...the second variable is just a naked pointer, not a shared pointer at all." Wait what? Really? How can that be?Redtop
U
6

So one thing to keep in mind is your optimization settings. Measuring performance, particularly with regard to C++, is meaningless without optimizations enabled. I don't know if you did in fact compile with optimizations, so I thought it was worth mentioning.

That said, what you are measuring with this test is not a way that make_shared is more efficient. Simply put, you are measuring the wrong thing :-P.

Here's the deal. Normally, when you create a shared pointer, it has at least 2 data members (possibly more): one for the pointer, and one for the reference count. This reference count is allocated on the heap (so that it can be shared among shared_ptrs with different lifetimes... that's the point after all!).

So if you are creating an object with something like std::shared_ptr<Object> p2(new Object("foo"));, there are at least 2 calls to new: one for Object and one for the reference count object.

make_shared has the option (I'm not sure it has to) to do a single new which is big enough to hold the object pointed to and the reference count in the same contiguous block, effectively allocating an object that looks something like this (illustrative, not literally what it is):

struct T {
    int reference_count;
    Object object;
};

Since the lifetimes of the reference count and the object are tied together (it doesn't make sense for one to live longer than the other), this whole block can be deleted at the same time as well.

So the efficiency is in allocations, not in copying (which I suspect had to do with optimization more than anything else).
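If you want to see the allocation difference rather than take it on faith, one rough way is to replace the global operator new and count calls. This is only a sketch: g_allocations, Payload, g_sink and the two helper functions are hypothetical names, and the single allocation for make_shared is what common implementations do, not something the standard strictly requires.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical instrumentation: counts every call to the global operator new.
static std::size_t g_allocations = 0;

void* operator new(std::size_t size) {
    ++g_allocations;
    if (void* p = std::malloc(size != 0 ? size : 1)) {
        return p;
    }
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Small hypothetical payload with no dynamic members of its own.
struct Payload {
    explicit Payload(int v) : value(v) {}
    int value;
};

// Global sink so the pointers escape and the allocations cannot be optimized away.
std::shared_ptr<Payload> g_sink;

// Allocations needed for shared_ptr<T>(new T): the object plus the control block.
std::size_t allocations_with_new() {
    const std::size_t before = g_allocations;
    g_sink = std::shared_ptr<Payload>(new Payload(1));
    return g_allocations - before;
}

// Allocations needed for make_shared<T>: usually one combined block.
std::size_t allocations_with_make_shared() {
    const std::size_t before = g_allocations;
    g_sink = std::make_shared<Payload>(2);
    return g_allocations - before;
}
```

On the standard libraries I'm aware of, the first helper reports two allocations and the second reports one. Note that none of this involves copying the object.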

To be clear, this is what Boost has to say about make_shared:

http://www.boost.org/doc/libs/1_43_0/libs/smart_ptr/make_shared.html

Besides convenience and style, such a function is also exception safe and considerably faster because it can use a single allocation for both the object and its corresponding control block, eliminating a significant portion of shared_ptr's construction overhead. This eliminates one of the major efficiency complaints about shared_ptr.

Unwinking answered 15/2, 2012 at 22:25 Comment(1)
A minor nit, the Object instance can die before the refc control block i.e. if weak_ptrs are used. This isn't a problem, except the minor issue that memory directly held by Object's layout is not reclaimed until the control block dies too; in an ordinary shared_ptr the object's heap block can be reclaimed as soon as it expires.Morrissette
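A sketch of that nit, assuming the usual combined-block implementation of make_shared. g_frees, Payload, g_observer and frees_when_last_owner_dies are hypothetical names; the code counts calls to operator delete at the moment the last shared_ptr goes away while a weak_ptr is still watching.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical instrumentation: counts every non-null block released.
static std::size_t g_frees = 0;

void* operator new(std::size_t size) {
    if (void* p = std::malloc(size != 0 ? size : 1)) {
        return p;
    }
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept {
    if (p) ++g_frees;
    std::free(p);
}

void operator delete(void* p, std::size_t) noexcept {
    if (p) ++g_frees;
    std::free(p);
}

// Hypothetical payload; its storage is what stays pinned in the combined block.
struct Payload {
    int data[64];
};

// A weak_ptr that outlives the last owner and keeps the control block alive.
std::weak_ptr<Payload> g_observer;

// Returns how many blocks are freed when the last shared_ptr dies while
// g_observer still refers to the control block.
std::size_t frees_when_last_owner_dies(std::shared_ptr<Payload> owner) {
    g_observer = owner;  // releasing any previous control block happens here
    const std::size_t before = g_frees;
    owner.reset();       // the object is destroyed now in both cases
    return g_frees - before;
}
```

Passing std::make_shared<Payload>() typically frees nothing at that point, because the object's storage is part of the control block; passing std::shared_ptr<Payload>(new Payload()) frees the object's own block right away. In both cases g_observer.expired() is true afterwards.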
B
3

You should not be getting any extra copies there. The output should be:

Create smart_ptr using make_shared...
Constructor make_shared
Create smart_ptr using make_shared: done.
Create smart_ptr using new...
Constructor new
Create smart_ptr using new: done.
Destructor

I don't know why you're getting extra copies. (though I see you're getting one 'Destructor' too many, so the code you used to get your output must be different from the code you posted)

make_shared is more efficient because it can be implemented using only one dynamic allocation instead of two, and because it needs one pointer's worth less bookkeeping memory per shared object.

Edit: I didn't check with Xcode 4.2 but with Xcode 4.3 I get the correct output I show above, not the incorrect output shown in the question.

Bullyrag answered 15/2, 2012 at 22:25 Comment(1)
Talk about good timing! ;-) Thanks for the Xcode 4.3 report.Snashall
