Why does `std::make_shared` perform two separate allocations with `-fno-rtti`?
G

3

25
#include <memory>
struct foo { };
int main() { std::make_shared<foo>(); }

The assembly generated by both g++7 and clang++5 with -fno-exceptions -Ofast for the code above:

  • Contains a single call to operator new if -fno-rtti is not passed.

  • Contains two separate calls to operator new if -fno-rtti is passed.

This can be easily verified on gcc.godbolt.org (clang++5 version):

[Screenshot of the above godbolt link with highlighted operator new calls]
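
The allocation counts can also be checked at run time instead of by reading assembly. The following is a minimal sketch, not part of the original test case: it replaces the global operator new with a counting version. The names g_new_calls, g_keep, count_make_shared and count_separate_new are made up for illustration, and the number reported for make_shared depends on the standard library, its version, and whether -fno-rtti is passed.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical counter, bumped by the replacement operator new below.
static std::size_t g_new_calls = 0;

// Replacement global operator new that counts every call.
// It aborts on failure instead of throwing, so the sketch also
// builds with -fno-exceptions.
void* operator new(std::size_t size) {
    ++g_new_calls;
    void* p = std::malloc(size ? size : 1);
    if (!p) std::abort();
    return p;
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

struct foo { };

// Global with external linkage: keeps the pointer reachable so the
// optimizer cannot remove the allocation.
std::shared_ptr<foo> g_keep;

// Number of operator new calls made by one std::make_shared<foo>().
std::size_t count_make_shared() {
    const std::size_t before = g_new_calls;
    g_keep = std::make_shared<foo>();
    return g_new_calls - before;
}

// Number of operator new calls made by std::shared_ptr<foo>(new foo()):
// one for the foo, one for the control block.
std::size_t count_separate_new() {
    const std::size_t before = g_new_calls;
    g_keep = std::shared_ptr<foo>(new foo());
    return g_new_calls - before;
}
```

Calling both functions from main and printing the results with and without -fno-rtti should reproduce the difference described above on an affected library.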

Why is this happening? Why does disabling RTTI prevent make_shared from unifying the object and control block allocations?

Gavrah answered 29/3, 2017 at 12:26 Comment(5)
related: https://mcmap.net/q/539453/-shared_ptr-without-rtti - Lem
Since you've disabled virtual functions, the library can't use a packed structure (the element, the refcount and the deleter), since that needs type erasure. So the library needs to allocate the element + refcount and the deleter separately. - Gastrectomy
This is also a great counterexample to "no RTTI + no exceptions yields the fastest C++ code", which some developers insist on believing. Here is an example which shows that RTTI can actually generate better code. - Gastrectomy
No virtual functions were disabled; this is a QoI failure on libstdc++'s part. libc++ does the optimized implementation in both cases. - Superadd
Do you get 3 allocations with std::shared_ptr<foo>(new foo())? No, just 2. Hmm. - Miasma
M
6

No good reason. This looks like a QoI issue in libstdc++.

Using clang 4.0, libc++ does not have this issue, while libstdc++ does.

The libstdc++ implementation with RTTI relies on get_deleter:

void* __p = _M_refcount._M_get_deleter(typeid(__tag));
_M_ptr = static_cast<_Tp*>(__p);
__enable_shared_from_this_helper(_M_refcount, _M_ptr, _M_ptr);

and in general, get_deleter isn't possible to implement without RTTI.

It appears that this implementation uses the deleter's position and the tag to store the T.

Basically, the RTTI version uses get_deleter, and get_deleter relies on RTTI. Getting make_shared to work without RTTI required rewriting it, and they took an easy route that causes it to do two allocations.

make_shared unifies the T and reference counting blocks. I suppose that with both variable-sized deleters and variable-sized T things get nasty, so they reused the deleter's variable-sized block to store the T.

A modified (internal) get_deleter that did not do RTTI and returned a void* might be enough to do what they need from this deleter; but possibly not.
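
To make that last idea concrete, here is a rough sketch of a control block that keeps the T in place and hands it back through a tag check that compares addresses instead of type_info objects. This is not libstdc++'s code: control_block_base, inplace_block, make_shared_tag, make_block and object_of are all invented names, and reference counting is left out to keep it short.

```cpp
#include <new>
#include <utility>

// Hypothetical type-erased control block; not libstdc++'s real layout.
struct control_block_base {
    virtual ~control_block_base() = default;
    // Returns the in-place object if `tag` matches, nullptr otherwise.
    virtual void* get_inplace(const void* tag) noexcept = 0;
    // Runs the stored object's destructor without freeing the block.
    virtual void destroy_object() noexcept = 0;
};

// The address of this static object plays the role that typeid(tag)
// plays in the RTTI version: a unique identity that needs no type_info.
inline const void* make_shared_tag() noexcept {
    static const char tag = 0;
    return &tag;
}

// Control block and T share a single allocation.
template <class T>
struct inplace_block final : control_block_base {
    alignas(T) unsigned char storage[sizeof(T)];

    template <class... Args>
    explicit inplace_block(Args&&... args) {
        ::new (static_cast<void*>(storage)) T(std::forward<Args>(args)...);
    }

    void* get_inplace(const void* tag) noexcept override {
        return tag == make_shared_tag() ? static_cast<void*>(storage)
                                        : nullptr;
    }

    void destroy_object() noexcept override {
        static_cast<T*>(static_cast<void*>(storage))->~T();
    }
};

// One call to operator new covers both the bookkeeping and the T.
template <class T, class... Args>
control_block_base* make_block(Args&&... args) {
    return new inplace_block<T>(std::forward<Args>(args)...);
}

// What a make_shared-style constructor would call to find its object.
template <class T>
T* object_of(control_block_base* cb) noexcept {
    return static_cast<T*>(cb->get_inplace(make_shared_tag()));
}
```

Nothing here uses typeid or dynamic_cast, so it should build with -fno-rtti. Whether this covers everything the real implementation needs from get_deleter is exactly the open question above.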

Miasma answered 30/3, 2017 at 15:42 Comment(0)
M
12

Why does disabling RTTI prevent make_shared from unifying the object and control block allocations?

You can see from the assembler output (just pasting the text is really preferable to both linking and taking pictures of it) that the unified version doesn't allocate a simple foo but a std::_Sp_counted_ptr_inplace, and further that this type has a vtable (recall it needs a virtual destructor in general, to cope with custom deleters):

mov QWORD PTR [rax], OFFSET FLAT:vtable for std::_Sp_counted_ptr_inplace<foo, std::allocator<foo>, (__gnu_cxx::_Lock_policy)2>+16

If you disable RTTI, it can't generate the inplace counted pointer because that needs to be virtual.

Note that the non-inplace version still refers to a vtable, but it seems to just be storing the de-virtualized destructor address directly.
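
As the comments below point out, virtual dispatch itself keeps working with -fno-rtti; only typeid and dynamic_cast go away. A minimal sketch of that distinction follows. The class names are hypothetical stand-ins for control block types, and rtti_enabled assumes the compiler defines the __cpp_rtti feature-test macro only when RTTI is on, as GCC and Clang do.

```cpp
// Hypothetical stand-in for a control block base class with a vtable.
struct block_base {
    virtual ~block_base() = default;
    virtual int kind() const noexcept = 0;
};

// Stand-in for a block that stores the object in place.
struct inplace_like final : block_base {
    int kind() const noexcept override { return 1; }
};

// Stand-in for a block that points at a separately allocated object.
struct separate_like final : block_base {
    int kind() const noexcept override { return 2; }
};

// Resolved through the vtable; no typeid or dynamic_cast involved,
// so this compiles and behaves the same with -fno-rtti.
int kind_of(const block_base& b) noexcept { return b.kind(); }

// True only when the compiler reports RTTI support via __cpp_rtti.
constexpr bool rtti_enabled() noexcept {
#ifdef __cpp_rtti
    return true;
#else
    return false;
#endif
}
```

Code that wants an RTTI-free fallback can branch on the same macro at compile time.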

Mariannamarianne answered 29/3, 2017 at 12:41 Comment(14)
"just pasting the text is really preferable to both linking and to taking pictures of it" - the text is available in the godbolt, which is linked right above the snapshot. Pasting the generated assembly in the OP would be a bad idea in my opinion, as having to click on either the screenshot or the link is way better than having to scroll through a lot of assembly. Also, should this be considered a QoI ("quality of implementation") issue? - Gavrah
Yeah, the text is available on godbolt ... now. But who knows whether the link will break while this site is still up? And yes, I think it's a QoI issue, in the sense that it's very generous that the implementers wrote a second version for no-rtti at all. I wouldn't blame them for just saying shared_ptr isn't implementable without RTTI. - Mariannamarianne
@VittorioRomeo imgur.com is filtered where I work ;) - Lem
"(recall it needs a virtual destructor in general, to cope with custom deleters)" -- custom deleters are not possible with make_shared? - Miasma
But make_shared returns a shared_ptr which is parameterized on the deleter, even if that param is defaulted. I guess you could have a specialization just for shared_ptr of a trivially-destructible type when using the default deleter and compiling with no-rtti, but that seems like a lot of effort. - Mariannamarianne
If you use libc++ it's a single allocation even with RTTI disabled. You can do everything with -fno-rtti you could have done without, other than use dynamic_cast or typeid. - Superadd
Is it the same size allocation? For stored types both with and without real destructors? And with both custom and default deleters? If so, they've put some real effort into optimizing the no-rtti case. - Mariannamarianne
No, shared_ptr (the template) is not parameterized on the deleter. Instances do have a deleter, but make_shared must create its own deleter that destroys the T without deallocating it. And with make_shared you know the exact type you are destroying. The destroyer needing RTTI is ridiculous. - Miasma
Oh, you are saying that the control block has a virtual destroyer, instead of just storing a type-erased destroyer always and putting non-custom deleters into it. - Miasma
Oh yeah, I expressed that really badly. Actually, it looks more complicated than I assumed - both the GCC and LLVM sources use the virtual keyword even with RTTI turned off, but by default also use the typeid of the deleter in some way I haven't yet thought through. - Mariannamarianne
You seem to be of the mistaken belief that -fno-rtti prevents virtual dispatch. - Condolent
You mean because I specifically mentioned in the comment above that both implementations do still use virtual dispatch with RTTI turned off? I haven't refactored the answer to explain that correctly yet, but you should of course feel free to write a better answer yourself. - Mariannamarianne
Yes, partly (uses virtual keyword is not quite the same as uses virtual dispatch). I think @Yakk already did. - Condolent
@Condolent I think I decoded what the get_deleter is for down below. Basically, the memory layout of the reference counting block is something like {strong_count, weak_count, ???, deleter}. The deleter may be variable in size? And for make_shared, they somehow reuse the deleter object storage to store the object instead; maybe in ??? there is a void(*)(void*) stateless invoker of deleter object. The get_deleter relying on the typeid here is accidental, not central. - Miasma
B
11

Naturally, std::shared_ptr will be implemented with the assumption that the compiler supports RTTI. But it can be implemented without it. See "shared_ptr without RTTI?".

Taking a cue from the old GCC libstdc++ bug #42019, we can see that Jonathan Wakely added a fix to make this possible without RTTI.

In GCC's libstdc++, std::make_shared uses the services of std::allocate_shared, which uses a non-standard constructor (as seen in the code).

As seen in this patch, from line 753, getting the default deleter simply requires using typeid if RTTI is enabled; otherwise, it requires a separate allocation that doesn't depend on RTTI.

EDIT (9 May 2017): removed copyrighted code previously posted here.

I haven't investigated libcxx, but I want to believe they did a similar thing.

Buckwheat answered 29/3, 2017 at 13:47 Comment(4)
That is copyrighted code so posting it on SO violates the terms of service. The code is available to browse online (with the required copyright headers) so you could link to it instead. - Interfertile
@JonathanWakely Wow! I had no idea. Thanks for the info. I've removed the code. I thought that, provided relevant citation and acknowledgement is made, one could freely post snippets from an open-source licensed library, especially GPLv3 code. So my presumption is wrong, then? - Buckwheat
@JonathanWakely I checked the TOS. It only prohibits content that infringes a copyright. Is it really your position that taking an excerpt of a copyrighted, but freely available at no cost, work for educational purposes and in a way that has no impact on the market for the work infringes its copyright? Because, if so, you might want to have a look at 17 USC 107. - Malinowski
The TOS says "You agree that all Subscriber Content that You contribute to the Network is perpetually and irrevocably licensed to Stack Exchange under the Creative Commons Attribution Share Alike license." Maybe I'm wrong, but pasting significant chunks of someone else's copyrighted code to a network that imposes its own terms on that content seems wrong. My objection is SO's "all your content belongs to us" policy, not to anybody using libstdc++ code for educational purposes. SO is a business, and its content gets replicated by some far more shady networks that just suck in content from SO. - Interfertile
