Why is sizeof(string) == 32?

What is the overhead in the string structure that causes sizeof() to be 32 ?

Trailer answered 22/9, 2010 at 15:19 Comment(9)
If you open your platform's <string> header, you can see exactly why std::string is that size. @Queso: sizeof yields the size of an object in bytes.Milkfish
If sizeof returns the number of bits in the pointer then your compiler is brokenCongius
@Queso: sizeof() returns bytes, not bits. A 32-byte pointer is a 256-bit addressPresent
@Martin: because "wetness" is pretty much defined as being a property of water (or anyway of liquids). I'm not aware that "32" is defined as being the size of a string.Melanie
@Steve Jessop: Water is wet because of the current implementation (Earth's temperature and pressure allow water to be liquid). In other implementations it is not wet (on Jupiter, for example, it is a gas). So this string implementation is 32 bytes because that's the way it was built, and it will be 16 in other implementations and 64 in yet another. The size of the string will (like water) depend on the environment it is used in.Thremmatology
OK, so water is liquid because of the Earth's STP, and we could drill further by looking to the factors which affect that (atmospheric pressure is affected by mass and gas emission, for example). So what does asking why one set of implementers chose 32, and another chose 64, have to do with asking why the earth has a certain surface pressure and temperature? One is a choice made by a sentient being. The other is IMO not, but even if IYO it is, I don't think C++ implementers have quite the claim to ineffability that God does.Melanie
@Steve Jessop: Note: The whole reason we all decided to become programmers is so that we can feel like GOD (the architects of our own little Universe).Thremmatology
@Steve Jessop: Asking why means nothing; it is so because the implementers made it so. Now, asking why a string is 32 bytes for compiler X version y on platform z running OS a revision b does make sense. Why is water wet under STP on Earth 1.0? Because in this environment it is in liquid form. So I call it an analogy. Asking why a string is 32 is as meaningless as asking why water is wet. Both are only true under specific conditions. Without understanding the conditions it is impossible to answer.Thremmatology
Yup makes sense ... serves me right for treating StackOverflow like Twitter.Trailer

Most modern std::string implementations (see footnote 1) store very small strings directly inside the string object, in a statically sized char array, instead of using dynamic heap storage. (The object itself may of course live on the stack or on the heap.) This is known as the Small (or Short) String Optimisation (SSO). It lets implementations avoid heap allocations for small strings and improves locality of reference.

Furthermore, there will typically be a std::size_t member holding the string's size and a pointer to the actual character storage.

How this is implemented differs between libraries, but something along the following lines works:

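#include <cstddef>

template <typename CharT>
struct basic_string {
    CharT*      begin_;                      // points at sso_buffer or at heap storage
    std::size_t size_;                       // current length
    union {
        std::size_t capacity_;               // used while the data is on the heap
        CharT sso_buffer[16 / sizeof(CharT)];  // 16 bytes of inline storage for short strings
    };
};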

On typical 64-bit architectures, where sizeof(void*) == 8 and sizeof(std::size_t) == 8, this gives a total of 8 + 8 + 16 = 32 bytes.


1 The “big three” (GCC’s libstdc++ since version 5, Clang’s libc++ and MSVC’s implementation) all do it. Others may too.

Frizzle answered 22/9, 2010 at 15:21 Comment(5)
@KonradRudolph very small strings are saved directly into the object, and this can be on the stack or on the heap depending on where the string itself is allocated, no?Chronologist
@ManuelSelva Exactly.Frizzle
@KonradRudolph How to force strings to be always heap allocated? (For the intention of having string objects smaller than 32 bytes, e.g. 8 bytes.)Labannah
@LukeFisk-Lennon You can’t. Small string optimisation is an implementation detail of certain (well, all modern) standard library implementations, it’s not specified by the language. As such you cannot change it within C++. You also can’t change it outside of C++ (e.g. via compiler options) because such a change would be ABI breaking. That said, GCC4 didn’t perform small string optimisation so in principle you could configure your GCC with --with-default-libstdcxx-abi=gcc4-compatible but that would be a terrible idea (= very old implementation).Frizzle
@KonradRudolph Ok I see. Thanks for the swift reply.Labannah

std::string typically contains a buffer for the "small string optimization": if the string is shorter than the buffer, no heap allocation is required.

Congius answered 22/9, 2010 at 15:22 Comment(6)
Where "typically" == "on Windows" ;-)Melanie
Windows compilers aren't the only ones that do the small-string optimizationCongius
Sure, but if you're not willing to name them then it's hard to judge whether this is "typical" behaviour, or just called that on the grounds that it's the behaviour of a common implementation (and presumably others).Melanie
From what I understand, Dinkumware and STLPort both do, but gcc's implementation doesn't.Dinnerware
Btw, I mention it because "typically" spans a range from "I'm reasonably confident you'll never see anything else", to "50% or more of the implementations I've used do this". It's very easily misunderstood, I think. Neither this optimization, nor the absence of it, should be considered unusual.Melanie
Note that the IBM AIX C++ implementation contains a small string implementation with a 32-character buffer (see here: www-01.ibm.com/support/docview.wss?uid=swg21453760)Wallachia

My guess is:

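class string
{
    char type;                // which representation is currently active
    struct Heap
    {
      char*   start;
      char*   end;
      char*   allocatedEnd;
    };
    struct Stack
    {
      char    size;
      char    data[27];
    };
    union
    {
        Stack   stackVersion;
        Heap    heapVersion;
    } version;
};
// 32-bit target: 1 + 3 (padding) + 28 = 32 bytes; 64-bit: 1 + 7 + 32 = 40 bytes.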

But I bet there are hundreds of ways of doing it.

Thremmatology answered 22/9, 2010 at 16:5 Comment(2)
awww... no reference counts? what happened to folding?Ashcroft
@ErikAronesty There was a phase where reference counting was attempted with std::string, but it became obvious that this was not very efficient (there were several papers on it over the years), and instead the short string optimization became popular.Thremmatology

In g++ 5.2 (it is different in, e.g., g++ 4.9), a string is basically defined as:

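#include <cstddef>

class string {
  char*       bufferp;            // points at local_buffer or at heap storage
  std::size_t length;             // number of characters, excluding '\0'
  union {
    char        local_buffer[16]; // short strings: up to 15 chars + '\0'
    std::size_t capacity;         // heap capacity for long strings
  };
};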

On an ordinary 64-bit computer this adds up to 32 bytes (8 + 8 + 16).

The actual definition is of course

typedef basic_string<char> string;

but the idea is the same.

Saxen answered 17/3, 2016 at 0:7 Comment(0)

It is library dependent. You shouldn't rely on the size of std::string objects because it is likely to change in different environments (obviously between different standard library vendors, but also between different versions of the same library).

Keep in mind that std::string implementations are written by people who have optimized for a variety of use cases, typically leading to two internal representations: one for short strings (a small internal buffer) and one for long strings (a heap-allocated external buffer). The overhead comes from holding both of these inside each std::string object.

Tincal answered 22/9, 2010 at 15:51 Comment(0)

Q: Why is a dog yellow? A: It's not necessarily.

The size of a std::string object is implementation-dependent. I just checked MS VC++ 2010. It does indeed use 32 bytes for std::string. There is a 16-byte union that contains either the text of the string, if it fits, or a pointer to heap storage for longer strings. If the implementers had chosen to keep 18-byte strings in the string object rather than on the heap, the size would have grown to at least 34 bytes (more once alignment padding is added). The other 16 bytes are overhead, holding such things as the length of the string and the amount of memory currently allocated for it.

A different implementation might always allocate memory from the heap. Such an implementation would typically need less memory for the string object itself.

Playacting answered 22/9, 2010 at 16:15 Comment(0)
