Why is initializing a string to "" more efficient than the default constructor?

Generally, the default constructor should be the fastest way of making an empty container. That's why I was surprised to see that it's worse than initializing to an empty string literal:

#include <string>

std::string make_default() {
    return {};
}

std::string make_empty() {
    return "";
}

With clang 16 and libc++, this compiles to:

make_default():
        mov     rax, rdi
        xorps   xmm0, xmm0
        movups  xmmword ptr [rdi], xmm0
        mov     qword ptr [rdi + 16], 0
        ret
make_empty():
        mov     rax, rdi
        mov     word ptr [rdi], 0
        ret

See live example at Compiler Explorer.

Notice how returning {} is zeroing 24 bytes in total, but returning "" is only zeroing 2 bytes. How come return ""; is so much better?

Impearl answered 25/6, 2023 at 23:40 Comment(3)
I question what exactly "efficient" means here. People think a computer having lots of free RAM is "good", but that's stupid. Free RAM is unused RAM, and sits there not doing anything. The best state of RAM is that it is being used, but is readily available for more demanding applications. - Unkennel
@Unkennel the amount of memory we use is the same in both cases. A std::string always occupies 24 bytes in libc++ for our architecture. The difference between the functions is that make_empty() leaves more of this memory indeterminate, whereas make_default() zeroes it all. - Impearl
I don't know enough to write an answer, but this difference between {} and "" reminds me of array initialization in C. The analogy only goes so far, though: in C, both char s[100] = "Hello"; and char s[100] = {'H', 'e', 'l', 'l', 'o', '\0'}; give you 'H', 'e', 'l', 'l', 'o', '\0' as the first six elements and zero-initialize the remaining 94 elements. - Brisson

This is an intentional decision in libc++'s implementation of std::string.

First of all, std::string has a so-called Small String Optimization (SSO), which means that for very short (or empty) strings, it stores their contents directly inside the string object rather than allocating dynamic memory. That's why we don't see any allocations in either case.
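As a rough way to observe SSO, one can check whether a string's character buffer lies inside the std::string object itself. This is only a sketch: the helper name stored_inline is made up for illustration, and the result assumes a standard library that implements SSO (libc++, and current libstdc++ and MSVC, do so for short strings).

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper, for illustration only.
// Returns true if the character buffer of s lies inside the std::string
// object itself, i.e. no separate heap buffer is in use.
bool stored_inline(const std::string& s) {
    const auto begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto end   = begin + sizeof(std::string);
    const auto data  = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= begin && data < end;
}
```

Under that assumption, stored_inline returns true for an empty or two-character string and false for a string of 1000 characters, which has to live on the heap.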

In libc++, the "short representation" of a std::string consists of:

Size (x86_64)   Meaning
1 bit           "short flag" indicating that it is a short string (zero means yes)
7 bits          length of the string, excluding null terminator
0 bytes         padding bytes to align string data (none for basic_string<char>)
23 bytes        string data, including null terminator

For an empty string, we only need to store two bytes of information:

  • one zero-byte for the "short flag" and the length
  • one zero-byte for the null terminator

The constructor accepting a const char* writes only these two bytes, the bare minimum. The default constructor "unnecessarily" zeroes all 24 bytes of the std::string object. This may be better overall, though, because it lets the compiler zero arrays of strings in bulk with a call to memset or with SIMD stores.
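To make the two strategies concrete, here is a toy model of the 24-byte short layout. ToyShortString and its helper functions are invented names, not libc++ code; the sketch only assumes that the unsigned char bit-fields pack into a single byte, as they do with GCC, Clang and MSVC.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for libc++'s short representation on x86_64.
struct ToyShortString {
    unsigned char is_long : 1;  // 0 means "short string"
    unsigned char size : 7;     // length, excluding the null terminator
    char data[23];              // characters plus null terminator
};
static_assert(sizeof(ToyShortString) == 24, "toy layout should be 24 bytes");

// Models what string("") needs: the flag/size byte and the terminator.
// The other 22 bytes are left untouched.
void init_minimal(ToyShortString& s) {
    s.is_long = 0;
    s.size = 0;
    s.data[0] = '\0';
}

// Models what the default constructor does: zero the whole object.
void init_zeroed(ToyShortString& s) {
    std::memset(&s, 0, sizeof s);
}

// Both strategies produce a valid empty short string.
bool is_empty_short(const ToyShortString& s) {
    return s.is_long == 0 && s.size == 0 && s.data[0] == '\0';
}
```

Both functions leave the object in a state that is_empty_short accepts; they differ only in how many bytes they write.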

For a full explanation, see below:

Initializing to "" / Calling string(const char*)

To understand what happens, let's look at the libc++ source code for std::basic_string:

// constraints...
/* specifiers... */ basic_string(const _CharT* __s)
  : /* leave memory indeterminate */ {
    // assert that __s != nullptr
    __init(__s, traits_type::length(__s));
    // ...
  }

This ends up calling __init(__s, 0), where 0 is the length of the string, obtained from std::char_traits<char>::length:

// template head etc...
void basic_string</* ... */>::__init(const value_type* __s, size_type __sz)
{
    // length and constexpr checks
    pointer __p;
    if (__fits_in_sso(__sz))
    {
        __set_short_size(__sz); // set size to zero, first byte
        __p = __get_short_pointer();
    }
    else
    {
        // not entered
    }
    traits_type::copy(std::__to_address(__p), __s, __sz); // copy string, nothing happens
    traits_type::assign(__p[__sz], value_type()); // add null terminator
}
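The three traits calls in the short branch can be tried out on a plain character buffer. The function init_short_buffer below is a hypothetical helper rather than anything from libc++; it assumes the caller provides a buffer with room for the string plus its null terminator.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: fills buf the way the short branch of __init does
// and returns the computed length.
// Precondition: buf has room for at least length(s) + 1 characters.
std::size_t init_short_buffer(char* buf, const char* s) {
    using traits = std::char_traits<char>;
    const std::size_t len = traits::length(s);  // 0 for ""
    traits::copy(buf, s, len);                  // copies nothing when len == 0
    traits::assign(buf[len], char());           // writes the null terminator
    return len;
}
```

For an empty literal, the only write to the buffer is the terminator at index 0, which matches the single data byte described above.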

__set_short_size will end up writing only a single byte, because the short representation of a string is:

struct __short
{
    struct _LIBCPP_PACKED {
        unsigned char __is_long_ : 1; // set to zero when active
        unsigned char __size_ : 7;    // set to zero for empty string
    };
    char __padding_[sizeof(value_type) - 1]; // zero size array
    value_type __data_[__min_cap]; // null terminator goes here
};

After compiler optimizations, zeroing __is_long_, __size_, and one byte of __data_ compiles to:

mov word ptr [rdi], 0

Initializing to {} / Calling string()

The default constructor is more wasteful by comparison:

/* specifiers... */ basic_string() /* noexcept(...) */
  : /* leave memory indeterminate */ {
    // ...
    __default_init();
}

This ends up calling __default_init(), which does:

/* specifiers... */ void __default_init() {
    __r_.first() = __rep(); // set representation to value-initialized __rep
    // constexpr-only stuff...
}

Value-initializing a __rep zeroes all 24 bytes, because the first union member is zero-initialized and it spans the whole object:

struct __rep {
    union {
        __long  __l; // first union member gets initialized,
        __short __s; // __long representation is 24 bytes large
        __raw   __r;
    };
};
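The same effect can be reproduced with a toy union. ToyLong, ToyShort and ToyRep are invented stand-ins whose only purpose is to have the same shape as the types above; the check also assumes a platform where a null pointer is represented by all-zero bits, which holds on x86_64.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-ins for __long and __short; only their sizes matter.
struct ToyLong  { std::size_t cap; std::size_t size; char* data; };
struct ToyShort { unsigned char size_and_flag; char data[sizeof(ToyLong) - 1]; };

struct ToyRep {
    union {
        ToyLong  l;  // first member: this is what ToyRep() zero-initializes
        ToyShort s;
    };
};

// Returns true if every byte of a value-initialized ToyRep is zero.
bool value_init_is_all_zero() {
    ToyRep r = ToyRep();  // value-initialization
    const unsigned char zeros[sizeof(ToyRep)] = {};
    return std::memcmp(&r, zeros, sizeof r) == 0;
}
```

Because ToyLong covers the whole union, zero-initializing it leaves no byte of the object indeterminate.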

Conclusion

If you want to value-initialize everywhere for the sake of consistency, don't let this keep you from it. Zeroing out a few bytes unnecessarily isn't a big performance problem you need to worry about.

In fact, it is helpful when initializing large quantities of strings, because std::memset may be used, or some other SIMD way of zeroing out memory.
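For experimenting with the bulk case, the following two functions can be pasted into Compiler Explorer. The function names are made up, and the sketch makes no claim about the generated code: whether the compiler emits memset, SIMD stores, or a plain loop depends on the compiler, the standard library, and the optimization flags.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Every element is created by the default constructor.
std::vector<std::string> make_defaults(std::size_t n) {
    return std::vector<std::string>(n);
}

// Every element is created by string(const char*) from the literal "".
std::vector<std::string> make_from_literal(std::size_t n) {
    std::vector<std::string> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        v.emplace_back("");
    }
    return v;
}
```

Both functions return n empty strings, so any difference between them is purely a matter of the instructions used to initialize the elements.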

Impearl answered 25/6, 2023 at 23:40 Comment(3)
To your point, I went ahead and tested this in Godbolt. Using std::string, the compiler will go ahead and use memset after a certain number of elements (11 in my testing), whereas "" will use mov ptr addr, 0, as seen here. - Insinuation
@Insinuation I've been getting similar results. I'm not sure if clang ever chooses to turn it into a loop for the "" version. Even at 1024 strings, it will just emit 1024 mov instructions: godbolt.org/z/5x6E7rz1s. This is probably more relevant for std::vector, which would manually begin lifetimes in a loop when resizing. Demo here: godbolt.org/z/Y7xW4j7E7 - Impearl
Another experiment, godbolt.org/z/rjzTvvrPa, appears to demonstrate your point about how the default constructor may be more efficient when initializing a std::vector<std::string>, for instance. - Buxton
