This is an intentional decision in libc++'s implementation of std::string.
First of all, std::string has a so-called Small String Optimization (SSO), which means that for very short (or empty) strings, it stores their contents directly inside the string object rather than allocating dynamic memory.
That's why we don't see any allocations in either case.
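One way to observe SSO from the outside is to check whether data() points into the string object itself. The following is a minimal sketch, not part of libc++: the helper names stored_in_object and check_sso are made up for illustration, and the exact length at which a string stops being "short" differs between standard library implementations.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: true if the character buffer lives inside the
// std::string object itself, i.e. no separate heap buffer is in use.
bool stored_in_object(const std::string& s) {
    const auto begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto end   = begin + sizeof(std::string);
    const auto data  = reinterpret_cast<std::uintptr_t>(s.data());
    return begin <= data && data < end;
}

// Hypothetical self-check: an empty string is stored in place,
// a 100-character string is not.
void check_sso() {
    assert(stored_in_object(std::string()));
    assert(!stored_in_object(std::string(100, 'x')));
}
```
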
In libc++, the "short representation" of a std::string consists of:
| Size (x86_64) | Meaning |
|---|---|
| 1 bit | "short flag" indicating that it is a short string (zero means yes) |
| 7 bits | length of the string, excluding null terminator |
| 0 bytes | padding bytes to align string data (none for basic_string<char>) |
| 23 bytes | string data, including null terminator |
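The table can be mirrored by a small stand-in struct. This is only a sketch under the assumptions of the table: ShortRep and max_short_length are illustrative names, not libc++'s, and the bit-field packing shown is what GCC and Clang produce for unsigned char bit-fields.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the short representation described above.
struct ShortRep {
    unsigned char is_long : 1;  // 0 means "this is a short string"
    unsigned char size    : 7;  // length, excluding the null terminator
    char data[23];              // characters plus the null terminator
};

static_assert(sizeof(ShortRep) == 24, "1 header byte + 23 data bytes");
static_assert(offsetof(ShortRep, data) == 1, "data follows the header byte");

// Longest string that fits: 23 data bytes minus one for the terminator.
constexpr std::size_t max_short_length() {
    return sizeof(ShortRep::data) - 1;
}
```
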
For an empty string, we only need to store two bytes of information:
- one zero-byte for the "short flag" and the length
- one zero-byte for the null terminator
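To make the "two bytes" claim concrete, here is a sketch that operates on a raw 24-byte buffer instead of a real std::string. Storage, write_empty_short, and count_untouched are hypothetical names; the byte positions follow the layout table above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Storage = std::array<unsigned char, 24>;

// Writes only the two bytes that an empty short string needs.
void write_empty_short(Storage& bytes) {
    bytes[0] = 0;  // short flag (0) and 7-bit length (0) share this byte
    bytes[1] = 0;  // null terminator, the first byte of the string data
}

// Counts how many of the remaining 22 bytes still hold the fill value.
std::size_t count_untouched(const Storage& bytes, unsigned char fill) {
    std::size_t n = 0;
    for (std::size_t i = 2; i < bytes.size(); ++i) {
        if (bytes[i] == fill) ++n;
    }
    return n;
}
```

Filling the buffer with a marker value such as 0xAA before calling write_empty_short shows that the other 22 bytes keep whatever they contained before.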
The constructor accepting a const char* will only write these two bytes, the bare minimum.
The default constructor "unnecessarily" zeroes all 24 bytes that the std::string contains.
This may be better overall though, because it makes it possible for the compiler to emit std::memset or other SIMD-parallel ways of zeroing arrays of strings in bulk.
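The two initializations being compared can be written as a pair of functions. The names make_empty and make_default are the ones used in the comments under this answer; the bodies are an assumption reconstructed from the two cases discussed here. Both return an equal, empty string; only the generated code differs.

```cpp
#include <cassert>
#include <string>

// Assumed body: goes through string(const char*).
std::string make_empty() {
    return "";
}

// Assumed body: goes through the default constructor string().
std::string make_default() {
    return {};
}
```
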
For a full explanation, see below:
Initializing to "" / Calling string(const char*)
To understand what happens, let's look at the libc++ source code for std::basic_string:
// constraints...
/* specifiers... */ basic_string(const _CharT* __s)
    : /* leave memory indeterminate */ {
    // assert that __s != nullptr
    __init(__s, traits_type::length(__s));
    // ...
}
This ends up calling __init(__s, 0), where 0 is the length of the string, obtained from std::char_traits<char>:
// template head etc...
void basic_string</* ... */>::__init(const value_type* __s, size_type __sz)
{
    // length and constexpr checks
    pointer __p;
    if (__fits_in_sso(__sz))
    {
        __set_short_size(__sz); // set size to zero, first byte
        __p = __get_short_pointer();
    }
    else
    {
        // not entered
    }
    traits_type::copy(std::__to_address(__p), __s, __sz); // copy string, nothing happens
    traits_type::assign(__p[__sz], value_type()); // add null terminator
}
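The same steps can be sketched outside of libc++. This is a simplified illustration, not the library's code: ShortRep and init_short are hypothetical names, the long (heap) path is left out entirely, and std::memcpy stands in for traits_type::copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the short representation.
struct ShortRep {
    unsigned char is_long : 1;
    unsigned char size    : 7;
    char data[23];
};

// Mirrors the short-string branch of __init: measure, set the size,
// copy the characters, then append the null terminator.
// Returns false if the string would need the long representation.
bool init_short(ShortRep& rep, const char* s) {
    const std::size_t sz = std::char_traits<char>::length(s);
    if (sz >= sizeof(rep.data)) return false;   // long path not sketched here
    rep.is_long = 0;
    rep.size = static_cast<unsigned char>(sz);  // writes only the header byte
    std::memcpy(rep.data, s, sz);               // copies nothing when sz == 0
    rep.data[sz] = '\0';                        // null terminator
    return true;
}
```

For an empty string, only the header byte and data[0] are written; every other byte of rep keeps its previous value.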
__set_short_size will end up writing only a single byte, because the short representation of a string is:
struct __short
{
    struct _LIBCPP_PACKED {
        unsigned char __is_long_ : 1; // set to zero when active
        unsigned char __size_ : 7; // set to zero for empty string
    };
    char __padding_[sizeof(value_type) - 1]; // zero size array
    value_type __data_[__min_cap]; // null terminator goes here
};
After compiler optimizations, zeroing __is_long_, __size_, and one byte of __data_ compiles to:
mov word ptr [rdi], 0
Initializing to {} / Calling string()
The default constructor is more wasteful by comparison:
/* specifiers... */ basic_string() /* noexcept(...) */
    : /* leave memory indeterminate */ {
    // ...
    __default_init();
}
This ends up calling __default_init(), which does:
/* specifiers... */ void __default_init() {
    __r_.first() = __rep(); // set representation to value-initialized __rep
    // constexpr-only stuff...
}
Value-initialization of a __rep() results in 24 zero bytes, because:
struct __rep {
    union {
        __long __l; // first union member gets initialized,
        __short __s; // __long representation is 24 bytes large
        __raw __r;
    };
};
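The effect of value-initializing such a union can be checked with a small stand-in. This is a sketch under assumptions: LongRep, ShortRep, and Rep are hypothetical types whose member names and order are chosen for illustration, and a 64-bit target is assumed so that the first union member covers all 24 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-ins; three 8-byte fields on a 64-bit target.
struct LongRep  { std::size_t cap; std::size_t size; char* data; };
struct ShortRep { unsigned char is_long : 1; unsigned char size : 7; char data[23]; };

struct Rep {
    union {
        LongRep  l;  // first member: this is what value-initialization zeroes
        ShortRep s;
    };
};

// Value-initializes a Rep and reports whether every byte of it is zero.
bool value_initialized_rep_is_all_zero() {
    Rep r = Rep();
    unsigned char bytes[sizeof(Rep)];
    std::memcpy(bytes, &r, sizeof(Rep));
    for (unsigned char b : bytes) {
        if (b != 0) return false;
    }
    return true;
}
```
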
Conclusion
If you want to value-initialize everywhere for the sake of consistency, don't let this keep you from it. Zeroing out a few bytes unnecessarily isn't a big performance problem you need to worry about.
In fact, it is helpful when initializing large quantities of strings, because std::memset may be used, or some other SIMD way of zeroing out memory.
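As a rough illustration of why all-zero representations help in bulk, the sketch below zeroes many 24-byte blocks with a single std::memset call. Rep24, zero_all, and is_empty_short are hypothetical names; whether a compiler actually emits memset or SIMD stores for an array of real std::string objects depends on the compiler and optimization level.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical 24-byte stand-in for a string representation.
struct Rep24 { unsigned char bytes[24]; };

// Puts n representations into the "empty short string" state at once.
void zero_all(Rep24* reps, std::size_t n) {
    std::memset(reps, 0, n * sizeof(Rep24));
}

// An all-zero header byte and terminator byte mean "empty short string".
bool is_empty_short(const Rep24& r) {
    return r.bytes[0] == 0 && r.bytes[1] == 0;
}
```
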
std::string always occupies 24 bytes in libc++ for our architecture. The difference between the functions is that make_empty() leaves more of this memory indeterminate, whereas make_default() zeroes it all. – Impearl

{} and "" looks extremely similar to C. In C if you write char s[100] = "Hello";, you get a 100-char array with 'h', 'e', 'l', 'l', 'o', '\0' as the first six elements, and the remaining 94 elements not initialised. But if you write char s[100] = {'h', 'e', 'l', 'l', 'o', '\0'};, the remaining 94 elements are initialised to 0. – Brisson