Unnecessary emptying of moved-from std::string

Both libstdc++ and libc++ make a moved-from std::string object empty, even if the original stored string is short and the short string optimization applies. It seems to me that this emptying adds unnecessary runtime overhead. For instance, here is the move constructor of std::basic_string from libstdc++:

basic_string(basic_string&& __str) noexcept
  : _M_dataplus(_M_local_data(), std::move(__str._M_get_allocator())) {
    if (__str._M_is_local()) 
      traits_type::copy(_M_local_buf, __str._M_local_buf, _S_local_capacity + 1);
    else {
      _M_data(__str._M_data());
      _M_capacity(__str._M_allocated_capacity);
    }
    _M_length(__str.length());
    __str._M_data(__str._M_local_data());  // (1)
    __str._M_set_length(0);                // (2)
  }

(1) is a useless assignment in the case of a short string: the source's data pointer already points to its local buffer, so it is simply reassigned the value it already holds.

(2) Emptying the string sets its size to zero and resets the first character of the local buffer, which, as far as I know, the Standard does not demand.

Usually, library implementers try to implement the Standard as efficiently as possible (for instance, freed memory regions are not zeroed out). My question is whether there is any particular reason why moved-from strings are emptied, even though it is not required and it adds unnecessary overhead. This overhead could easily be eliminated, e.g., by:

basic_string(basic_string&& __str) noexcept
  : _M_dataplus(_M_local_data(), std::move(__str._M_get_allocator())) {
    if (__str._M_is_local()) {
      traits_type::copy(_M_local_buf, __str._M_local_buf, _S_local_capacity + 1);
      _M_length(__str.length());
    }
    else {
      _M_data(__str._M_data());
      _M_capacity(__str._M_allocated_capacity);
      _M_length(__str.length());
      __str._M_data(__str._M_local_data());  // (1)
      __str._M_set_length(0);                // (2)
    }
  }
Lazurite asked 8/10, 2018 at 6:18
Comments:
Misdeal: It is perhaps a tribute to legacy code. Before SSO entered libstdc++, std::string was reference counted and all moved-from strings were empty. Old code might (incorrectly) depend on that.
Misdeal: BTW, in your suggested replacement, you must not set the source length to zero before you use it to set the destination length.
Lazurite: @Misdeal Updated, thanks for pointing that out.
Lukelukens: The same applies to operator=.
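The legacy dependence Misdeal describes typically looks like a reused buffer that is moved out and then appended to again. The Standard only guarantees that a moved-from std::string is in a valid but unspecified state, so portable code resets it explicitly. A minimal sketch (split_commas is a hypothetical helper for illustration):

```cpp
#include <string>
#include <utility>
#include <vector>

// Splits on ',' while reusing a single buffer string.
std::vector<std::string> split_commas(const std::string& input) {
    std::vector<std::string> out;
    std::string current;
    for (char c : input) {
        if (c == ',') {
            out.push_back(std::move(current));
            // After the move, `current` is valid but unspecified. Clearing it
            // is required for portability; it is cheap if the implementation
            // already left the string empty.
            current.clear();
        } else {
            current += c;
        }
    }
    out.push_back(std::move(current));
    return out;
}
```

Dropping the clear() call happens to work on implementations that always empty the source, which is exactly the kind of accidental reliance that keeps implementers from changing that behavior.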

In the case of libc++, the string move constructor does empty the source, but it is not unnecessary. Indeed, the author of this string implementation was the same person that led the move semantics proposal for C++11. ;-)

This implementation of the libc++ string was actually designed from the move members outwards!

Here is the code, with some unnecessary details (like debug-mode code) left out:

template <class _CharT, class _Traits, class _Allocator>
basic_string<_CharT, _Traits, _Allocator>::basic_string(basic_string&& __str)
        _NOEXCEPT
    : __r_(_VSTD::move(__str.__r_))
{
    __str.__zero();
}

In a nutshell, this code copies all of the bytes of the source, and then zeros all of the bytes of the source. One thing to immediately note: There is no branching: this code does the same thing for long and short strings.

Long string mode

In "long mode", the layout is 3 words: a data pointer and two integers storing the size and the capacity, minus 1 bit used for the long/short flag. There is also space for an allocator, which is optimized away for empty allocators.

So this copies the pointer and sizes, and then nulls out the source to release ownership of the pointer. This also puts the source into "short mode", because a zero long/short bit means short. And in short mode, all-zero bits represent a zero-size short string that still has its full (non-zero) short-buffer capacity.

Short string mode

When the source is a short string, the code is identical: The bytes are copied over, and the source bytes are zeroed out. In short mode there are no self-referencing pointers, and so copying bytes is the correct algorithm.

Now, it is true that in "short mode", zeroing the 3 words of the source might seem unnecessary. But skipping it would require checking the long/short bit and zeroing only when in long mode. Doing this check-and-branch would actually be more expensive than unconditionally zeroing the 3 words, because of occasional branch mispredictions (which stall the pipeline).
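To make the copy-then-zero idea concrete, here is a minimal sketch. ToyStr is a hypothetical class, not libc++'s code, and it simplifies the layout: its short buffer is not union'ed with the long-mode fields, so it is bigger than libc++'s 24 bytes. What it shares with the description above is that the move constructor copies the whole representation and then resets the source to the value-initialized state, which means "empty short string", without branching on the mode.

```cpp
#include <cstddef>
#include <string_view>
#include <utility>

// Hypothetical toy string, NOT libc++'s implementation. It only illustrates
// the "copy every byte, then reset the source" move described above.
class ToyStr {
    struct Rep {
        char* heap = nullptr;   // nullptr means short mode
        std::size_t size = 0;
        char buf[16] = {};      // short-mode storage (simplified: not union'ed)
    };
    Rep r_;

public:
    explicit ToyStr(std::string_view sv) {
        r_.size = sv.size();
        char* dst = r_.buf;
        if (sv.size() >= sizeof r_.buf) {   // too long for the inline buffer
            r_.heap = new char[sv.size() + 1];
            dst = r_.heap;
        }
        sv.copy(dst, sv.size());
        dst[sv.size()] = '\0';
    }

    // Identical code for short and long strings: copy the representation,
    // then value-initialize the source (null pointer, size 0), which is the
    // empty short string. No branch on the mode.
    ToyStr(ToyStr&& other) noexcept : r_(other.r_) { other.r_ = Rep{}; }

    ToyStr(const ToyStr&) = delete;
    ToyStr& operator=(const ToyStr&) = delete;
    ~ToyStr() { delete[] r_.heap; }   // deleting nullptr is a no-op

    // The mode check lives here instead: every access must test it.
    std::string_view view() const {
        return {r_.heap ? r_.heap : r_.buf, r_.size};
    }
};
```

The branch has not disappeared; it has moved into view(). That is the trade-off Lazurite's comment below points out between libc++ (check a flag on access) and libstdc++ (a data pointer that is always valid).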

Here is the optimized x86-64 assembly for the libc++ string move constructor, inlined into this function:

std::string
test(std::string& s)
{
    return std::move(s);
}

__Z4testRNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEE: ## @_Z4testRNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEE
    .cfi_startproc
## %bb.0:
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register %rbp
    movq    16(%rsi), %rax
    movq    %rax, 16(%rdi)
    movq    (%rsi), %rax
    movq    8(%rsi), %rcx
    movq    %rcx, 8(%rdi)
    movq    %rax, (%rdi)
    movq    $0, 16(%rsi)
    movq    $0, 8(%rsi)
    movq    $0, (%rsi)
    movq    %rdi, %rax
    popq    %rbp
    retq
    .cfi_endproc

(no branches!)

<aside>

The size of the internal buffer for the short string is also optimized for the move members. The internal buffer is "union'ed" with the 3 words required for "long mode", so that sizeof(string) requires no more space than long mode does. Despite this compact sizeof (the smallest among the 3 major implementations), libc++ enjoys the largest internal buffer on 64-bit architectures: 22 chars.

The small sizeof translates into faster move members since all these members do is copy and zero bytes of the object layout.

See this Stack Overflow answer for more details on the internal buffer size.

</aside>
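The inline buffer size can be checked on a particular toolchain. This is a minimal sketch with hypothetical helper names; it assumes that "stored inline" can be detected by checking whether data() points inside the std::string object, and it compares addresses through std::uintptr_t, which is implementation-defined but works on mainstream ABIs.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// True if the characters of `s` live inside the std::string object itself,
// i.e. in the small-string buffer rather than on the heap.
bool is_inline(const std::string& s) {
    auto p = reinterpret_cast<std::uintptr_t>(s.data());
    auto b = reinterpret_cast<std::uintptr_t>(&s);
    return p >= b && p < b + sizeof s;
}

// Longest length (probed up to 64) whose characters are stored inline.
std::size_t max_inline_length() {
    std::size_t n = 0;
    while (n < 64 && is_inline(std::string(n + 1, 'x'))) ++n;
    return n;
}
```

With libc++ on a 64-bit target one would expect max_inline_length() to report 22 with sizeof(std::string) == 24; with libstdc++'s default (C++11) ABI, 15 with sizeof(std::string) == 32.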

Summary

So in summary, setting the source to an empty string is necessary in "long mode" to transfer ownership of the pointer, and it is also the cheaper choice in short mode, because zeroing unconditionally avoids a branch whose mispredictions would stall the pipeline.

I have no comment on the libstdc++ implementation as I did not author that code and your question already does a good job of that anyway. :-)

Ruysdael answered 8/10, 2018 at 11:55
Comments:
Lazurite: Thanks for the detailed answer about libc++. I am familiar with both implementations; libstdc++ uses an additional buffer ("unioned" with the capacity), which has the advantage that accessing the stored string requires no branching (in libc++ one has to check the flag bit). What you wrote makes perfect sense, as does @Misdeal's comment.

I know I thought about whether to zero the moved-from string when I was implementing the libstdc++ version, but I don't remember my reasons for deciding to zero it out. I think I probably decided that leaving the moved-from string empty would be following the principle of least astonishment. The most "obvious" state for a moved-from string is to be empty, even if sometimes being non-empty would perform slightly better.

As suggested in the comments, it avoids breaking any code that (maybe unintentionally) relied on the string being empty. I don't think that was one of my considerations though. C++11 code that relies on the COW string semantics will be broken by more than just moved-from strings being non-empty.

Worth noting is that at -O2 the current libstdc++ code compiles to fewer instructions than your suggested alternative. However, something like this compiles even smaller, and is probably faster (I didn't measure it, though, or even test that it works):

  basic_string(basic_string&& __str) noexcept
  : _M_dataplus(_M_local_data(), std::move(__str._M_get_allocator()))
  {
    memcpy(_M_local_buf, __str._M_local_buf, sizeof(_M_local_buf));
    _M_length(__str.length());
    if (!__str._M_is_local())
      {
        _M_data(__str._M_data());
        __str._M_data(__str._M_local_data());
        __str._M_set_length(0);
      }
  }
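One detail of this version is easy to miss: it never copies the capacity explicitly. In libstdc++, _M_allocated_capacity shares a union with _M_local_buf, so the unconditional copy of the local buffer also carries the capacity when the source is on the heap. Note too that, like the question's proposal, it leaves a short source untouched. A minimal sketch of the same idea on a hypothetical ToyGnuStr class (not libstdc++'s code; it stores the capacity in the 16 shared bytes via memcpy instead of a real union):

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>
#include <utility>

// Hypothetical toy, NOT libstdc++'s code: a data pointer, a length, and 16
// bytes that hold either the local characters or (in heap mode) the capacity,
// mimicking the union of _M_local_buf and _M_allocated_capacity.
class ToyGnuStr {
    char* data_;        // points at buf_ when local, otherwise at the heap
    std::size_t len_;
    char buf_[16];      // local chars, or the capacity's bytes in heap mode

    bool is_local() const { return data_ == buf_; }

public:
    explicit ToyGnuStr(std::string_view sv)
        : data_(buf_), len_(sv.size()), buf_{} {
        if (sv.size() >= sizeof buf_) {          // too long for buf_
            data_ = new char[sv.size() + 1];
            std::size_t cap = sv.size();
            std::memcpy(buf_, &cap, sizeof cap); // store capacity in buf_
        }
        sv.copy(data_, sv.size());
        data_[sv.size()] = '\0';
    }

    // Copy the 16 shared bytes unconditionally: that moves the characters in
    // local mode and the capacity in heap mode, so only the heap case needs
    // extra work.
    ToyGnuStr(ToyGnuStr&& o) noexcept : data_(buf_), len_(o.len_) {
        std::memcpy(buf_, o.buf_, sizeof buf_);
        if (!o.is_local()) {
            data_ = o.data_;     // take ownership of the heap block
            o.data_ = o.buf_;    // source becomes an empty local string
            o.len_ = 0;
            o.buf_[0] = '\0';
        }
    }

    ToyGnuStr(const ToyGnuStr&) = delete;
    ToyGnuStr& operator=(const ToyGnuStr&) = delete;
    ~ToyGnuStr() { if (!is_local()) delete[] data_; }

    std::size_t capacity() const {
        if (is_local()) return sizeof buf_ - 1;
        std::size_t cap;
        std::memcpy(&cap, buf_, sizeof cap);
        return cap;
    }
    std::string_view view() const { return {data_, len_}; }
};
```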
Ellett answered 18/10, 2018 at 19:09

© 2022 - 2024 — McMap. All rights reserved.