Does std::string move constructor actually move?
Here is a small test program:

#include <string>
#include <iostream>
#include <memory>
#include <vector>

class Test
{
public:
  Test(const std::vector<int>& a_, const std::string& b_)
    : a(std::move(a_)),
      b(std::move(b_)),
      vBufAddr(reinterpret_cast<long long>(a.data())),
      sBufAddr(reinterpret_cast<long long>(b.data()))
  {}

  Test(Test&& mv)
    : a(std::move(mv.a)),
      b(std::move(mv.b)),
      vBufAddr(reinterpret_cast<long long>(a.data())),
      sBufAddr(reinterpret_cast<long long>(b.data()))
  {}

  // Compares the recorded buffer addresses and prints both when they differ.
  bool operator==(const Test& cmp) const
  {
    if (vBufAddr != cmp.vBufAddr) {
      std::cout << "Vector buffers differ: " << std::endl
        << "Ours: " << std::hex << vBufAddr << std::endl
        << "Theirs: " << cmp.vBufAddr << std::endl;
      return false;
    }

    if (sBufAddr != cmp.sBufAddr) {
      std::cout << "String buffers differ: " << std::endl
        << "Ours: " << std::hex << sBufAddr << std::endl
        << "Theirs: " << cmp.sBufAddr << std::endl;
      return false;
    }

    // Both addresses match; without this return, falling off the end is undefined behavior.
    return true;
  }

private:
  
  std::vector<int> a;
  std::string b;
  long long vBufAddr;
  long long sBufAddr;
};

int main()
{
  Test obj1 { {0x01, 0x02, 0x03, 0x04}, {0x01, 0x02, 0x03, 0x04}};
  Test obj2(std::move(obj1));

  obj1 == obj2;
  
                       
  return 0;
}

Software I used for the test:

Compiler: gcc 7.3.0

Compiler flags: -std=c++11

OS: Linux Mint 19 (tara) with upstream release Ubuntu 18.04 LTS (bionic)

The result I see is that, after the move, the vector buffer still has the same address, but the string buffer does not. So it looks to me as if the string allocated a fresh buffer instead of just swapping buffer pointers. What causes this behavior?

Insouciance answered 29/1, 2019 at 11:53 Comment(0)

You're likely seeing the effects of the small/short string optimization (SSO). To avoid an allocation for every tiny string, many implementations of std::string include a small fixed-size array that holds short strings without requiring new. This array usually repurposes some of the other members that aren't needed when no dynamic allocation has been made, so it costs little or no additional memory, for small or large strings. Strings stored this way don't benefit from std::move: their characters have to be copied into the new object (but they're small, so it's fine). Larger strings require dynamic allocation, and a move transfers the pointer as you expect.

Just for demonstration, this code on g++:

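#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <utility>

// Move-constructs a new string from s and prints where its buffer lives.
void move_test(std::string&& s) {
    std::string s2 = std::move(s);
    std::cout << "; After move: " << std::hex << reinterpret_cast<std::uintptr_t>(s2.data()) << std::endl;
}

int main()
{
    std::string sbase;

    for (std::size_t len = 0; len < 32; ++len) {
        std::string s1 = sbase;
        // std::hex is sticky, so switch back to decimal before printing the length.
        std::cout << "Length " << std::dec << len << " - Before move: " << std::hex << reinterpret_cast<std::uintptr_t>(s1.data());
        move_test(std::move(s1));
        sbase += 'a';
    }
}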


produces high (stack) addresses that change on move construction for lengths of 15 or less (presumably varies with architecture pointer size), but switches to low (heap) addresses that remain unchanged after move construction once you hit length 16 or higher. The switch is at 16, not 17, because the buffer also has to hold the NUL terminator, which C++11 and later require.

To be 100% clear: This is an implementation detail. No part of the C++ spec requires this behavior, so you should not rely on it occurring at all, and when it occurs, you should not rely on it occurring for specific string lengths.

Abstract answered 29/1, 2019 at 11:57 Comment(8)
"include a small fixed size array to hold small strings" - You don't usually include an array for SSO but you reuse the available storage (size / pointer / ...) and add a flag to indicate if you have a short string. – Haworth
@Holt: Sure, but the effect is the same for the OP's purposes. For that matter, a dedicated flag isn't necessary if you make the cutoff a strict length/capacity limit (which, if you're just shoving data into pointer members, is going to be limited anyway). – Abstract
@Holt: Which is to say, you use a discriminated union that contains a char[N] for short strings. So the array is there in the type, even though it might not be present in all objects. – Litigious
@Litigious I know that. My point was more about how this answer is phrased since, at least for me, it reads as if SSO needs extra memory within a std::string, which it does not (save for maybe a bit flag, if any). – Haworth
@Holt: I did qualify the statement, just to be clear. – Abstract
Doesn't look like pointer size defines the limit for SSO: adding -m32 to compiler flags in your "try it online" link still results in 15 being the limit. – Andesine
@Ruslan: Looks like it. I just checked GCC 8's header, and it just defines an enum constant _S_local_capacity = 15 / sizeof(_CharT), then defines union { _CharT _M_local_buf[_S_local_capacity + 1]; size_type _M_allocated_capacity; }; So it's actually reserving a fixed 16 bytes unioned with the capacity as a size_type, which means the SSO array is 8-12 bytes larger than the members it's sharing. The 32 bit string is smaller (sizeof reports 24 bytes, vs. 32 for the 64 bit string), but I'm guessing it could have been 12 for 32 bit and 24 for 64 bit without the SSO. – Abstract
@Abstract … at which point it becomes mandatory to link to The strange details of std::string at Facebook … (nice research BTW) – Prairial
