Unexplained discrepancies when copying and modifying std::strings

In the following code, "situation 1" works as expected on all compilers tested; "situation 2", however, behaves differently depending on the compiler used.

For example, with MSVC, situations 1 and 2 produce the same results. With gcc/clang and libstdc++, however, the modification in situation 2 affects both the original string and its copy (much like a COW string), even though I'm building with the C++11 switch.

#include <iostream>
#include <string>

int main() {

   // situation 1
   {
      std::string x0 = "12345678";
      std::string x1 = x0;

      char* ptr = &x0[0] + 3;

      (*ptr) = ' ';

      std::cout << "1. x0: " << x0 << "\n";
      std::cout << "1. x1: " << x1 << "\n";

      if ((&x0[0]) == x0.data()) std::cout << "1. ptrs are equal\n";

   }

   // situation 2
   {
      std::string x0 = "12345678";
      std::string x1 = x0;

      char* ptr = const_cast<char*>(x0.data() + 3);

      (*ptr) = ' ';

      std::cout << "2. x0: " << x0 << "\n";
      std::cout << "2. x1: " << x1 << "\n";

      if ((&x0[0]) == x0.data()) std::cout << "2. ptrs are equal\n";
   }

   return 0;
}

GCC (6.1)

1. x0: 123 5678
1. x1: 12345678
1. ptrs are equal
2. x0: 123 5678
2. x1: 123 5678
2. ptrs are equal

MSVC (2015)

1. x0: 123 5678
1. x1: 12345678
1. ptrs are equal
2. x0: 123 5678
2. x1: 12345678
2. ptrs are equal

Is there any reason for the discrepancies in behavior between the various compilers - given that &x0[0] and .data() return the same address?

Husted answered 16/8, 2016 at 22:16 Comment(9)
This GCC 6.1 behaves correctly. – Mcpeak
Did you really have to replace 4 with Z? Could you not find something that's even harder to distinguish visually, like l and I? Half the fun of asking questions on SO is making it as hard as possible for anyone to spot the important details. – Mcpeak
I really hope you're asking this because you're porting ancient, horrible code you didn't write yourself. – Beast
@KerrekSB I didn't realize that; it was just a simple test case. – Husted
@SebastianRedl I found it while investigating a bug. The bug occurred only on Linux builds; Windows builds worked fine. – Husted
@KerrekSB: The following gives a different result: cpp.sh/32ip. Is this really a compiler issue or a difference in the standard libraries? – Husted
@JacobiJohn: It's the standard library. The thing that implements std::string :-) – Mcpeak
If you cast the pointers to uintptr_t and print them out, you will see that x0.data() and x1.data() are different. ptr is pointing to somewhere in x0 in situation 1 and x1 in situation 2. Definitely some interaction between UB and CoW. – Metachromatism
In C++1z (hopefully z==7), you'll be able to do this without the const_cast (just like with vector). – Lytle

Situation 2 causes undefined behaviour:

char* ptr = const_cast<char*>(x0.data() + 3);

(*ptr) = 'Z';

According to the specification of std::basic_string::data (C++14 [string.accessors]/3):

Requires: The program shall not alter any of the values stored in the character array.

In other words, you are not allowed to cast away the const and modify the string via the pointers returned by data() or c_str().
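
For comparison, here is a minimal sketch of the well-defined way to do the same write: take the mutable pointer from the non-const operator[] rather than from data(). The helper names overwrite_at and copy_after_modifying_original are made up for this illustration and do not come from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: overwrite the character at index `pos` through a
// pointer obtained from the non-const operator[]. Writing through this
// pointer is allowed; writing through const_cast<char*>(s.data()) is not
// (per the C++14 wording quoted above).
void overwrite_at(std::string& s, std::size_t pos, char c) {
    assert(pos < s.size());
    char* p = &s[0] + pos;  // a copy-on-write string would unshare here
    *p = c;
}

// Mirrors "situation 1": copy a string, modify only the original,
// and return the copy so the caller can check it is untouched.
std::string copy_after_modifying_original() {
    std::string x0 = "12345678";
    std::string x1 = x0;
    overwrite_at(x0, 3, 'Z');
    return x1;
}
```

With this approach the copy should keep its original contents whether or not the library's std::string shares buffers between copies.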

Hazelwood answered 16/8, 2016 at 22:21 Comment(11)
But the pointer addresses are the same? And doesn't the standard say that x0.data() and &x0[0] should be the same? – Husted
@JacobiJohn They should be the same address; however, you are allowed to modify only via the &x[0] version. In a copy-on-write implementation, operator[] will do a copy in case the result is modified, but .data() won't. – Hazelwood
@Hazelwood Modifiable, not modified. There's no way for the code to know whether it will actually be modified. In other words, x[0] on a non-const string instance/reference already causes the copy, even if nothing further is done with the result. – Beast
That's interesting. How will the std::string class know I've modified its contents when I access it via a pointer based on &x[0]? As far as I can tell, the type of &x[0] is char*, which doesn't have any smarts in it like a shared_ptr. – Husted
@SebastianRedl That's what I said. – Hazelwood
@JacobiJohn The copy happens when you do [0], not when you actually write. – Hazelwood
@Hazelwood So what you're saying is that every time the non-const subscript operator of std::string is called, an if-statement is executed to see whether the backing data is aliased, and if so it performs a malloc/copy and then returns the value? – Husted
@JacobiJohn: Yes, that's the only way to implement a COW string afaik. Whenever a write could occur, you need to actually make the copy if you didn't already. – Bremerhaven
@JacobiJohn In a copy-on-write string, yes, that is how it works. – Hazelwood
I believe when OP is comparing the two pointers, the const version of operator[] is used. – Depicture
@LukeLee x0 is not const, so the non-const version is used. – Hazelwood
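
The mechanism discussed in these comments can be sketched with a toy copy-on-write string. This is only an illustration under simplifying assumptions: CowString and its members are invented names, the buffer is a shared_ptr to a std::string, and it is not how libstdc++ actually implements its string.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Toy copy-on-write string (hypothetical; NOT libstdc++'s implementation).
class CowString {
public:
    explicit CowString(const char* s)
        : buf_(std::make_shared<std::string>(s)) {}

    // The implicit copy constructor copies only the shared_ptr,
    // so a copy shares the buffer with the original.

    // Const access: never copies, because the caller promised not to write.
    const char* data() const { return buf_->data(); }

    // Non-const access: the caller *might* write through the returned
    // reference, so unshare now if the buffer is aliased.
    char& operator[](std::size_t i) {
        if (buf_.use_count() > 1) {
            buf_ = std::make_shared<std::string>(*buf_);  // private copy
        }
        return (*buf_)[i];
    }

    bool shares_buffer_with(const CowString& other) const {
        return buf_ == other.buf_;
    }

    std::string str() const { return *buf_; }

private:
    std::shared_ptr<std::string> buf_;
};
```

In this sketch, writing through const_cast<char*>(a.data()) would bypass the check in operator[] and change every CowString that shares the buffer, which matches the GCC output for situation 2 in the question.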
