Why do data() and c_str() return char const*, while operator[] returns char&?

Why do std::string::data and std::string::c_str() return pointers to const chars, while std::string::operator[] returns references to mutable chars?

std::string string("eightfold is the greatest");

auto s = string.data();
*s = 'r'; // illegal

auto t = &string[0];
*t = 'r'; // totally fine

auto& c = string[0];
c = 'r'; // totally fine

Why don’t std::string::data() and std::string::c_str() return char*, or why doesn’t std::string::operator[] return char const&?

What is the rationale behind this?

Clausius answered 3/11, 2013 at 19:42 Comment(3)
Note that &s[0] yields a char* that cannot be treated as a null-terminated string, whereas the char const* returned by c_str() is a null-terminated string.Sustain
@Nawaz, I'm almost sure the null is guaranteed to be there in C++11 (along with it being contiguous), even if you can't modify it.Pencel
@Nawaz: In C++11 &s[0] is guaranteed to be a C-string. In pre C++11 versions of the language it was not guaranteed.Tacye

operator[] gives you direct access to the controlled sequence of a std::string object. c_str() originally did not.

In the original specification of std::string, the stored sequence was not required to be a zero-terminated string. This meant that, in the general case, c_str() could not return a direct pointer to the stored sequence. It had to return a pointer to a completely independent, separately allocated temporary copy of the controlled sequence (with an added zero terminator character). For this reason, trying to modify the C-string returned by c_str() made no sense at all: any modifications applied to that separate C-string would not be propagated to the actual controlled sequence. (In fact, the specification explicitly prohibited any modification attempts. For example, for an empty std::string an implementation could simply return a pointer to a string literal "", which was of course non-modifiable and could easily be shared between all std::string objects.) So it made perfect sense for c_str() to return const char *.
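To make the "separate copy" scenario concrete, here is a minimal sketch of a toy string class whose storage has no terminator, so c_str() has to build a terminated copy on demand. ToyString, its members, and first_char_after_scratch_write are all hypothetical names invented for illustration; this is not how any real standard library implements std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy class, for illustration only.
class ToyString {
public:
    explicit ToyString(const std::string& text)
        : chars_(text.begin(), text.end()) {}

    // Direct, mutable access to the controlled sequence.
    char& operator[](std::size_t i) { return chars_[i]; }

    // Builds a separate, zero-terminated copy and returns a pointer to it.
    const char* c_str() const {
        scratch_.assign(chars_.begin(), chars_.end());
        scratch_.push_back('\0');
        return scratch_.data();
    }

private:
    std::vector<char> chars_;             // controlled sequence, no terminator
    mutable std::vector<char> scratch_;   // independent copy handed out by c_str()
};

// Writes through the pointer from c_str() (casting away const, which the
// const return type is there to discourage) and reports what the string holds.
char first_char_after_scratch_write() {
    ToyString s("abc");
    char* copy = const_cast<char*>(s.c_str());
    copy[0] = 'x';   // changes only the scratch copy
    return s[0];     // the controlled sequence is untouched
}

void demo() {
    assert(first_char_after_scratch_write() == 'a');
}
```

Under these assumptions the write lands in the scratch buffer and is silently lost, which is the situation the const qualifier on the return type guards against.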

C++11 changed the internal specification of c_str(), making it return a direct pointer to the actual controlled sequence. But the external spec of c_str(), including its const char * return type, remained unchanged to keep it aligned with the legacy spec.
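The C++11 guarantee can be checked with a small sketch like the one below. The function names shares_storage and write_then_read are made up for this example, and the checks assume a compiler in C++11 mode or later.

```cpp
#include <cassert>
#include <string>

// True when c_str() and data() point at the storage that operator[] indexes,
// and that storage is null-terminated (required since C++11).
bool shares_storage(const std::string& s) {
    const char* p = s.c_str();
    return p == s.data()            // both name the same buffer
        && p == &s[0]               // which is the one operator[] indexes
        && p[s.size()] == '\0';     // terminator directly after the last character
}

// Writes through operator[] and reads the result back through c_str().
char write_then_read() {
    std::string s("eightfold");
    s[0] = 'r';             // mutable access to the controlled sequence
    return s.c_str()[0];    // read-only view of the same storage
}

void demo() {
    assert(shares_storage(std::string("abc")));
    assert(write_then_read() == 'r');
}
```

Note that C++17 later added a non-const data() overload that returns char*, while c_str() still returns const char*.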

Tacye answered 3/11, 2013 at 19:58 Comment(2)
No, it didn't have to be "a completely independent, separately allocated temporary copy of the controlled sequence". Probably a NUL would simply be appended but not counted in size(): you would have (strictly) size() < capacity() instead of the usual size() <= capacity().Roomful
@Ben Voigt: My "It had..." sentence was intended to continue the previous one: "In the general case...". Yes, it is a well-known fact that real-life implementations preferred to store the zero terminator explicitly and implement a lightweight c_str(). However, in the general case an "abstract" implementation would allocate an independent buffer. The entire original spec of c_str() is deliberately written to support this approach. I.e., regardless of what c_str() actually returned, an abstract program had to assume that the buffer was independently allocated.Tacye

For historical reasons, C++ and its standard library support C-strings (character arrays), and lots of C++ code uses C-strings for input and output.

You can also imagine a possible implementation of std::string that keeps its data in a character array. This would normally be a completely private implementation detail that is not exposed through the class's public interface.

EDIT: to be explicit, a class would normally not expose non-const views of its private data. To see why this would be an issue, imagine the following code:

std::string s("abc");
char* ps = s.c_str();     // hypothetical: compiles only if c_str() returned char*
                          // ps[0] == 'a' and ps[3] == '\0'
ps[3] = 'd';              // the string is no longer null-terminated
printf("%s", s.c_str());  // prints a non-terminated string

Such a change would allow a user of the class to change its private data in a way that breaks invariants, namely the following invariant: "The character buffer used for storage will be null-terminated."

Part of the contract of operator[] is that the caller must not provide an argument greater than the length of the string (since C++11, an argument equal to the length yields the terminating null character, which must not be changed to another value). The at(size_t pos) member function enforces bounds checking by throwing an exception when pos is greater than or equal to the length. std::string::operator[] can still be used unsafely, but it's possible to at least document a contract, unlike with a raw pointer subscript as in ps[3].
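The bounds checking performed by at() can be seen in a short sketch. The helper name at_rejects is hypothetical; std::string::at and std::out_of_range are the standard facilities.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns true when at() rejects the index by throwing std::out_of_range.
bool at_rejects(const std::string& s, std::size_t pos) {
    try {
        (void)s.at(pos);    // checked access: throws when pos >= s.size()
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

void demo() {
    std::string s("abc");
    assert(!at_rejects(s, 2));   // last valid character
    assert(at_rejects(s, 3));    // one past the last character
}
```

operator[] performs no such check, so an out-of-range index there is simply undefined behavior rather than an exception.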

END OF EDIT

But in order to support interoperability with functions that expect a const char* C-string, std::string exposes this character buffer.

Of course, as with std::vector, users might want to modify individual elements (characters) in a string, which is why the string provides operator[].
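The two access paths described above can be put side by side in a small sketch: element-wise modification through operator[], and read-only hand-off to a C function through c_str(). The helpers upper_in_place and equals_c are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Modifies the characters in place through operator[], as one would with a vector.
void upper_in_place(std::string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        s[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i])));
    }
}

// Hands the buffer to a C API that only reads it; const char* is sufficient.
bool equals_c(const std::string& s, const char* expected) {
    return std::strcmp(s.c_str(), expected) == 0;
}

void demo() {
    std::string s("abc");
    upper_in_place(s);
    assert(equals_c(s, "ABC"));
}
```

A read-only consumer such as std::strcmp never needs a char*, which is why a const char* return type costs nothing for this kind of interoperability.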

(In reality, string implementations often have a character buffer of fixed length that they keep internally, and then "reallocate" on the heap if the string's contents exceed that fixed length. This is called the "small string optimization.")

Why is there a data() member function, you may ask, when there is a perfectly serviceable c_str() member function? I think this is there to simplify generic programming: std::array and std::vector also have data() member functions, and std::strings are designed to act like containers.
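As a sketch of the generic-programming argument, a single template can read the elements of any contiguous container that offers data() and size(). The function sum_elements is a hypothetical helper written for this example, not part of the standard library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sums the elements of any container that exposes data() and size().
template <typename Container>
long sum_elements(const Container& c) {
    long total = 0;
    const auto* p = c.data();   // pointer to the first element, read-only
    for (std::size_t i = 0; i < c.size(); ++i) {
        total += p[i];
    }
    return total;
}

void demo() {
    assert(sum_elements(std::vector<int>{1, 2, 3}) == 6);
    assert(sum_elements(std::array<int, 3>{{4, 5, 6}}) == 15);
    assert(sum_elements(std::string("ab")) == 'a' + 'b');
}
```

The same template body works for all three types only because each of them spells the accessor data(); a string-only c_str() would not fit.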

Wanting answered 3/11, 2013 at 19:49 Comment(2)
“But in order to support interoperability with functions that expect a const char* C-string, std::string exposes this character buffer.” Isn’t char* implicitly convertible to char const*?Clausius
This answer completely misses the question. It's not about why there is a c_str() function, but why it returns a char const* and not char*.Dagmardagna
