Why Doesn't string::data() Provide a Mutable char*?

In C++11, array, string, and vector all got the data method, which:

Returns pointer to the underlying array serving as element storage. The pointer is such that range [data(); data() + size()) is always a valid range, even if the container is empty. [Source]

This method is provided in a mutable and const version for all applicable containers, for example:

T* vector<T>::data();
const T* vector<T>::data() const;

All applicable containers, that is, except string, which only provides the const version:

const char* string::data() const;

What happened here? Why did string get shortchanged, when char* string::data() would be so helpful?

Radio answered 8/12, 2015 at 12:1 Comment(15)
It has been discussed, and it's recorded as an issue; details here.Neoterize
It does: &s[0], for non-empty strings.Idolist
I think that if you need to modify the string via data, you probably are looking for vector<char>, instead. There're some exceptions, though.Bettor
@black I have used vector<char> extensively and regretted it every time I subsequently copied the characters back into a string. What I actually want is a string with a modifiable buffer.Radio
@Neoterize This is the best answer by far. Would you be willing to write this up?Radio
Possible duplicate of Why are std::vector::data and std::string::data different?Ishmaelite
@JonathanMee std::string does have a modifiable buffer. The issue mentioned by Alper is just that it has inconsistent naming compared to other standard library containers.Idolist
@Idolist While it does have a modifiable buffer (circa C++11), it does not have an accessor for the buffer, so we're left creating our own: &string::front() or &*string::begin() and so on.Radio
Huh? What's your definition of "accessor" if not "expression that accesses the buffer"?Idolist
@Idolist Maybe I should say "direct accessor" or something like that. That's the reason I want char* string::data(): so I don't have to make a call on the string and then take the address of the return to get the buffer. I want a single call on the string to return its modifiable buffer.Radio
@Neoterize Welp, I added an answer that is essentially exactly what you said :( Hope you at least got the Disciplined badge out of this.Radio
You can simply (char*)mystring.data() .... not sure if this is good practice however =PChelyuskin
@Chelyuskin You're using a C-style cast there, which in C++ simplifies to (and would preferably be written as) a const_cast, wherein: "Modifying a const object through a non-const access path and referring to a volatile object through a non-volatile glvalue results in undefined behavior." As such I believe we could say, "No, this is definitely not good practice."Radio
@JonathanMee, I totally agree with you. What's scary is that I have seen this work in a code base where "undefined behavior" is very defined :(Chelyuskin
@Chelyuskin One of the things that makes undefined behaviors so dangerous is they sometimes appear well behaved. int* foo() { int result[] = { 1, 2, 3 }; return result; } you may see this undefined behavior work for years until someone adjusts the order of events and foo's stack frame is overwritten before its return value is used. Then suddenly you have undefined behavior that was "caused" by code totally unrelated to the problem. As someone who understands such things you have a responsibility to speak against them, or at a minimum not to propagate them by suggesting them in comments.Radio

The short answer is that C++17 does provide the char* string::data() method. This is vital for the similarly named C++17 free function std::data, so to gain mutable access to the underlying C-string I can now do this:

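using namespace std::string_literals;  // enables the "s" suffix
auto foo = "lorem ipsum"s;             // foo is a std::string

// C++17: data(foo) finds std::data via ADL and yields a mutable char*
for(auto i = data(foo); *i != '\0'; ++i) ++(*i);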

For historical purposes it's worth chronicling the development of string that C++17 builds upon. In C++11, access to string's underlying buffer is made possible by a new requirement that its elements are stored contiguously, such that for any given string s:

&*(s.begin() + n) == &*s.begin() + n for any n in [0, s.size()), or, equivalently, a pointer to s[0] can be passed to functions that expect a pointer to the first element of a CharT[] array.

Mutable access to this newly required underlying C-string was obtainable in various ways, for example &s.front(), &s[0], or &*s.begin(). Back to the original question, then: why wasn't access to string's underlying buffer provided in the form of char* string::data(), which would avoid the burden of using one of these options?

To answer that, it is important to note that T* array<T>::data() and T* vector<T>::data() were additions required by C++11. No such requirement was imposed by C++11 on other sequence containers such as deque, and there certainly wasn't one for string; in fact, the requirement that string be contiguous was itself new to C++11. Before this, const char* string::data() had already existed. Though it was explicitly not guaranteed to point to any underlying buffer, it was, alongside c_str(), a way to obtain a const char* from a string, with the caveat:

The returned array is not required to be null-terminated.

This means that string was not "shortchanged" in C++11's transition to data accessors; it simply was not included, so only the const data accessor that string already possessed persisted. Yet there are naturally occurring cases in C++11 code that require writing directly to the underlying buffer of a string.

Radio answered 9/12, 2015 at 12:55 Comment(0)

I think this restriction comes from the (pre-2011) days where std::basic_string didn't have to store its internal buffer as a contiguous byte array.

Meanwhile, all the others (std::vector and such) were required by the 2003 standard to store their elements as a contiguous sequence, so data could easily return a mutable T* without causing problems for iteration and the like.

If std::basic_string were to return a mutable char*, that would imply that you can treat that char* as a valid C-string and perform C-string operations such as strcpy on it, which would easily lead to undefined behavior if the string were not stored contiguously.

The C++11 standard added the rule that basic_string has to be implemented as a contiguous array of characters. Since then, you can work around the missing mutable data() with the old trick of &str[0].

Injudicious answered 8/12, 2015 at 12:9 Comment(14)
I would go further and say that to provide a mutable data() method would defeat the encapsulation of the string object by encouraging developers to use it as a byte buffer. It would therefore be a retrograde step.Quiteris
You say it comes from the pre-C++11 restriction, but since string::data was introduced as part of C++11, where the contiguous requirement was already in place, why not also provide char* string::data() as a part of C++11?Radio
According to cplusplus.com/reference/string/string/data, std::string::data has existed since C++98. It also speaks of the same issue I was presenting here.Injudicious
@RichardHodges I don't believe that the standard committee agrees with you sir, as they have provided data() on array, dynarray, and vector without fear of "retrograde", so I don't think that was a motivation.Radio
@DavidHaim Interesting, so it existed, but I was just unaware of it, and now that data exists in a mutable format elsewhere, I'm no longer happy with string providing the same thing it always has.Radio
Why do you want data anyway? Perhaps there are better ways of doing the thing you want.Injudicious
@JonathanMee I'm sure there is much that the committee would disagree with me on. In many cases I'm sure that's a good thing :-) I suppose at least the method data() explicitly expresses intent, whereas &s[0] is cryptic. So ok, I'm persuaded.Quiteris
@DavidHaim I think the only time you'd want to legitimately use mutable data() is when using the string to provide a buffer to receive data from an external 'c'-style library AND you subsequently wanted to use that data as a string.Quiteris
@RichardHodges I'm sad to report that there are even times when sprintf is required. And I'd much rather be calling fullfloat.data() here than &*fullfloat.begin(). So I think even your "external 'c'-style" stipulation may be too broad. I might just need this functionality internally.Radio
@JonathanMee if there's ostream functionality missing because of the lack of an appropriate manipulator, isn't it better to fix the standard so that it provides one? It seems to me that emitting formatted data into a stream can be achieved in a more encapsulated way than resorting to the landmine that is sprintf in mainline code. As an implementation detail of a custom utility function it's more forgivable, IMHO.Quiteris
@RichardHodges I agree with that sentiment. Fix the real problem with ostream don't just obscure it. But it doesn't stop me from wanting char* string::data() too :DRadio
@JonathanMee on balance, I agree with you. The provision of a mutable data() would be less cryptic than hacking it with &s[0] or addressof(*begin(s)). I think we just made history. I just changed my view in public on the internet :-))Quiteris
@RichardHodges I've given you a random +1 that was just too funny. lolRadio
C++17 has added a non-const data() member to string.Willumsen
