Does "&s[0]" point to contiguous characters in a std::string?

6

40

I'm doing some maintenance work and ran across something like the following:

std::string s;
s.resize( strLength );  
// strLength is a size_t with the length of a C string in it. 

memcpy( &s[0], str, strLength );

I know using &s[0] would be safe if it was a std::vector, but is this a safe use of std::string?

Narva answered 31/12, 2009 at 20:22 Comment(3)
The use of &s[0] is OK, memcpy() arguably less so. Why not simply do an assignment, or use the string's assign() member function?Lisalisabet
@Neil Butterworth, that is what I'm asking myself while looking at this code... ;)Narva
As you gain experience programming in C++, you will refrain more and more from using memset and memcpy, and learn the reasoning. This is one to add to your experience.Irruption
K
50

A std::string's storage is not guaranteed to be contiguous under the C++98/03 standard, but C++11 requires it to be. In practice, neither I nor Herb Sutter knows of an implementation that does not use contiguous storage.

Notice that &s[0] is always guaranteed to work by the C++11 standard, even for a zero-length string. The same is not guaranteed for *s.begin() or &*s.begin(), because dereferencing begin() on an empty string is undefined, but for &s[0] the standard defines operator[] as:

Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified

Continuing on, data() is defined as:

Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].

(notice the square brackets at both ends of the range)
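
The two quoted guarantees can be checked directly. The following is only a minimal sketch, assuming a C++11 or later compiler; the helper names data_matches_subscript and terminator_is_null are made up for illustration and are not part of the standard library.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: checks the C++11 data() guarantee quoted above,
// p + i == &operator[](i) for each i in the closed range [0, size()].
bool data_matches_subscript(const std::string& s) {
    const char* p = s.data();
    for (std::string::size_type i = 0; i <= s.size(); ++i) {
        if (p + i != &s[i]) {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: true when s[size()] reads as charT(), i.e. '\0'.
// This is the "otherwise" branch of the operator[] wording.
bool terminator_is_null(const std::string& s) {
    return s[s.size()] == '\0';
}
```

On a conforming C++11 implementation both helpers should return true for any string, including an empty one, which is the case the answer is concerned with.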


Notice: pre-standardization C++0x did not guarantee &s[0] to work with zero-length strings (actually, it was explicitly undefined behavior), and an older revision of this answer explained this; this has been fixed in later standard drafts, so the answer has been updated accordingly.

Kori answered 31/12, 2009 at 20:24 Comment(9)
I have not been following the standard for the last few months, but my impression was that this was still in the 0x draft, and therefore not actually required yet (nor will it be if a library chooses to implement only '03).Kori
Sutter says in a comment to that post, "current ISO C++ does require &str[0] to cough up a pointer to contiguous string data (but not necessarily null-terminated!)," which would in fact make the OP's usage correct. However, I can't find anything that says that in the standard (at least it's not in 21.3.4 lib.string.access).Swami
I think that might be right; the std defect 530 says operator[] is contiguous but the iterator interface is not guaranteed to be, and quotes 23.4.4. I am digging out my standard to check.Kori
I skipped right over the defect link in Sutter's post, that's why I missed it. In any case, the defect says "we almost require contiguity already," (key word: almost) and I don't see how its reference to multiset is relevant (basic_string is a sequence with random access iterators). However, I think the important thing to take away is that "given the existence of data(), and the definition of operator[] and at in terms of data, I don't believe it's possible to write a useful and standard-conforming basic_string that isn't contiguous."Swami
Todd, did you mean to say that you (and Sutter) do not know of an implementation? If you do know of an implementation, could you name it for us?Nightwear
@Rob - fixed, thanks. @James - Yeah, I think I understand where they are going with it (though I do not understand the 23.4.4 ref). (strike a part about interesting edge cases, I was wrong about that)Kori
James: the almost is because the null for s[s.length()] does not have to be contiguous. &s[n] + 1 == &s[n + 1] must be true for all n where 0 <= n < s.length() - 1. The requirement is buried in 21.3.4/1 that s[n] must return the same object as s.data()[n] (for n < length()), and data() must be contiguous.Summers
The information about zero-length strings is incorrect; the last C++11 draft actually says "Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.". So, it's always safe to do &str[0].Septempartite
Since the answerer doesn't seem to be active on SO anymore, I updated the answer myself; it would be bad to have a top-voted, accepted answer to a common question spreading obsolete information.Septempartite
K
9

It is safe to use. I think most answers were correct once, but the standard changed. The C++11 standard, in the basic_string general requirements [string.require], 21.4.1/5, says:

The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().

A bit before that, it says that all iterators are random access iterators. Both bits support the usage of your question. (Additionally, Stroustrup apparently uses it in his newest book ;) )
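
As an illustration, the quoted identity can be written out as a check. This is a sketch only, assuming C++11 or later; is_contiguous is a hypothetical helper name, not a library function.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: tests the identity from [string.require],
// &*(s.begin() + n) == &*s.begin() + n for all n with 0 <= n < s.size().
bool is_contiguous(const std::string& s) {
    if (s.empty()) {
        return true;  // the range of n is empty; begin() must not be dereferenced
    }
    const char* base = &*s.begin();
    for (std::string::size_type n = 0; n < s.size(); ++n) {
        auto offset = static_cast<std::string::difference_type>(n);
        if (&*(s.begin() + offset) != base + n) {
            return false;
        }
    }
    return true;
}
```

On a conforming C++11 implementation this should return true for every string. Under C++03 the standard did not promise that result, even though the implementations mentioned in this thread behave that way.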

This change was most likely made in C++11. Note that vector's contiguity guarantee is older (it dates from C++03), although vector did gain the very useful data() member in C++11.

Hope that helps.

Kass answered 19/1, 2015 at 12:36 Comment(1)
The question was pre-c++11 (it is tagged as such). You are correct, c++11 made it officially safe to do this.Narva
S
7

Technically, no, since std::string is not required to store its contents contiguously in memory.

However, in almost all implementations (every implementation of which I am aware), the contents are stored contiguously and this would "work."

Swami answered 31/12, 2009 at 20:24 Comment(7)
Can you identify some implementations where it wouldn't work?Nightwear
Nope. But you could make such an implementation if you wanted.Swami
@Neil: Do you have a link/reference to that TC?Swami
Aargh - sorry, brain going - I'm thinking of vector, not string. Apologies all round.Lisalisabet
No problem. I'm still curious as to what Sutter is talking about regarding &str[0], though (cf. my comment to Todd's answer).Swami
@JamesMcNellis: I don't understand how string is not required to store its contents contiguously in memory (i.e. before C++11, where it was required). Wouldn't data and c_str be impossible to implement in constant time if strings were discontiguous?Unanimous
@Mehrdad: There is no requirement in C++03 that data or c_str have constant time complexity. Further, C++03 §21.3/5 states that both data and c_str may invalidate iterators, references, and pointers to elements in the sequence.Swami
N
3

Readers should note that this question was asked in 2009, when the C++03 Standard was the current publication. This answer is based on that version of the Standard, in which std::strings are not guaranteed to use contiguous storage. Since this question was not asked in the context of a particular platform (like gcc), I make no assumptions about OP's platform, and in particular about whether or not it used contiguous storage for the string.

Legal? Maybe, maybe not. Safe? Probably, but maybe not. Good code? Well, let's not go there...

Why not just do:

std::string s = str;

...or:

std::string s(str);

...or:

std::string s;
std::copy( &str[0], &str[strLen], std::back_inserter(s));

...or:

std::string s;
s.assign( str, strLen );

?
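
The alternatives above are not all equivalent when the source buffer contains embedded nulls or is not null-terminated. The sketch below contrasts the two families; the wrapper names from_c_string and from_counted are invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical wrapper: like "std::string s = str;" or "std::string s(str);".
// Requires str to be null-terminated and stops at the first '\0'.
std::string from_c_string(const char* str) {
    return std::string(str);
}

// Hypothetical wrapper: like "s.assign(str, strLen);".
// Copies exactly strLen characters, embedded nulls included,
// and never reads past str[strLen - 1].
std::string from_counted(const char* str, std::size_t strLen) {
    return std::string(str, strLen);
}
```

Given the three characters 'a', '\0', 'b', from_c_string yields a string of size 1, while from_counted with a length of 3 yields a string of size 3. Only the counted forms match the behavior of the resize-plus-memcpy code in the question.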

Namtar answered 31/12, 2009 at 20:32 Comment(2)
std::string s (str, strLen); (Shortest form identical, in case of embedded nulls or lacking null termination, to the original behavior from the question.)Summers
@Downvoter: Note that this question was asked in 2009, and pertains to the C++03 standard. If you're downvoting either because you challenge the technical accuracy of my answer or for some other reason, I'd appreciate feedback.Namtar
N
2

This is generally not safe, regardless of whether the internal string sequence is stored contiguously in memory or not. There might be many other implementation details related to how the controlled sequence is stored by the std::string object, besides contiguity.

A real practical problem with that might be the following. The controlled sequence of std::string is not required to be stored as a zero-terminated string. However, in practice, many (most?) implementations choose to oversize the internal buffer by 1 and store the sequence as a zero-terminated string anyway because it simplifies the implementation of c_str() method: just return a pointer to the internal buffer and you are done.

The code you quoted in your question does not make any effort to zero-terminate the data that is copied into the internal buffer. Quite possibly it simply doesn't know whether zero-termination is necessary for this implementation of std::string. Quite possibly it relies on the internal buffer being filled with zeros after the call to resize, so the extra character allocated for the zero-terminator by the implementation is conveniently pre-set to zero. All this is an implementation detail, meaning that this technique depends on some rather fragile assumptions.

In other words, in some implementations, you'd probably have to use strcpy, not memcpy to force the data into the controlled sequence like that. While in some other implementations you'd have to use memcpy and not strcpy.
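
For comparison, here is how the pattern from the question behaves under the C++11 rules, where c_str() must return a pointer p with p + i == &operator[](i) for i in [0, size()], so the terminator at index size() is the library's responsibility. This is a sketch under that assumption; from_buffer is a hypothetical name, and the sketch says nothing about pre-C++11 implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper reproducing the question's resize + memcpy pattern.
std::string from_buffer(const char* src, std::size_t len) {
    std::string s;
    s.resize(len);  // s now holds len characters, each value-initialized to '\0'
    if (len != 0) {
        // Overwrites only indices [0, len); index len is never touched.
        std::memcpy(&s[0], src, len);
    }
    return s;
}
```

Because the copy never writes to index size(), a C++11 implementation still reports a null character there through c_str(), whether or not the source buffer was itself null-terminated.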

Normandnormandy answered 31/12, 2009 at 20:46 Comment(3)
After the call to resize you can be quite sure that the internal string is or isn't null-terminated as the implementation requires. After a call to resize after all you must have a valid string of n characters (padded with zero characters as needed). - However, it shows a lack of understanding for the std::string class: memcpy is used either out of ignorance or as a misguided attempt for performance (because of the resize call the code ends up assigning values to the buffer twice).Natal
@UncleBens: I don't understand your first sentence. In any case, yes, the language standard guarantees that the size-increasing resize call pads the string with zeros. However, the standard guarantees the padding only up to the requested size (strLength in this case), but there's no guarantee in the standard for that extra character, if the implementation allocates one.Normandnormandy
From C++11 onward, when the string is not empty, the internal buffer is required to be null-terminated, because both data() and c_str() are required to return the same buffer, and c_str() is required to always return a pointer to a null-terminated buffer (data() is allowed to return nullptr when empty). Prior to C++11, the internal buffer was not required to be null-terminated (or even contiguous), but most implementations were because it simplified the implementation of c_str()Treadmill
V
0

The code might work, but more by luck than judgement: it makes assumptions about the implementation that are not guaranteed. I suggest that determining the validity of the code is irrelevant, since it is a pointless overcomplication that is easily reduced to just:

std::string s( str ) ;

or if assigning to an existing std::string object, just:

s = str ;

and then let std::string itself determine how to achieve the result. If you are going to resort to this sort of nonsense, then you may as well not be using std::string and stick to since you are reintroducing all the dangers associated with C strings.

Voltaire answered 31/12, 2009 at 20:47 Comment(2)
I actually can't be sure the string being assigned is null terminated. So the best I could do will probably be s.assign( ptr, ptrLength); which is still an improvement I think.Narva
Use the constructor form: std::string s (str, strLen);Readable
