Is std::string always null-terminated in C++11? [duplicate]

3

54

In a 2008 post on his site, Herb Sutter states the following:

There is an active proposal to tighten this up further in C++0x and require null-termination and possibly ban copy-on-write implementations, for concurrency-related reasons. Here’s the paper: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2534.html . I think that one or both of the proposals in this paper is likely to be adopted, but we’ll see at the next meeting or two.

I know that C++11 now guarantees that the std::string contents get stored contiguously, but did they adopt the above in the final draft?

Will it now be safe to use something like &str[0]?

Unciform answered 20/5, 2011 at 20:20 Comment(1)
Isn't the guarantee that the contents are stored contiguously provided in C++03 as well?Financier
M
59

Yes, per [string.accessors] p1, std::basic_string::c_str():

Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].

Complexity: constant time.

Requires: The program shall not alter any of the values stored in the character array.

This means that given a string s, the pointer returned by s.c_str() must be the same as the address of the initial character in the string (&s[0]).
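As a rough illustration of the guarantee quoted above, the sketch below checks that c_str(), data() and &s[i] all refer to the same buffer for every i in [0, size()], and that the element at index size() reads as '\0'. The helper name shares_terminated_buffer is made up for this example, and it assumes a C++11 or later standard library.

```cpp
#include <string>

// Hypothetical helper (not part of any library): returns true when
// c_str(), data() and &s[i] agree for every i in [0, size()] and the
// element at index size() compares equal to '\0'.
bool shares_terminated_buffer(const std::string& s)
{
    const char* p = s.c_str();
    if (p != s.data()) {
        return false;
    }
    for (std::string::size_type i = 0; i <= s.size(); ++i) {
        if (p + i != &s[i]) {   // note: the range includes i == size()
            return false;
        }
    }
    return s[s.size()] == '\0';
}
```

On a conforming C++11 implementation, shares_terminated_buffer(std::string("hello")) and shares_terminated_buffer(std::string()) should both return true.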

Megaron answered 20/5, 2011 at 20:30 Comment(9)
Note that the same requirement holds true for data, which I believe wasn't true for C++98/03.Ma
Yes, it's illuminating that basic_string<>::c_str() and basic_string<>::data() now have exactly identical semantics.Phonogram
This doesn't appear to answer the question with which the post is titled - ie "Will std::string always be null-terminated in C++11?", in which case the answer is no. operator[str.length()] will return '\0', but that doesn't mean that the string actually contains it in memory.Mayan
I read this as well in the final C++11 spec (21.4.7.1/1), but I don't see how any requirements are placed on the element at operator[str.length()], other than that it must be valid and referenceable.Thorsten
@AndrewMarshall: operator[] is required to return a reference to the actual stored element, so (21.4.7.1/1) also applies the requirement that the element at operator[str.length()] must be part of the storage.Thorsten
Shouldn't it be [0,size())?Buseck
@S.S.Anne No, in this case, the terminator is part of the sequence. Not that it is always part of the sequence; see e.g. .at().Read
@JohnDibling Could you give us any reference for your mention that std::string's operator[] is required to return a reference to the actually stored element?Ardellardella
@starriet - no, i'm sorry but I no longer can. That reference surely came from the C++98 standard, but I don't have that handy these days.Thorsten
M
-1

&str[0] is safe to use -- so long as you do not assume it points to a null-terminated string.

Since C++11 the requirements include (section [string.accessors]):

  • str.data() and str.c_str() point to a null-terminated string.
  • &str[i] == str.data() + i, for 0 <= i <= str.size()
    • note that this implies the storage is contiguous.

However, there is no requirement that &str[0] + str.size() points to a null terminator.

A conforming implementation must place the null terminator contiguously in storage when data(), c_str() or operator[](str.size()) are called; but there is no requirement to place it in any other situation, such as calls to operator[] with other arguments.
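One way to use &str[0] without leaning on the terminator at all is to always pass the length alongside the pointer. The sketch below is only an illustration of that idea: copy_bytes is a hypothetical helper, and it assumes the C++11 contiguity guarantee discussed above.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copies `len` bytes from `src` into a std::string by
// writing through &out[0]. The length is passed explicitly, so nothing here
// depends on where or when the implementation stores the terminator.
std::string copy_bytes(const char* src, std::size_t len)
{
    std::string out(len, '\0');          // allocate len characters up front
    if (len != 0) {
        std::memcpy(&out[0], src, len);  // writes only indices [0, len)
    }
    return out;
}
```

For instance, copy_bytes("abc", 3) yields a string equal to "abc". Only indices below size() are written, so the element at index size() is never touched.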


To save you from reading the long chat discussion below: the objection was raised that if c_str() were to write a null terminator, it would cause a data race under res.on.data.races#3; I disagreed that it would be a data race.

Misquotation answered 11/6, 2021 at 13:14 Comment(18)
The constexpr const CharT* data() const noexcept; overload can't modify anything, so it has to be there from the startSterlingsterlitamak
@Sterlingsterlitamak The text you quote was added in C++20Misquotation
@Misquotation it's been a const member function and had an O(1) requirement since at least C++11 if not longer. De-facto it had to be zero terminated internally. Edit: yes it was const priorAileen
@Aileen placing a null terminator is O(1) since the length is known. A const member function is allowed to modify mutable internal storage of an object; and any dynamically allocated storage that the object holds an internal pointer toMisquotation
It wasn't constexpr prior to C++20, but the requirement still standsSterlingsterlitamak
@Misquotation modifying the internal buffer would linguistically invalidate iterators, something that method and in fact all const accessors are explicitly prohibited by the standard from doing.Aileen
@Aileen No it wouldn't. Iterator invalidation applies to calls the user makes to member functions of the string, not by any internal operation the implementation makes. The implementation only has to provide the guarantees that the standard places on the observable behaviourMisquotation
If it were allowed to modify the buffer, it would have to do it in a way where there was no possibility of a data race, which I don't think is possible without a per-string mutex or similarSterlingsterlitamak
@Misquotation 21.3.3.8.1.3 "Remarks: The program shall not modify any of the values stored in the character array; otherwise, the behavior is undefined." that includes the accessor method. Library calls are part of the program ;)Aileen
@Sterlingsterlitamak look, no errors. (Reiterate my point that a const member function may modify dynamically allocated storage)Misquotation
@Aileen The implementation is not part of the programMisquotation
"Unless otherwise specified (either explicitly or by defining a function in terms of other functions), invoking a container member function or passing a container as an argument to a library function shall not invalidate iterators to, or change the values of, objects within that container." container.requirements.general#11Sterlingsterlitamak
@Sterlingsterlitamak I'm not aware of the standard providing any guarantee about thread-safety of any std::string operation (or any other standard container unless explicitly mentioned), in general any standard library object might contain or point to storage that can be modified by a const member function. If you can point to something in the standard that talks about thread safety of std::string then go aheadMisquotation
"A C++ standard library function shall not directly or indirectly modify objects ([intro.multithread]) accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function's non-const arguments, including this." res.on.data.races#3Sterlingsterlitamak
@Sterlingsterlitamak But it does explicitly specify that data() shall provide a null terminator. Also in your quote "object in the container" refers to the object being contained; not to any internal detail of the containerMisquotation
The char[] that std::string owns includes the nul terminator, that is the char that is 0 is an object within that containerSterlingsterlitamak
@Sterlingsterlitamak I don't agree, if a container's size() is 4 then it contains 4 elements. There can be various amounts of bookkeeping and other storage used by the container, but this clause is referring to the conceptual contents of the container, not the bookkeepingMisquotation
Let us continue this discussion in chat.Sterlingsterlitamak
E
-3

Although c_str() returns a null terminated version of the std::string, surprises may await when mixing C++ std::string with C char* strings.

Null characters may end up within a C++ std::string, which can lead to subtle bugs as C functions will see a shorter string.

Buggy code may overwrite the null terminator. This results in undefined behaviour. C functions would then read beyond the string buffer, potentially causing a crash.

#include <string>
#include <iostream>
#include <cstdio>
#include <cstring>

int main()
{
    std::string embedded_null = "hello\n";
    embedded_null += '\0';
    embedded_null += "world\n";

    // C string functions finish early at embedded \0
    std::cout << "C++ size: " << embedded_null.size() 
              << " value: " << embedded_null;
    printf("C strlen: %zu value: %s\n", // %zu matches strlen's size_t result
           strlen(embedded_null.c_str()),
           embedded_null.c_str());

    std::string missing_terminator(3, 'n');
    missing_terminator[3] = 'a'; // BUG: Undefined behaviour

    // C string functions read beyond buffer and may crash
    std::cout << "C++ size: " << missing_terminator.size() 
              << " value: " << missing_terminator << '\n';
    printf("C strlen: %zu value: %s\n", // %zu matches strlen's size_t result
           strlen(missing_terminator.c_str()),
           missing_terminator.c_str());
}

Output:

$ c++ example.cpp
$ ./a.out
C++ size: 13 value: hello
world
C strlen: 6 value: hello

C++ size: 3 value: nnn
C strlen: 6 value: nnna�
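
For the embedded-null case, one possible guard is to check the string before handing c_str() to a C function. This is a minimal sketch under that assumption; has_embedded_null and c_visible_length are hypothetical names introduced only for this example.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: true when the string holds a '\0' among its
// size() characters, in which case C functions reading c_str() would
// see a truncated value.
bool has_embedded_null(const std::string& s)
{
    return s.find('\0') != std::string::npos;
}

// Hypothetical helper: the length a C function such as strlen would
// report for the same string.
std::size_t c_visible_length(const std::string& s)
{
    return std::strlen(s.c_str());
}
```

With the embedded_null string built in the example above, has_embedded_null returns true and c_visible_length returns 6, while size() is 13. No comparable check exists for the overwritten terminator, since that case is already undefined behaviour.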
Euton answered 22/4, 2021 at 22:4 Comment(4)
"missing_terminator[3] = 'a';" That's explicitly UB. You can read from the NUL terminator, but you cannot write to it. Well, you can't write any value other than NUL to it.Libation
I wouldn't say "c_str() generally returns", since C++11 it "returns a pointer to a null-terminated character array with data equivalent to those stored in the string.".Kemppe
Replacing the null-terminator with another character is UB. However is an embedded null allowed? Both lead to problems, neither is caught by GCC or Clang.Euton
Yes they are allowed.Kemppe
