std::string::c_str & Null termination
D

3

9

I've read various descriptions of std::string::c_str, including questions raised on SO over the years.

I like this description for its clarity:

Returns a pointer to an array that contains a null-terminated sequence of characters (i.e., a C-string) representing the current value of the string object. This array includes the same sequence of characters that make up the value of the string object plus an additional terminating null-character ('\0') at the end.

However some things about the purpose of this function are still unclear.

You could be forgiven for thinking that calling c_str might add a \0 character to the end of the string which is stored in the internal char array of the host object (std::string):

s[s.size()] = '\0';

But it seems std::string objects are null-terminated by default even before calling c_str.

After looking through the definition:

const _Elem *c_str() const _NOEXCEPT
{   // return pointer to null-terminated nonmutable array
    return (this->_Myptr());
}

I don't see any code which would add \0 to the end of a char array. As far as I can tell, c_str just returns a pointer to the char stored in the first element of the array, pretty much like begin() does. I don't even see code which checks that the internal array is terminated by \0.

Or am I missing something?

Denominative answered 5/1, 2017 at 13:0 Comment(8)
You missed the difference between the C++ 2003 Standard and the C++ 2011 Standard. Before the C++ 2011 Standard, objects of class std::string could store strings without the terminating zero.Plaint
1. A zero is added automatically when the string is created or modified. 2. A string constant such as "123456789" is always ASCIIZ and has a zero char at the end.Electrokinetic
Obviously, function c_str() doesn't append a 0 at the end of the array every time it is called, because it would need to allocate a new array, and the caller (presumably you) would need to deallocate it every time and again...Trilemma
@Vlad from Moscow Ah! could it be said that c_str is something of a legacy function? I'm struggling to think of a use for it in 2016? If indeed std::string objects are now null terminated by default.Denominative
@Denominative I do not understand the meaning of the words "a legacy function". However, AFAIK the C++ Standard now requires that stored strings are null-terminated.Plaint
@Electrokinetic - string literals are not required to be represented in ASCII, nor ASCIIZ (whatever that is). They are required to have a nul terminator.Misestimate
@Denominative you'll notice that c_str is marked const. According to the current language in the standard, this means that it should be, relatively speaking, data race free. If you modified the string in a const method you would have to lock, because you would need to verify that the value hasn't changed. The standard also requires that c_str is O(1), which in essence means that the internal representation must be zero terminated.Iridium
@Vlad from Moscow By legacy function I mean a function which no longer serves any useful purpose within the current standard. As you say, stored strings are null-terminated by default, so c_str really does nothing beyond returning a pointer to the first element of the string's array. It does not check for or add a null terminator to the string; it behaves pretty much like begin() as far as I can see?Denominative
S
12

Before C++11, there was no requirement that a std::string (or the templated class std::basic_string, of which std::string is an instantiation) store a trailing '\0'. This was reflected in different specifications of the data() and c_str() member functions: data() returned a pointer to the underlying data (which was not required to be terminated with a '\0'), and c_str() returned a pointer to an array (possibly a separate copy) with a terminating '\0'. However, equally, there was no requirement to NOT store a trailing '\0' internally (accessing characters past the end of the stored data was undefined behaviour), and, for simplicity, some implementations chose to append a trailing '\0' anyway.

With C++11, this changed. Essentially, the data() member function was specified as giving the same effect as c_str() (i.e. the returned pointer is to the first character of an array that has a trailing '\0'). That has a consequence of requiring the trailing '\0' on the array returned by data(), and therefore on the internal representation.

So the behaviour you're seeing is consistent with C++11 - one of the invariants of the class is a trailing '\0' (i.e. constructors ensure that is the case, member functions which modify the string ensure it remains true, and all public member functions can rely on it being true).
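The invariant described above can be checked from the outside. This is only a rough sketch, assuming a C++11 or later standard library; the function names `terminator_invariant_holds` and `check_after_modifications` are made up for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: true if s satisfies the C++11 guarantees discussed above.
bool terminator_invariant_holds(const std::string& s) {
    return s.c_str() == s.data()          // both accessors point at the same buffer
        && s.c_str()[s.size()] == '\0';   // the terminator sits right after the last character
}

// Hypothetical helper: applies several mutating operations and asserts
// that the invariant still holds after each one.
void check_after_modifications() {
    std::string s;                        // default-constructed, empty
    assert(terminator_invariant_holds(s));

    s = "abc";
    assert(terminator_invariant_holds(s));

    s.push_back('d');
    assert(terminator_invariant_holds(s));

    s.append(100, 'x');                   // long enough that the buffer probably has to grow
    assert(terminator_invariant_holds(s));

    s.resize(2);                          // shrink back to "ab"
    assert(terminator_invariant_holds(s));
    assert(std::strlen(s.c_str()) == s.size());

    s.clear();
    assert(terminator_invariant_holds(s));
}
```

Calling `check_after_modifications()` from `main` should run to completion on a conforming C++11 implementation. Nothing in it calls c_str() to "fix up" the string first, which is the point: the terminator is maintained by the operations that modify the string.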

The behaviour you're seeing is not inconsistent with C++ standards before C++11. Strictly speaking, std::string before C++11 was not required to maintain a trailing '\0' but, equally, an implementer could choose to do so.

Sharp answered 5/1, 2017 at 13:27 Comment(0)
C
6

You do not see code that adds '\0' to the end of the sequence because the null character is already there. An implementation of c_str cannot hand back a freshly allocated array that the caller would have to free, so the array must be owned by the std::string object itself.

Hence, you have two valid approaches for implementing this:

  1. Always store '\0' at the end of _Myptr() array of characters on construction, or
  2. Make a copy of the string on demand, add '\0' when c_str() is called, and delete the copy in the destructor.

The first approach lets you return _Myptr() for c_str(), at the expense of storing an extra character for each string. The second approach requires an extra pointer per std::string object, so the first approach is less expensive.
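To make the first approach concrete, here is a minimal sketch of a toy string class that always keeps the terminator in its buffer. `TinyString` is a hypothetical class written for this illustration and is not how MSVC or any other standard library implements std::string (there is no small-string optimisation and no spare capacity, for example).

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical toy class illustrating approach 1: the buffer always
// holds size_ characters followed by a '\0'.
class TinyString {
public:
    explicit TinyString(const char* text)
        : size_(std::strlen(text)),
          buf_(new char[size_ + 1]) {                 // one extra slot for the terminator
        std::memcpy(buf_.get(), text, size_ + 1);     // copies the characters and the '\0'
    }

    void push_back(char c) {
        std::unique_ptr<char[]> bigger(new char[size_ + 2]);
        std::memcpy(bigger.get(), buf_.get(), size_);
        bigger[size_] = c;
        bigger[size_ + 1] = '\0';                     // re-establish the invariant
        buf_ = std::move(bigger);
        ++size_;
    }

    std::size_t size() const noexcept { return size_; }

    // No copying and no writing here: the terminator is already in place,
    // so this can be const, noexcept and constant time.
    const char* c_str() const noexcept { return buf_.get(); }

private:
    std::size_t size_;                 // declared first so it is initialised before buf_
    std::unique_ptr<char[]> buf_;
};
```

The cost is paid in the constructor and in `push_back`, each of which writes the terminator once. That is why `c_str()` in the sketch is a one-line accessor, much like the `_Myptr()` return in the question.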

Cleodel answered 5/1, 2017 at 13:7 Comment(0)
B
1

The requirement is that c_str must return a null-terminated C string. There is nothing saying that the function has to add the null terminator. Most implementations (and I think all that want to be standard compliant) store the null terminator in the underlying buffer used by the string itself. One reason for this is that

std::string s;
assert(s[0] == '\0');

has to work, since string is now required to return the null terminator at string[string.size()]. If string did not store the null terminator in the underlying buffer, then [] would have to do bounds checking to see whether the index is equal to size() and, if so, return \0.
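As a small sketch of that guarantee, assuming a C++11 or later library, the helpers below read the element at index size() in two ways and contrast it with at(), which is bounds-checked and throws for the same index. The function names are hypothetical and chosen only for this example.

```cpp
#include <stdexcept>
#include <string>

// Reads s[s.size()] through the const operator[]; since C++11 this
// refers to a null character rather than being out of bounds.
char char_at_size(const std::string& s) {
    return s[s.size()];
}

// Reads the same position through the pointer returned by c_str().
char char_at_size_via_c_str(const std::string& s) {
    return s.c_str()[s.size()];
}

// at() is bounds-checked: index size() is rejected with std::out_of_range.
bool at_size_throws(const std::string& s) {
    try {
        (void)s.at(s.size());
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

Both reads should yield '\0' for an empty string and for a non-empty one, while `at_size_throws` should return true in both cases. Note that writing anything other than '\0' through the non-const `s[s.size()]` is not allowed.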

Bilbrey answered 5/1, 2017 at 13:6 Comment(0)
