string c_str() vs. data()
Asked Answered
S

6

147

I have read several places that the difference between c_str() and data() (in STL and other implementations) is that c_str() is always null terminated while data() is not. As far as I have seen in actual implementations, they either do the same or data() calls c_str().

What am I missing here? Which one is more correct to use in which scenarios?

Shinar answered 11/10, 2008 at 20:59 Comment(0)
M
147

The documentation is correct. Use c_str() if you want a null terminated string.

If the implementers happend to implement data() in terms of c_str() you don't have to worry, still use data() if you don't need the string to be null terminated, in some implementation it may turn out to perform better than c_str().

strings don't necessarily have to be composed of character data, they could be composed with elements of any type. In those cases data() is more meaningful. c_str() in my opinion is only really useful when the elements of your string are character based.

Extra: In C++11 onwards, both functions are required to be the same. i.e. data is now required to be null-terminated. According to cppreference: "The returned array is null-terminated, that is, data() and c_str() perform the same function."

Multilateral answered 11/10, 2008 at 21:1 Comment(3)
Extra 2: In C++17 onwards, there is now also a non-const overload for .data(), so they are no longer equivalent for non-constant strings.Janeljanela
If the data is not character type why would you use string instead of vector or other containers?Lithoid
@Fayeure, I guess because std::basic_string<T> has features a vector does not have, e.g. it’s easy to concatenate strings.Okajima
K
39

In C++11/C++0x, data() and c_str() is no longer different. And thus data() is required to have a null termination at the end as well.

21.4.7.1 basic_string accessors [string.accessors]

const charT* c_str() const noexcept;

const charT* data() const noexcept;

1 Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].


21.4.5 basic_string element access [string.access]

const_reference operator[](size_type pos) const noexcept;

1 Requires: pos <= size(). 2 Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.

Kea answered 13/9, 2012 at 4:41 Comment(5)
What if the string is composed of non-character data, which is legal for string data AFAIK, including null?Loiseloiter
@Loiseloiter Even when storing binary data, C++11 requires that std::string allocate an extra char for a trailing '\0'. When you do std::string s("\0");, both s.data()[0] and s.data()[1] are guaranteed to evaluate to 0.Navarrette
@Navarrette Any reference?Mcilroy
"In C++11/C++0x, data() and c_str() is no longer different. " <- true, but please don't forget that since C++17, there is at least one difference : data() can return non-const pointer.Finalism
Can you confirm that myStr.data() is the same as &myStr[0] ?Finalism
K
20

Even know you have seen that they do the same, or that .data() calls .c_str(), it is not correct to assume that this will be the case for other compilers. It is also possible that your compiler will change with a future release.

2 reasons to use std::string:

std::string can be used for both text and arbitrary binary data.

//Example 1
//Plain text:
std::string s1;
s1 = "abc";

//Example 2
//Arbitrary binary data:
std::string s2;
s2.append("a\0b\0b\0", 6);

You should use the .c_str() method when you are using your string as example 1.

You should use the .data() method when you are using your string as example 2. Not because it is dangereous to use .c_str() in these cases, but because it is more explicit that you are working with binary data for others reviewing your code.

Possible pitfall with using .data()

The following code is wrong and could cause a segfault in your program:

std::string s;
s = "abc";   
char sz[512]; 
strcpy(sz, s.data());//This could crash depending on the implementation of .data()

Why is it common for implementers to make .data() and .c_str() do the same thing?

Because it is more efficient to do so. The only way to make .data() return something that is not null terminated, would be to have .c_str() or .data() copy their internal buffer, or to just use 2 buffers. Having a single null terminated buffer always means that you can always use just one internal buffer when implementing std::string.

Kathrinekathryn answered 11/10, 2008 at 21:12 Comment(6)
Actually, the point of .data() is that it should not copy the internal buffer. This means an implementation does not have to waste a char on \0 until it's needed. You'd never want two buffers: if you DO call .c_str(), append a \0 to the buffer. .data() can still return that buffer.Entirely
Agreed fully it would be ridiculous to use 2 buffers. How do you know that's why .data was intended though?Kathrinekathryn
@BrianR.Bondy I tried this code: .. auto str = string { "Test \0String!" }; cout << "DATA : " << str.data() << endl; The output is "Test" and not the whole string, What did I do wrong?Woolley
The last part is wrong, data and c_str could use the same buffer without it being 0-terminated - c_str could simply add the 0 on the first call.Waggoner
heads up, c++11 made .data() an alias for .c_str()Dayledaylight
@BrianR.Bondy If I understand you correctly, strcpy(sz, s.data()) could be wrong because std::data() does not guarantee there is null terminated, before C++11. Am I right?Mcilroy
S
3

It has been answered already, some notes on the purpose: Freedom of implementation.

std::string operations - e.g. iteration, concatenation and element mutation - don't need the zero terminator. Unless you pass the string to a function expecting a zero terminated string, it can be omitted.

This would allow an implementation to have substrings share the actual string data: string::substr could internally hold a reference to shared string data, and the start/end range, avoiding the copy (and additional allocation) of the actual string data. The implementation would defer the copy until you call c_str or modify any of the strings. No copy would ever be made if the sub-strings involved are just read.

(copy-on-write implementation aren't much fun in multithreaded environments, plus the typical memory/allocation savings aren't worth the more complex code today, so it's rarely done).


Similarly, string::data allows a different internal representation, e.g. a rope (linked list of string segments). This can improve insert / replace operations significantly. again, the list of segments would have to be collapsed to a single segment when you call c_str or data.

Shirleeshirleen answered 1/7, 2012 at 8:4 Comment(2)
What actually is done when string::c_str() is invoked? string::c_str() will allocate memory , copy the internal data and append a null-terminated character to the newly allocated memory?Mcilroy
No, you are wrong indeed. See https://mcmap.net/q/160808/-what-actually-is-done-when-string-c_str-is-invoked/13611002 for detail.Mcilroy
H
3

All the previous commments are consistence, but I'd also like to add that starting in c++17, str.data() returns a char* instead of const char*

Hobson answered 25/2, 2019 at 18:23 Comment(1)
Both const and non-const overloads are available since C++17.Bathrobe
L
2

Quote from ANSI ISO IEC 14882 2003 (C++03 Standard):

    21.3.6 basic_string string operations [lib.string.ops]

    const charT* c_str() const;

    Returns: A pointer to the initial element of an array of length size() + 1 whose first size() elements
equal the corresponding elements of the string controlled by *this and whose last element is a
null character specified by charT().
    Requires: The program shall not alter any of the values stored in the array. Nor shall the program treat the
returned value as a valid pointer value after any subsequent call to a non-const member function of the
class basic_string that designates the same object as this.

    const charT* data() const;

    Returns: If size() is nonzero, the member returns a pointer to the initial element of an array whose first
size() elements equal the corresponding elements of the string controlled by *this. If size() is
zero, the member returns a non-null pointer that is copyable and can have zero added to it.
    Requires: The program shall not alter any of the values stored in the character array. Nor shall the program
treat the returned value as a valid pointer value after any subsequent call to a non- const member
function of basic_string that designates the same object as this.
Livvyy answered 5/10, 2011 at 13:4 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.