What is string_view?

string_view was a proposed feature within the C++ Library Fundamentals TS (N3921) that was later added to C++17.

As far as I understand, it is a type that represents some kind of string "concept": a view of any type of container that could store something viewable as a string.

  • Is this right?
  • Should the canonical const std::string& parameter type become string_view?
  • Are there other important points about string_view to take into consideration?
Janiuszck answered 27/12, 2013 at 16:10 Comment(3)
Finally, someone realizes that strings need a different semantics, although introducing string_view is only a small step.Satellite
Read this: learncpp.com/cpp-tutorial/an-introduction-to-stdstring_view. It provides easy-to-understand explanations with examples. Very comprehensive.Quinsy
Interesting fact: std::string_view's .length() member needn't match strlen(thatSameStringView.data())Hokusai
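
The last comment's point can be sketched as follows. This is a minimal illustration, not library code: c_length is a hypothetical helper name, and it assumes the view points into a buffer that is null-terminated somewhere, which string_view itself never promises.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Length a C API would see if handed sv.data(): it scans to the first '\0'
// and ignores the size stored in the view.
// Assumption: the underlying buffer is null-terminated at some point.
std::size_t c_length(std::string_view sv) {
    assert(sv.data() != nullptr);  // a default-constructed view has no buffer
    return std::strlen(sv.data());
}
```

For a view of the first five characters of "hello world", size() is 5 while c_length reports 11, because the buffer continues past the end of the view. With an embedded null the mismatch goes the other way: size() counts the characters after the null and strlen does not.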

The purpose of any and all kinds of "string reference" and "array reference" proposals is to avoid copying data which is already owned somewhere else and of which only a non-mutating view is required. The string_view in question is one such proposal; there were earlier ones called string_ref and array_ref, too.

The idea is always to store a pair of pointer-to-first-element and size of some existing data array or string.

Such a view-handle class could be passed around cheaply by value and would offer cheap substringing operations (which can be implemented as simple pointer increments and size adjustments).
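
A minimal sketch of that "pointer plus size" idea, using C++17's std::string_view. The helper names drop_front and base_name are made up for illustration; only remove_prefix and rfind come from the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Drops the first n characters by advancing the pointer and shrinking the size.
// No characters are copied.
std::string_view drop_front(std::string_view sv, std::size_t n) {
    assert(n <= sv.size());  // remove_prefix with n > size() is undefined
    sv.remove_prefix(n);
    return sv;
}

// Returns the part of `path` after the last '/', as a view into the same buffer.
std::string_view base_name(std::string_view path) {
    const std::size_t slash = path.rfind('/');
    if (slash == std::string_view::npos) {
        return path;  // no separator: the whole path is the name
    }
    return drop_front(path, slash + 1);
}
```

Given "/usr/lib/libfoo.so", base_name returns a view equal to "libfoo.so" whose data() points nine characters into the original buffer, so the result is only valid while that buffer is.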

Many uses of strings don't require actual owning of the strings, and the string in question will often already be owned by someone else. So there is a genuine potential for increasing the efficiency by avoiding unneeded copies (think of all the allocations and exceptions you can save).

The original C strings were suffering from the problem that the null terminator was part of the string APIs, and so you couldn't easily create substrings without mutating the underlying string (a la strtok). In C++, this is easily solved by storing the length separately and wrapping the pointer and the size into one class.
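
To make the contrast with strtok concrete, here is a rough sketch of tokenizing without writing to the input. The function name split and the choice to keep empty fields are assumptions of this example, not part of any proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Splits `text` on `delim` and returns views into the original buffer.
// Unlike strtok, nothing is written into `text`, so a string literal or a
// const buffer works as input. Empty fields are kept.
std::vector<std::string_view> split(std::string_view text, char delim) {
    std::vector<std::string_view> parts;
    std::size_t start = 0;
    while (true) {
        const std::size_t end = text.find(delim, start);
        if (end == std::string_view::npos) {
            parts.push_back(text.substr(start));  // last field, may be empty
            break;
        }
        parts.push_back(text.substr(start, end - start));
        start = end + 1;
    }
    assert(!parts.empty());  // even empty input yields one (empty) field
    return parts;
}
```

Each element of the result is just a pointer and a length into `text`, so the vector must not outlive the buffer that `text` refers to.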

The one major obstacle and divergence from the C++ standard library philosophy that I can think of is that such "referential view" classes have completely different ownership semantics from the rest of the standard library. Basically, everything else in the standard library is unconditionally safe and correct (if it compiles, it's correct). With reference classes like this, that's no longer true. The correctness of your program depends on the ambient code that uses these classes. So that's harder to check and to teach.

Note that if C++17's std::string_view is created from a std::string, then as soon as that std::string goes out of scope, any use of the std::string_view that reads the characters is undefined behavior.
Also, the Qt framework renamed QStringRef to QStringView, and both Qt classes behave similarly to std::string_view once the source string goes out of scope; Qt just describes the result as a dangling QString pointer rather than as "undefined".
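
The lifetime problem can be sketched like this. make_greeting and first_char_safe are hypothetical names for illustration, and the broken variant is left as a comment on purpose because running it would be undefined behavior.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper that returns an owning string by value.
std::string make_greeting(const std::string& name) {
    return "Hello, " + name;
}

// Safe: `owner` lives until the end of the function, so `view` stays valid.
char first_char_safe(const std::string& name) {
    const std::string owner = make_greeting(name);
    const std::string_view view = owner;
    assert(!view.empty());  // front() on an empty view is undefined
    return view.front();
}

// Unsafe, shown only as a comment:
//   std::string_view view = make_greeting(name);  // temporary destroyed at ';'
//   return view.front();                          // reads freed storage: UB
```

The two versions look nearly identical at the call site, which is the point made above: whether the view is valid depends on code outside the view itself.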

Pelion answered 27/12, 2013 at 16:18 Comment(23)
The ship sailed on that philosophy with reference_wrapper, didn't it?Hofstetter
@KerrekSB I am afraid I don't follow. Could you expand on the "such referential view classes have completely different ownership semantics from the rest of the standard library" part, please? It's not clear to me: How is it different from dangling references / pointers? Or invalidated iterators due to insertion (e.g. std::vector)? We have these issues already, it is very natural to me that a non-owning view will have similar issues as non-owning pointers / references / iterators have.Guerin
@SteveJessop: Yes :-) I was almost going to include that. I guess the use of reference wrappers is fairly limited, but it's a valid point.Pelion
@Ali: When you're using any other standard library container, you can assert the correctness of the code just by looking at the code that uses the container. Not so for string_view. (I wasn't saying that you can never write broken code. Just that the brokenness is local.)Pelion
@KerrekSB Okay, so basically it represents an object which is only able to do read-only operations on a string at a cheaper performance cost. It looks somewhat like std::weak_ptr, but I guess that some kind of lock member function for "safe access" would defeat the point of the performance gain. Anyway, the validity of a const std::string& is already tied to the original string unless you copy it, so nothing is lost indeed.Janiuszck
@Drax: think also of situations where you pass in a string literal. If the function argument were a const string & you'd have to create a costly and exception-throwing temporary. With a string view, nothing needs to be copied or allocated.Pelion
@KerrekSB Indeed! Do you think there will be some kind of constexpr constructor for initialization from literals or from another string_view, so they could be used as default string constants without unpredictable static initialization order?Janiuszck
@Drax: I'm sure it's being worked on; I'm not sure if any of this will make it into the next standard. String literals are weird; you can't really tell them apart from arrays.Pelion
Iterators come to mind. (In fact some of the standardization work being done for string_view etc. involves making sure this will play nicely along ranges — keeping in mind a pair of iterators may very well be a model of a range, depending on details.) Not to mention auto uhoh = [] { return { 1, 2, 3, 4, 5 }; }();Selfmastery
@LucDanton: Brr. What's the type of uhoh??Pelion
std::initializer_list<int>.Selfmastery
I'm surprised they didn't go with std::range from boost::iterator_range - IMO it's better than the string_view ideaTwirl
I don't see the point of making a view non-mutating. Just make it mutable by default and add const iff desired like string_span in the GSL.Salim
@nwp: Many people and languages have come to lament C++'s awful defaults and think that "const" and "unshared" should be the default, with "mutable" and "shared" the explicit, rare exceptions.Pelion
@KerrekSB I agree with those people, but unless you rewrite C++ with const as default to be overwritten by mutable a view should not be default const.Salim
You have to start somewhere?Croesus
There's more to string_view than this. It can also handle compile-time strings, including compile-time hashing of compile-time strings.Raindrop
@Croesus Perhaps but consistency is king, particularly as C++ will never be rewritten with const as default, so what you're starting will never be finished, and at the cost of an inconsistency that'll never be resolved.Kumkumagai
@LightnessRacesinOrbit: "C++ will never be rewritten with const as default" - I thought that was called Rust?Garlicky
@Garlicky Well exactly :DKumkumagai
But what's the advantage over const std::string& ?Graceless
@peterflynn: If your argument is, say, a string literal, then a const std::string& parameter would require dynamic allocation and copying, whereas a string view would not.Pelion
"(if it compiles, it's correct)" … umm, what? Sorry for laughing hysterically here. Given the heap of UB which no compiler flags this statement is just wrong.Busey

(Educating myself in 2021)

From Microsoft's <string_view>:

The string_view family of template specializations provides an efficient way to pass a read-only, exception-safe, non-owning handle to the character data of any string-like objects with the first element of the sequence at position zero. (...)

From Microsoft's C++ Team Blog post "std::string_view: The Duct Tape of String Types" from August 21st, 2018 (retrieved 2021 Apr 01):

string_view solves the “every platform and library has its own string type” problem for parameters. It can bind to any sequence of characters, so you can just write your function as accepting a string view:

void f(wstring_view); // string_view that uses wchar_t's

and call it without caring what stringlike type the calling code is using (and for (char*, length) argument pairs just add {} around them) (...)

(...)

Today, the most common “lowest common denominator” used to pass string data around is the null-terminated string (or as the standard calls it, the Null-Terminated Character Type Sequence). This has been with us since long before C++, and provides clean “flat C” interoperability. However, char* and its support library are associated with exploitable code, because length information is an in-band property of the data and susceptible to tampering. Moreover, the null used to delimit the length prohibits embedded nulls and causes one of the most common string operations, asking for the length, to be linear in the length of the string.

(...)

Each programming domain makes up their own new string type, lifetime semantics, and interface, but a lot of text processing code out there doesn’t care about that. Allocating entire copies of the data to process just to make differing string types happy is suboptimal for performance and reliability.
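
The blog's point about one parameter type serving every caller can be sketched as below. This example uses std::string_view (char) rather than the wstring_view from the quote, and count_char and check_all_callers are illustrative names, not anything from the blog post.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Counts occurrences of `c`. The caller's string type does not matter, and
// nothing is copied or allocated to make the call.
std::size_t count_char(std::string_view text, char c) {
    std::size_t n = 0;
    for (const char ch : text) {
        if (ch == c) {
            ++n;
        }
    }
    return n;
}

// Calls count_char with four different caller-side representations.
void check_all_callers() {
    const std::string owned = "banana";
    const char* c_str = "banana";
    const char buffer[] = {'b', 'a', 'n', 'a', 'n', 'a', '!', '!'};  // no '\0'

    assert(count_char(owned, 'a') == 3);        // std::string
    assert(count_char(c_str, 'a') == 3);        // null-terminated pointer
    assert(count_char("banana", 'a') == 3);     // string literal
    assert(count_char({buffer, 6}, 'a') == 3);  // (char*, length) pair in {}
}
```

With a const std::string& parameter, the last three calls would each need a temporary std::string to be constructed first, and the (pointer, length) pair could not be passed with plain braces in the same way.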

Marlenmarlena answered 1/4, 2021 at 6:47 Comment(0)
