Concatenating string_view objects

E

4

15

I've been adding std::string_view to some old code that represents string-like config params, since it provides a read-only view and is faster because nothing needs to be copied.

However, one cannot concatenate two string_views, as operator+ isn't defined for them. I see this question has a couple of answers stating it's an oversight and that there is a proposal to add it. However, that proposal is for adding a string and a string_view; presumably, if it gets implemented, the resulting concatenation would be a std::string.

Would adding two string_views also fall in the same category? And if not, why shouldn't adding two string_views be supported?

Sample

std::string_view s1{"concate"};
std::string_view s2{"nate"};
std::string_view s3{s1 + s2};

And here's the error

error: no match for 'operator+' (operand types are 'std::string_view' {aka 'std::basic_string_view<char>'} and 'std::string_view' {aka 'std::basic_string_view<char>'})

Evangelin answered 24/8, 2022 at 19:4 Comment(2)

A string_view does not own the data. It must be stored somewhere. For concatenating, use a std::string. – Milkweed 24/8, 2022 at 19:7

The "oversight" mentioned in the other question is about adding a string_view to a string, not about adding two string_views together. There's no proposal to implement anything like the latter, for the simple reason that it's logically impossible: a string_view owns no storage that could hold the result. – Wadmal 24/8, 2022 at 19:7

S

12

A std::string_view is an alias for std::basic_string_view<char>, which is a std::basic_string_view templated on a specific type of character, i.e. char.

But what does it look like?

Besides the fairly large number of useful member functions such as find, substr, and others (maybe it's an ordinary number compared to other container/string-like things offered by the STL), std::basic_string_view<_CharT>, with _CharT being the generic char-like type, has just 2 data members,

// directly from my /usr/include/c++/12.2.0/string_view
      size_t        _M_len;
      const _CharT* _M_str;

i.e. a pointer to const _CharT to indicate where the view starts, and a size_t (an appropriate type of number) to indicate how long the view is, starting from _M_str's pointee.

In other words, a string view just knows where it starts and how long it is, so it represents a sequence of char-like entities which are contiguous in memory. With just two such members, you can't represent a string which is made up of non-contiguous substrings.
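To make that two-member layout concrete, here is a minimal sketch of such a view type. MiniView and its members are hypothetical names made up for illustration, not the library's actual implementation; the only point is that a start pointer plus a length can describe one contiguous run of characters and nothing more.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, stripped-down stand-in for std::basic_string_view<char>.
// It stores the same two pieces of information: a start pointer and a length.
struct MiniView {
    const char* str;  // where the viewed characters start
    std::size_t len;  // how many contiguous characters are viewed

    // Copy the viewed characters into an owning std::string.
    std::string to_string() const { return std::string(str, len); }

    // A sub-view is just pointer arithmetic; no characters are copied.
    // Assumes pos + count <= len (this sketch does no bounds checking).
    MiniView sub(std::size_t pos, std::size_t count) const {
        return MiniView{str + pos, count};
    }
};

// Small self-check showing the sketch in use.
inline void mini_view_example() {
    const char text[] = "hello world";
    const MiniView whole{text, 11};
    const MiniView word = whole.sub(6, 5);
    assert(word.to_string() == "world");
    assert(word.str == text + 6);  // still points into the original buffer
}
```

Nothing in those two members can say "five characters here, then five more over there", which is the crux of the rest of this answer.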

Put yet another way, if you want to create a std::string_view, you need to be able to tell where it starts and how many chars long it is. Can you tell where s1 + s2 would have to start and how many characters long it should be? Think about it: you can't, because s1 and s2 are not adjacent.

Maybe a diagram can help.

Assume these lines of code

std::string s1{"hello"};
std::string s2{"world"};

s1 and s2 are totally unrelated objects, as far as their memory location is concerned; here is what they look like:

                           &s2[0]
                             |
                             | &s2[1]
                             |   |
&s1[0]                       |   | &s2[2]
  |                          |   |   |
  | &s1[1]                   |   |   | &s2[3]
  |   |                      |   |   |   |
  |   | &s1[2]               |   |   |   | &s2[4]
  |   |   |                  |   |   |   |   |
  |   |   | &s1[3]           v   v   v   v   v
  |   |   |   |            +---+---+---+---+---+
  |   |   |   | &s1[4]     | w | o | r | l | d |
  |   |   |   |   |        +---+---+---+---+---+
  v   v   v   v   v
+---+---+---+---+---+
| h | e | l | l | o |
+---+---+---+---+---+

I've intentionally drawn them misaligned to mean that &s1[0], the memory location where s1 starts, and &s2[0], the memory location where s2 starts, have nothing to do with each other.

Now, imagine you create two string views like this:

std::string_view sv1{s1};
std::string_view sv2(s2.begin() + 1, s2.begin() + 4);

Here's what they will look like, in terms of the two implementation-defined members _M_str and _M_len:

                                &s2[0]
                                  |
                                  | &s2[1]
                                  |   |
     &s1[0]                       |   | &s2[2]
       |                          |   |   |
       | &s1[1]                   |   |   | &s2[3]
       |   |                      |   |   |   |
       |   | &s1[2]               |   |   |   | &s2[4]
       |   |   |                  |   |   |   |   |
       |   |   | &s1[3]           v   v   v   v   v
       |   |   |   |            +---+---+---+---+---+
       |   |   |   | &s1[4]     | w | o | r | l | d |
       |   |   |   |   |        +---+---+---+---+---+
       v   v   v   v   v            · ^         ·
     +---+---+---+---+---+          · |         ·
     | h | e | l | l | o |        +---+         ·
     +---+---+---+---+---+        | ·           ·
     · ^                 ·        | · s2._M_len ·
     · |                 ·        | <----------->
   +---+                 ·        |
   | ·                   ·        +-- s2._M_str
   | ·       s1._M_len   ·
   | <------------------->
   |
   +-------- s1._M_str

Given the above, can you see what's wrong with expecting that

std::string_view s3{s1 + s2};

works?

How can you possibly define s3._M_str and s3._M_len (based on s1._M_str, s1._M_len, s2._M_str, and s2._M_len), such that they represent a view on "helloworld"?

You can't because "hello" and "world" are located in two unrelated areas of memory.
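The only case where two views could be "added" into a single view is when the second one happens to start exactly where the first one ends. The sketch below illustrates that; try_merge is a hypothetical helper, not anything from the standard library, and it is only meaningful for views into the same buffer (comparing a one-past-the-end pointer with the start of an unrelated object gives an unspecified result).

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical helper: returns a view covering both a and b only when b
// starts exactly where a ends; otherwise there is no single contiguous
// region a string_view could point at.
inline std::optional<std::string_view> try_merge(std::string_view a,
                                                 std::string_view b) {
    if (a.data() + a.size() == b.data()) {
        return std::string_view{a.data(), a.size() + b.size()};
    }
    return std::nullopt;
}

// Small self-check: both views below refer to the same underlying buffer.
inline void try_merge_example() {
    const std::string_view buffer{"helloworld"};
    const std::string_view left = buffer.substr(0, 5);  // "hello"
    const std::string_view right = buffer.substr(5);    // "world"

    // Adjacent pieces: the merged view is the whole buffer again.
    const auto merged = try_merge(left, right);
    assert(merged.has_value() && *merged == buffer);

    // "hel" ends two characters before "world" starts: nothing to merge.
    assert(!try_merge(buffer.substr(0, 3), right).has_value());
}
```

For views on two separate std::string objects, as in the diagrams above, the adjacency condition cannot be relied on, so an owning string is needed to hold the result.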

Sublimity answered 24/8, 2022 at 19:24 Comment(7)

While this makes sense, it does not explain why you cannot concatenate a string and a string_view and get a string. Maybe the intention behind not allowing this is that it would be unclear if the intention is to produce a concatenated view (as with std::views::join in C++20, which is NOT a string_view, as it is not contiguous), or a new string, which holds the concatenated result in its own storage. – Alcahest 13/10, 2023 at 10:49

@FlorianWinter, if by concatenating a std::string with a std::string_view you mean a_string + a_string_view, allowing it would require that a_string_view be converted to std::string via the ctor (10), but that is explicit, so that's not possible. – Sublimity 16/10, 2023 at 6:21

Not really, because converting the string_view to a string would be an unnecessary memory allocation. Concatenating a string and a string_view should not require more than one memory allocation. In addition, if the string is an r-value, then it could be reused, e.g., its storage may already have the needed capacity to hold the result, requiring no memory allocation at all. While it could be implemented this way, I think there was an official reason given why the standard does not support this operation. – Alcahest 16/10, 2023 at 8:10

Furthermore, the intention of such a concatenation may vary. The intended result COULD be a string with its own storage, but it COULD also be a view (but not a string_view), which is what std::views::join does. Of course, the result of a concatenation cannot be a string_view (but it can be a view, in the sense of the C++20 ranges library). – Alcahest 16/10, 2023 at 8:15

Somehow I forgot about this conversation. @FlorianWinter, when I wrote would require that a_string_view be converted to std::string I was referring to the fact std::string_view has no operator+, and std::string's operator+ overloads all take std::string/char/char const*, and std::string_view is convertible to std::string, not to char const*. So I guess your curiosity is why doesn't operator+ have an overload for (std::string, std::string_view) and (std::string_view, std::string)? – Sublimity 3/1 at 17:46

My two comments were a combination of: (1) Concatenation of any number of strings into a new owning container (regardless of representation of the original strings) should never require more than one memory allocation and (2) The intention of a + operator that concatenates views and owning containers could be unclear, which may be one reason why such an operator does not (and should not?) exist. My previous comments are more detailed. These are just statements/comments. I have no further questions :). – Alcahest 9/1 at 8:17

Another way to see this: string_view is a contiguous range of characters. It is therefore inherently not closed under concatenation (but there are functions in the ranges library to concatenate views and get a non-contiguous range). basic_string, on the other hand, IS in principle closed under concatenation. It COULD be a convention in the standard that any legal concatenation of any range of characters with a basic_string would return a basic_string and perform no more than one memory allocation (and reuse the storage of an R-value basic_string argument if possible). – Alcahest 9/1 at 8:24

H

14

A view is similar to a span in that it does not own the data; as the name implies, it is just a view of the data. To concatenate the string views, you first need to construct a std::string, and then you can concatenate.

std::string s3 = std::string(s1) + std::string(s2);

Note that s3 will be a std::string, not a std::string_view, since it owns this data.
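If the temporary strings in that one-liner are a concern, a possible alternative is to reserve the final size once and append both views. This is only a sketch; concat is a hypothetical helper name, not a standard function.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: builds one owning std::string from two views,
// reserving the full size up front so the result buffer is sized once.
inline std::string concat(std::string_view a, std::string_view b) {
    std::string result;
    result.reserve(a.size() + b.size());
    result.append(a);  // std::string::append accepts a string_view (C++17)
    result.append(b);
    return result;
}

// Small self-check using the strings from the question.
inline void concat_example() {
    const std::string_view s1{"concate"};
    const std::string_view s2{"nate"};
    const std::string s3 = concat(s1, s2);  // s3 owns its characters
    assert(s3 == "concatenate");
}
```

The result is still a std::string; any string_view made from it is valid only while that string is alive.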

Horrify answered 24/8, 2022 at 19:10 Comment(4)

Alt. std::string s3 = std::string(s1) += s2; - I'm assuming it'll do one allocation less if s2 is not short enough for short string optimization. – Milkweed 24/8, 2022 at 19:19

So to store s3 as a string_view I should std::string_view s3{std::string(s1) + std::string(s2)}; – Evangelin 24/8, 2022 at 19:21

@Evangelin No, where will the data be stored then? As soon as the std::string goes out of scope, any string_view you have to it will not be useful anymore. You must store the std::string. You can make as many string_views you like to it - but you need to keep it alive. – Milkweed 24/8, 2022 at 19:21

@Evangelin I’ll post this as a full answer if I have the time, but if you’re that determined to keep everything as a sv for as long as possible, you could create a simple class that aggregates the svs. But frankly I wouldn’t worry about that unless you’re absolutely sure this is slowing down your system and that the solution is faster. – Elementary 24/8, 2022 at 19:42
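The "simple class that aggregates the svs" suggested in the last comment could look roughly like the sketch below. SvChain and its members are hypothetical names, and this is just one possible shape under the assumption that every piece outlives the aggregate; it was not posted by the commenter.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical aggregate of views: it stores the pieces, not the characters.
// Every piece must outlive the SvChain that refers to it.
class SvChain {
public:
    SvChain(std::initializer_list<std::string_view> parts) : parts_(parts) {}

    // Remember one more piece; no characters are copied here.
    void append(std::string_view part) { parts_.push_back(part); }

    // Total number of characters across all pieces.
    std::size_t size() const {
        std::size_t total = 0;
        for (std::string_view part : parts_) total += part.size();
        return total;
    }

    // Materialise the concatenation into an owning string, sized up front.
    std::string str() const {
        std::string result;
        result.reserve(size());
        for (std::string_view part : parts_) result.append(part);
        return result;
    }

private:
    std::vector<std::string_view> parts_;
};

// Small self-check: copying is deferred until str() is called.
inline void sv_chain_example() {
    SvChain chain{"con", "cate"};
    chain.append("nate");
    assert(chain.size() == 11);
    assert(chain.str() == "concatenate");
}
```

Note that the vector of pieces is itself a heap allocation, so whether this beats building a std::string directly would have to be measured.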

I

6

std::string_view does not own any data; it is only a view. If you want to join two views to get a joined view, you can use boost::join() from the Boost library. But the result type will not be a std::string_view.

#include <algorithm>
#include <iostream>
#include <iterator>
#include <string_view>
#include <boost/range.hpp>
#include <boost/range/join.hpp>

void test()
{
    std::string_view s1{"hello, "}, s2{"world"};
    auto joined = boost::join(s1, s2);

    // print joined string; the element type of ostream_iterator must be
    // spelled out, as it cannot be deduced from the constructor arguments
    std::copy(joined.begin(), joined.end(), std::ostream_iterator<char>(std::cout, ""));
    std::cout << std::endl;

    // other method to print
    for (auto c : joined) std::cout << c;
    std::cout << std::endl;
}

C++23 has joined ranges in the standard library with the name of std::ranges::views::join_with_view

#include <iostream>
#include <ranges>
#include <string_view>

void test()
{
    std::string_view s1{"hello, "}, s2{"world"};
    auto joined = std::ranges::views::join_with_view(s1, s2);

    for (auto c : joined) std::cout << c;
    std::cout << std::endl;
}

Interdictory answered 25/8, 2022 at 8:23 Comment(2)

That C++23 code is incorrect, as join_with_view is in the std::ranges namespace, not std::ranges::views. I get a constraint failure when I fixed that. In any case, we don't want "world" between each character of "hello, ". Luckily, there's a simpler C++20 version: std::views::join(std::array{s1, s2}). – Sauers 3/8, 2023 at 7:46

I've added my own answer with verified working code. – Sauers 3/8, 2023 at 8:13

S

5

A std::string_view is a lightweight, non-owning view of the characters.

To get a view that concatenates multiple string views, we can use the join view adapter that was introduced in C++20:

    auto const joined = std::views::join(std::array{s1, s2});

This gives us a view object that can be iterated over using standard algorithms or range-based for. It can be converted to a std::string object (but not directly to a std::string_view as that requires us to copy the contents somewhere to make them contiguous).

Full demo:

#include <algorithm>
#include <array>
#include <ranges>
#include <string_view>

int main()
{
    const std::string_view s1{"con"};
    const std::string_view s2{"cate"};
    const std::string_view s3{"nate"};

    return !std::ranges::equal(std::views::join(std::array{s1, s2, s3}),
                               std::string_view{"concatenate"});
}

Sauers answered 3/8, 2023 at 8:10 Comment(0)
