What is the difference between the split_view and the lazy_split_view in C++?

I have read the latest draft, where lazy_split_view was added.

I then realized that the old split_view had been renamed to lazy_split_view, and that split_view itself had been redesigned.

libstdc++ recently implemented this change, and it is available on GCC trunk: https://godbolt.org/z/9qG5T9n5h

Here is a simple program that uses both views, but I can't see any difference between them:

#include <iostream>
#include <ranges>
#include <string>

int main(){

    std::string str { "one two three  four" };

    for (auto word : str | std::views::split(' ')) {
        for (char ch : word)
            std::cout << ch;
        std::cout << '.';
    }

    std::cout << '\n';

    for (auto word : str | std::views::lazy_split(' ')) {
        for (char ch : word)
            std::cout << ch;
        std::cout << '.';
    }

}

Output:

one.two.three..four.
one.two.three..four.

That was until I noticed a difference when binding each word to a std::span<const char> in both loops.

With std::views::split:

for (std::span<const char> word : str | std::views::split(' '))

the code compiles.

With std::views::lazy_split:

for (std::span<const char> word : str | std::views::lazy_split(' '))

it fails to compile.

I know the two views are supposed to differ, but I can't easily spot how. Is this a defect report against C++20, a new feature in C++23 (with changes), or both?

Schedule answered 21/6, 2021 at 12:2 Comment(1)
In general, lazy evaluation happens on demand, as you need it. In other words, a lazy computation runs only when you read the value, as opposed to when you create it. (comment by Leisure)

I've looked at the relevant paper (P2210R2 by Barry Revzin): split_view has been renamed to lazy_split_view, and the new split_view differs in that its result type preserves the category of the source range.

For example, our string str is a contiguous range, so the new split yields contiguous subranges. The old split (now lazy_split) gives you at most forward ranges, which is a problem if you need random access or a pointer to the underlying storage.

Taking the example from the paper:

// needs <charconv>, <ranges> and <string>
std::string str = "1.2.3.4";
auto ints = str 
    | std::views::split('.')
    | std::views::transform([](auto v){
        int i = 0;
        std::from_chars(v.data(), v.data() + v.size(), i);
        return i;
    });

will work now, but

std::string str = "1.2.3.4";
auto ints = str 
    | std::views::lazy_split('.')
    | std::views::transform([](auto v){
        int i = 0;
        // v.data() doesn't exist
        std::from_chars(v.data(), v.data() + v.size(), i);
        return i;
    });

won't, because each v is only a forward range, which doesn't provide a data() member.

Original Answer

I was under the impression that split must be lazy as well (laziness was one of the selling points of the ranges proposal after all), so I made a little experiment:

#include <iostream>
#include <ranges>
#include <string>

struct CallCount{
    int i = 0;

    auto operator()(auto c) {
        i++;
        return c;
    }

    ~CallCount(){
        if (i > 0) // there are a lot of copies made when the range is constructed
            std::cout << "number of calls: " << i << "\n";
    }
};


int main() {
    
    std::string str = "1 3 5 7 9 1";

    std::cout << "split_view:\n";

    for (auto word : str | std::views::transform(CallCount{}) | std::views::split(' ') | std::views::take(2)) {
    }

    std::cout << "lazy_split_view:\n";

    for (auto word : str | std::views::transform(CallCount{}) | std::views::lazy_split(' ') | std::views::take(2)) {
    }    
}

This code prints (note that the transform operates on each char in the string):

split_view:
number of calls: 6
lazy_split_view:
number of calls: 4

So what happens?

Indeed, both views are lazy, but they differ in how lazy they are. The transform that I put in front of split just counts how many times it has been called. As it turns out, split computes the next item eagerly, while lazy_split stops as soon as it hits the whitespace after the current item.

You can see that the string str consists of numbers that also mark their char index (starting at 1). The take(2) should stop the loop after we've seen '3' in str. And indeed, lazy_split stops at the whitespace after '3', but split stops at the whitespace after '5'.

This essentially means that split fetches its next item eagerly instead of lazily. The difference probably doesn't matter most of the time, but it can affect performance-critical code.

At the time, I didn't know whether that was the reason for this change (I hadn't read the paper yet).

Leisure answered 21/6, 2021 at 13:20 Comment(0)
