Is there an efficient way to slice a C++ vector given a vector containing the indexes to be sliced

I am porting code that was written in MATLAB to C++.

In MATLAB you can slice an array with another array, like A(B), which results in a new array containing the elements of A at the indexes specified by the values in B.

I would like to do a similar thing in C++ using vectors. These vectors contain 10000-40000 elements of type double.

I want to be able to slice these vectors using another vector of type int containing the indexes to be sliced.

For example, I have a vector v = <1.0, 3.0, 5.0, 2.0, 8.0> and a vector w = <0, 3, 2>. I want to slice v using w such that the outcome of the slice is a new vector (since the old vector must remain unchanged) x = <1.0, 2.0, 5.0>.

I came up with a function to do this:

template<typename T>
std::vector<T> slice(std::vector<T>& v, std::vector<int>& id) {

    std::vector<T> tmp;
    tmp.reserve(id.size());

    for (auto& i : id) {
        tmp.emplace_back(v[i]);
    }

    return tmp;
}

I was wondering whether there is a more efficient way to do this. Speed is key here, since this slice function will be called inside a for-loop with approximately 300000 iterations. I heard the Boost library might contain some valid solutions, but I have no experience with it yet.

I used the chrono library to measure the time it takes to call this slice function, where the vector to be sliced had 37520 elements and the index vector had 1550. A single call of this function took 0.0004284 s; over the ~300000 for-loop iterations, the total elapsed time was 134 s.
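For illustration, a minimal sketch of that kind of measurement with std::chrono (simplified, not my exact harness; it calls the slice function shown above on stand-in inputs of roughly those sizes):

#include <chrono>
#include <iostream>
#include <vector>

int main() {
    std::vector<double> v(37520, 1.0);   // stand-in data
    std::vector<int> id(1550, 0);        // stand-in indexes

    auto t0 = std::chrono::steady_clock::now();
    auto x = slice(v, id);               // the function shown above
    auto t1 = std::chrono::steady_clock::now();

    std::chrono::duration<double> elapsed = t1 - t0;
    std::cout << "one call: " << elapsed.count() << " s, x[0] = " << x[0] << "\n";
}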

Any advice would be much appreciated!

Gessner asked 6/5, 2021 at 22:4 Comment(12)
Is returning the tmp vector a seg fault? I would have thought you would have to pass by ref or use dynamic storage (maybe wrapped in a smart ptr too).Damp
for (auto i : id) should run faster than for (auto& i : id), since it involves one less dereference per iteration of the innermost loop. Also, declaring your function parameters as const ref might help the compiler optimise your code.Raccoon
@Damp No, returning a temporary or local variable by value is fine. Returning one by reference is not.Raccoon
@PaulSanders Is that because the values in the tmp vector outlive the function? ie. id is being passed by ref?Damp
@Damp I'm not quite sure what you mean by a seg faultGessner
@Damp It's nothing to do with id. It's just that it is legal (and efficient, and common) to return local variables by value.Raccoon
@PaulSanders I changed the function such that it does not have the & within the for-loop, and tried to see the change in performance over 10000 function calls. It seems to have little to no effect on the time unfortunatelyGessner
OK, I posted an answer trying something else.Raccoon
@PaulSanders I think I understand: it's making a copy from the function's scope back to the calling scope when returned by value? Can this extra copy be avoided?Damp
@Damp In essence yes, but copy elision avoids the need to actually copy the object being returned.Raccoon
@Gessner For a performance critical loop like this, it may be worth passing in tmp as a mutable std::vector<int>& parameter, eliminating the frequent allocations and deallocations.Silicle
A better name for this function would be “select”. The reason Matlab is more efficient is that it doesn't actually copy the original array; an equivalent implementation in C++ should do the same.Clew
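Picking up the buffer-reuse suggestion in the comments above, a minimal sketch of what that could look like (the name slice_into and its exact signature are illustrative, not from the thread):

#include <cstddef>
#include <vector>

// Sketch: write the result into a caller-provided vector, so the allocation
// happens once outside the hot loop instead of on every call.
template<typename T>
void slice_into(const std::vector<T>& v, const std::vector<int>& id, std::vector<T>& out) {
    out.resize(id.size());   // after the first call this reuses existing capacity
    for (std::size_t i = 0; i < id.size(); ++i) {
        out[i] = v[id[i]];
    }
}

The caller declares one output vector before the ~300000-iteration loop and passes it in on every iteration.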

emplace_back has some overhead as it involves some internal accounting inside std::vector. Try this instead:

template<typename T>
std::vector<T> slice(const std::vector<T>& v, const std::vector<int>& id) {

    std::vector<T> tmp;
    tmp.resize (id.size ());

    size_t n = 0;
    for (auto i : id) {
        tmp [n++] = v [i];
    }

    return tmp;
}

Also, I removed an unnecessary dereference in your inner loop.


Edit: I thought about this some more, and inspired by @jack's answer, I think that the inner loop (which is the one that counts) can be optimised further. The idea is to put everything used by the loop in local variables, which gives the compiler the best chance to optimise the code. So try this, see what timings you get. Make sure that you test a Release / optimised build:

template<typename T>
std::vector<T> slice(const std::vector<T>& v, const std::vector<int>& id) {

    size_t id_size = id.size ();
    std::vector<T> tmp (id_size);
    T *tmp_data = tmp.data ();

    const int *id_data = id.data ();
    const T* v_data = v.data ();

    for (size_t i = 0; i < id_size; ++i) {
        tmp_data [i] = v_data [id_data [i]];
    }

    return tmp;
}
Raccoon answered 6/5, 2021 at 22:29 Comment(10)
I compared 300000 iterations of my old function against the one you have suggested, and yours executes around 5 times faster! So an improvement from ~123s to 24s. Thanks a lot for that. I will wait to see if there are any other answers before I give it a tick, but this has helped tremendouslyGessner
Hmmm, just goes to show, you never can tell.Raccoon
I had always thought that emplace_back was faster than assignment, but this shows that I was pretty incorrect about thatGessner
std::vector<T> tmp(id.size());Prehensible
@JanSchultke Even better.Raccoon
@JanSchultke Is it typically better to allocate the memory via tmp(id.size()) rather than resize?Gessner
@Gessner The difference between default constructing an empty vector then allocating memory versus doing the allocation at time of construction is probably not significant. I do think constructing the vector with the desired size more clearly expresses your intent (and is less typing).Lovable
@Gessner Added some stuff to my answer, see what you think.Raccoon
@PaulSanders Apologies for the late response. Thank you for the continued interest in this question! This all helps a lotGessner
std::vector<T> vec(n); will value-initialize n instances of T (zero for an int, the default constructor for class types). Fine for an int but could have large overhead for types with expensive constructorsTenace

The performance seems a bit slow; are you building with compiler optimizations (e.g. g++ main.cpp -O3, or, if using an IDE, switching to Release mode)? This alone sped up computation time around 10x for me.

If you are already using optimizations, switching to basic indexed iteration (for (int i = 0; i < id.size(); i++)) sped up computation around 2-3x on my machine. The idea being that the compiler doesn't have to resolve what type auto refers to, and since basic for loops have been in C++ forever, the compiler is likely to have lots of tricks to speed them up.

template<typename T>
std::vector<T> slice(const std::vector<T>& v, const std::vector<int>& id) {

    // @Jan Schultke's suggestion
    std::vector<T> tmp(id.size());

    for (int i = 0; i < id.size(); i++) {
        tmp[i] = v[id[i]];   // index v through id, not directly with i
    }

    return tmp;
}
Damp answered 6/5, 2021 at 22:5 Comment(8)
It is truly bizarre that this has any performance impact at all. auto has nothing to do with it, since auto is statically inferred at compile time. From the looks of it, GCC and clang fail to vectorize the range-based code, probably because range-based loops rely on pointer arithmetic. godbolt.org/z/T3cY8PP83 With MSVC, there shouldn't be any difference since it doesn't bother vectorizing either.Prehensible
Oh true, it is resolved in compile time. Hmm, I'm not sure what to make of this false positive. I tested whether range based for loops were faster than auto for loops because "simple is better than complex", and it worked, so I tried to infer the results (from my limited knowledge of compilers).Damp
Are you (and the OP) compiling with optimisation enabled?Raccoon
@PaulSanders Yes, not sure about OP.Damp
@PaulSanders What do you mean by optimisation enabled..? Is that a setting I have to enable, or a piece of code I have to insert?Gessner
@Gessner Add -O3 to your compilation command, e.g. g++ main.cpp -o main -O3; it will generally speed things up quite a bit.Damp
@Damp I'm not sure how to do that. I'm using microsoft visual studio. Could you maybe give instructions on how and where to place that command?Gessner
@Gessner I'm not too familiar with visual studio, but it should be builtin when you switch from debug mode to release mode.Damp

C++20 introduces ranges, a very nice set of features for accessing and performing operations on containers. Using a view means that you don't even have to make a copy if you don't want to. Here's a related post on using ranges.

#include <format>
#include <print>    // std::print, C++23
#include <ranges>
#include <vector>

int main() {
    std::vector<int> v1 = {1, 2, 3, 4, 5, 6, 7};
    std::vector<int> id = {2, 3, 5};

    // Lambda that grabs an element in v1
    auto indexer = [&v1](auto i) { return v1.at(i); };

    // Create the view using the indices
    auto v2_view = id | std::views::transform(indexer);

    // Make a copy if you want
    auto v2 = v2_view | std::ranges::to<std::vector>(); // C++23
    // std::vector<int> v2(v2_view.begin(), v2_view.end());

    // std::print introduced in C++23
    std::print("v1: {}\n", v1);
    std::print("v2: {}\n", v2);
    std::print("v2_view: {}\n", v2_view);
}

Some rudimentary timing numbers, 100m elements in v1 and 10m elements in id:

/*[115ms]*/ auto v2 = v2_view | std::ranges::to<std::vector>();
/*[385ms]*/ std::vector<int> v2(v2_view.begin(), v2_view.end());

std::ranges::to appears to be around 3x faster on my configuration for basic types.
Constructing the view id | std::views::transform(indexer) itself is O(1).

Tenace answered 26/9, 2024 at 17:22 Comment(1)
this is the only answer that does something comparable to what matlab is likely doing internallyClew

I would like to do a similar thing in C++ using vectors.

C++'s answer to Matlab's sliceable vectors isn't std::vector; it's std::valarray.

https://en.cppreference.com/w/cpp/numeric/valarray

C++'s std::valarray supports instantiating other std::valarray objects with the help of std::slice.

Once you have your valarray slice object, std::valarray explicitly supports element-wise operations: not only the basic arithmetic operators, but also applying an arbitrary function to each element by passing it to std::valarray::apply.
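For the question's exact use case, note that selecting by an arbitrary list of indexes is done by indexing a valarray with a std::valarray<std::size_t> (an indirect array), while std::slice describes regular start/size/stride slices. A minimal sketch of both:

#include <cstddef>
#include <valarray>

int main() {
    std::valarray<double> v = {1.0, 3.0, 5.0, 2.0, 8.0};

    // Gather by arbitrary indexes (the A(B) case from the question):
    // indexing with a valarray<size_t> yields a new valarray.
    std::valarray<std::size_t> w = {0, 3, 2};
    std::valarray<double> x = v[w];                              // {1.0, 2.0, 5.0}

    // Regular strided slice via std::slice(start, size, stride):
    std::valarray<double> first_three = v[std::slice(0, 3, 1)];  // {1.0, 3.0, 5.0}

    // Element-wise arithmetic and apply:
    std::valarray<double> doubled = x * 2.0;                     // {2.0, 4.0, 10.0}
    std::valarray<double> negated = x.apply([](double d) { return -d; });
}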

Azine answered 17/11, 2023 at 12:31 Comment(0)
