Is there an efficient way to slice a C++ vector given a vector containing the indexes to be sliced

I am porting code that was written in MATLAB to C++.

In MATLAB you can slice an array with another array, like A(B), which results in a new array containing the elements of A at the indexes specified by the values in B.

I would like to do a similar thing in C++ using vectors. These vectors contain 10000-40000 elements of type double.

I want to be able to slice these vectors using another vector of type int containing the indexes to be sliced.

For example, I have a vector v = <1.0, 3.0, 5.0, 2.0, 8.0> and a vector w = <0, 3, 2>. I want to slice v using w such that the outcome of the slice is a new vector (since the old vector must remain unchanged) x = <1.0, 2.0, 5.0>.

I came up with a function to do this:

template<typename T>
std::vector<T> slice(std::vector<T>& v, std::vector<int>& id) {

    std::vector<T> tmp;
    tmp.reserve(id.size());

    for (auto& i : id) {
        tmp.emplace_back(v[i]);
    }

    return tmp;
}

I was wondering whether there is a more efficient way to do this. Speed is key here, since this slice function will be called inside a for-loop with approximately 300000 iterations. I heard the Boost library might contain some valid solutions, but I have no experience with it yet.

I used the chrono library to measure the time it takes to call this slice function, where the vector to be sliced had 37520 elements and the index vector had 1550. A single call of this function took 0.0004284 s; over the ~300000 for-loop iterations, the total elapsed time was 134 s.
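For illustration, a minimal sketch of that kind of measurement with std::chrono (simplified, not my exact harness; it calls the slice function shown above on stand-in inputs of roughly those sizes):

#include <chrono>
#include <iostream>
#include <vector>

int main() {
    std::vector<double> v(37520, 1.0);   // stand-in data
    std::vector<int> id(1550, 0);        // stand-in indexes

    auto t0 = std::chrono::steady_clock::now();
    auto x = slice(v, id);               // the function shown above
    auto t1 = std::chrono::steady_clock::now();

    std::chrono::duration<double> elapsed = t1 - t0;
    std::cout << "one call: " << elapsed.count() << " s, x[0] = " << x[0] << "\n";
}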

Any advice would be much appreciated!

Gessner asked 6/5, 2021 at 22:4 Comment(12)
Is returning the tmp vector a seg fault? I would have thought you would have to pass by ref or use dynamic storage (maybe wrapped in a smart ptr too).Damp
for (auto i : id) should run faster than for (auto& i : id), since it involves one less dereference per iteration of the innermost loop. Also, declaring your function parameters as const ref might help the compiler optimise your code.Raccoon
@Damp No, returning a temporary or local variable by value is fine. Returning one by reference is not.Raccoon
@PaulSanders Is that because the values in the tmp vector outlive the function? ie. id is being passed by ref?Damp
@Damp I'm not quite sure what you mean by a seg faultGessner
@Damp It's nothing to do with id. It's just that it is legal (and efficient, and common) to return local variables by value.Raccoon
@PaulSanders I changed the function such that it does not have the & within the for-loop, and tried to see the change in performance over 10000 function calls. It seems to have little to no effect on the time unfortunatelyGessner
OK, I posted an answer trying something else.Raccoon
@PaulSanders I think I understand: it's making a copy from the function's scope back to the calling scope when returned by value? Can this extra copy be avoided?Damp
@Damp In essence yes, but copy elision avoids the need to actually copy the object being returned.Raccoon
@Gessner For a performance critical loop like this, it may be worth passing in tmp as a mutable std::vector<int>& parameter, eliminating the frequent allocations and deallocations.Silicle
A better name for this function would be “select”. The reason Matlab is more efficient is that it doesn't actually copy the original array; an equivalent implementation in C++ should do the same.Clew
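Picking up the buffer-reuse suggestion in the comments above, a minimal sketch of what that could look like (the name slice_into and its exact signature are illustrative, not from the thread):

#include <cstddef>
#include <vector>

// Sketch: write the result into a caller-provided vector, so the allocation
// happens once outside the hot loop instead of on every call.
template<typename T>
void slice_into(const std::vector<T>& v, const std::vector<int>& id, std::vector<T>& out) {
    out.resize(id.size());   // after the first call this reuses existing capacity
    for (std::size_t i = 0; i < id.size(); ++i) {
        out[i] = v[id[i]];
    }
}

The caller declares one output vector before the ~300000-iteration loop and passes it in on every iteration.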

emplace_back has some overhead as it involves some internal accounting inside std::vector. Try this instead:

template<typename T>
std::vector<T> slice(const std::vector<T>& v, const std::vector<int>& id) {

    std::vector<T> tmp;
    tmp.resize (id.size ());

    size_t n = 0;
    for (auto i : id) {
        tmp [n++] = v [i];
    }

    return tmp;
}

Also, I removed an unnecessary dereference in your inner loop.


Edit: I thought about this some more, and inspired by @jack's answer, I think that the inner loop (which is the one that counts) can be optimised further. The idea is to put everything used by the loop in local variables, which gives the compiler the best chance to optimise the code. So try this, see what timings you get. Make sure that you test a Release / optimised build:

template<typename T>
std::vector<T> slice(const std::vector<T>& v, const std::vector<int>& id) {

    size_t id_size = id.size ();
    std::vector<T> tmp (id_size);
    T *tmp_data = tmp.data ();

    const int *id_data = id.data ();
    const T* v_data = v.data ();

    for (size_t i = 0; i < id_size; ++i) {
        tmp_data [i] = v_data [id_data [i]];
    }

    return tmp;
}
Raccoon answered 6/5, 2021 at 22:29 Comment(10)
I compared 300000 iterations of my old function against the one you have suggested, and yours executes around 5 times faster! So an improvement from ~123s to 24s. Thanks a lot for that. I will wait to see if there are any other answers before I give it a tick, but this has helped tremendouslyGessner
Hmmm, just goes to show, you never can tell.Raccoon
I had always thought that emplace_back was faster than assignment, but this shows that I was pretty incorrect about thatGessner
std::vector<T> tmp(id.size());Prehensible
@JanSchultke Even better.Raccoon
@JanSchultke Is it typically better to allocate the memory via tmp(id.size()) rather than resize?Gessner
@Gessner The difference between default constructing an empty vector then allocating memory versus doing the allocation at time of construction is probably not significant. I do think constructing the vector with the desired size more clearly expresses your intent (and is less typing).Lovable
@Gessner Added some stuff to my answer, see what you think.Raccoon
@PaulSanders Apologies for the late response. Thank you for the continued interest in this question! This all helps a lotGessner
std::vector<T> vec(n); will value-initialize n instances of T (zero for an int, the default constructor for class types). Fine for an int but could have large overhead for types with expensive constructorsTenace

The performance seems a bit slow; are you building with compiler optimizations (e.g. g++ main.cpp -O3, or, if using an IDE, switching to Release mode)? This alone sped up computation time around 10x for me.

If you are already using optimizations, switching to basic indexed iteration (for (int i = 0; i < id.size(); i++)) sped up computation around 2-3x on my machine. The idea being that the compiler doesn't have to resolve what type auto refers to, and since basic for loops have been in C++ forever, the compiler is likely to have lots of tricks to speed them up.

template<typename T>
std::vector<T> slice(const std::vector<T>& v, const std::vector<int>& id) {

    // @Jan Schultke's suggestion
    std::vector<T> tmp(id.size());

    for (int i = 0; i < id.size(); i++) {
        tmp[i] = v[id[i]];   // index v through id, not directly with i
    }

    return tmp;
}
Damp answered 6/5, 2021 at 22:5 Comment(8)
It is truly bizarre that this has any performance impact at all. auto has nothing to do with it, since auto is statically inferred at compile time. From the looks of it, GCC and clang fail to vectorize the range-based code, probably because range-based loops rely on pointer arithmetic. godbolt.org/z/T3cY8PP83 With MSVC, there shouldn't be any difference since it doesn't bother vectorizing either.Prehensible
Oh true, it is resolved in compile time. Hmm, I'm not sure what to make of this false positive. I tested whether range based for loops were faster than auto for loops because "simple is better than complex", and it worked, so I tried to infer the results (from my limited knowledge of compilers).Damp
Are you (and the OP) compiling with optimisation enabled?Raccoon
@PaulSanders Yes, not sure about OP.Damp
@PaulSanders What do you mean by optimisation enabled..? Is that a setting I have to enable, or a piece of code I have to insert?Gessner
@Gessner Add -O3 to your compilation command, e.g. g++ main.cpp -o main -O3; it will generally speed things up quite a bit.Damp
@Damp I'm not sure how to do that. I'm using microsoft visual studio. Could you maybe give instructions on how and where to place that command?Gessner
@Gessner I'm not too familiar with visual studio, but it should be builtin when you switch from debug mode to release mode.Damp

C++20 introduces ranges, a very nice set of features for accessing and performing operations on containers. Using a view means that you don't even have to make a copy if you don't want to. Here's a related post on using ranges.

#include <format>
#include <print>    // std::print, C++23
#include <ranges>
#include <vector>

int main() {
    std::vector<int> v1 = {1, 2, 3, 4, 5, 6, 7};
    std::vector<int> id = {2, 3, 5};

    // Lambda that grabs an element in v1
    auto indexer = [&v1](auto i) { return v1.at(i); };

    // Create the view using the indices
    auto v2_view = id | std::views::transform(indexer);

    // Make a copy if you want
    auto v2 = v2_view | std::ranges::to<std::vector>(); // C++23
    // std::vector<int> v2(v2_view.begin(), v2_view.end());

    // std::print introduced in C++23
    std::print("v1: {}\n", v1);
    std::print("v2: {}\n", v2);
    std::print("v2_view: {}\n", v2_view);
}

Some rudimentary timing numbers, 100m elements in v1 and 10m elements in id:

/*[115ms]*/ auto v2 = v2_view | std::ranges::to<std::vector>();
/*[385ms]*/ std::vector<int> v2(v2_view.begin(), v2_view.end());

std::ranges::to appears to be around 3x faster on my configuration for basic types.
Constructing the view id | std::views::transform(indexer) itself is O(1).

Tenace answered 26/9, 2024 at 17:22 Comment(1)
this is the only answer that does something comparable to what matlab is likely doing internallyClew

I would like to do a similar thing in C++ using vectors.

C++'s answer to Matlab's sliceable vectors isn't std::vector; it's std::valarray.

https://en.cppreference.com/w/cpp/numeric/valarray

C++'s std::valarray supports instantiating other std::valarray objects with the help of std::slice.

Once you have your valarray slice object, std::valarray explicitly supports element-wise operations: not only the basic arithmetic operators, but also applying an arbitrary function to each element by passing it to std::valarray::apply.
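For the question's exact use case, note that selecting by an arbitrary list of indexes is done by indexing a valarray with a std::valarray<std::size_t> (an indirect array), while std::slice describes regular start/size/stride slices. A minimal sketch of both:

#include <cstddef>
#include <valarray>

int main() {
    std::valarray<double> v = {1.0, 3.0, 5.0, 2.0, 8.0};

    // Gather by arbitrary indexes (the A(B) case from the question):
    // indexing with a valarray<size_t> yields a new valarray.
    std::valarray<std::size_t> w = {0, 3, 2};
    std::valarray<double> x = v[w];                              // {1.0, 2.0, 5.0}

    // Regular strided slice via std::slice(start, size, stride):
    std::valarray<double> first_three = v[std::slice(0, 3, 1)];  // {1.0, 3.0, 5.0}

    // Element-wise arithmetic and apply:
    std::valarray<double> doubled = x * 2.0;                     // {2.0, 4.0, 10.0}
    std::valarray<double> negated = x.apply([](double d) { return -d; });
}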

Azine answered 17/11, 2023 at 12:31 Comment(0)
