Erasing multiple objects from a std::vector?
Asked Answered
B

9

30

Here is my issue, lets say I have a std::vector with ints in it.

let's say it has 50,90,40,90,80,60,80.

I know I need to remove the second, fifth and third elements. I don't necessarily always know the order of elements to remove, nor how many. The issue is by erasing an element, this changes the index of the other elements. Therefore, how could I erase these and compensate for the index change. (sorting then linearly erasing with an offset is not an option)

Thanks

Brower answered 15/8, 2010 at 14:13 Comment(1)
"sorting then linearly erasing with an offset is not an option": Why?Tolerable
C
43

I am offering several methods:

1. A fast method that does not retain the original order of the elements:

Assign the current last element of the vector to the element to erase, then erase the last element. This will avoid big moves and all indexes except the last will remain constant. If you start erasing from the back, all precomputed indexes will be correct.

void quickDelete( int idx )
{
  vec[idx] = vec.back();
  vec.pop_back();
}

I see this essentially is a hand-coded version of the erase-remove idiom pointed out by Klaim ...

2. A slower method that retains the original order of the elements:

Step 1: Mark all vector elements to be deleted, i.e. with a special value. This has O(|indexes to delete|).

Step 2: Erase all marked elements using v.erase( remove (v.begin(), v.end(), special_value), v.end() );. This has O(|vector v|).

The total run time is thus O(|vector v|), assuming the index list is shorter than the vector.

3. Another slower method that retains the original order of the elements:

Use a predicate and remove if as described in https://mcmap.net/q/467125/-erasing-multiple-objects-from-a-std-vector . To make this efficient and respecting the requirement of not "sorting then linearly erasing with an offset", my idea is to implement the predicate using a hash table and adjust the indexes stored in the hash table as the deletion proceeds on returning true, as Klaim suggested.

Ciscaucasia answered 15/8, 2010 at 14:17 Comment(14)
What if the last element is one of those scheduled for deletion?Weepy
Then swapping with itself is a no-op and pop_back still does the right thing.Clerical
The order of the indices isn't known according to the OPWeepy
This will reorder the vector, which might not be what you want.Coleridge
HUH? Say that given a vector of 10 items, and I want to delete slot 5: I now Assign slot 5 with the value in slot 10, and delete slot 10? How is that supposed to be the correct answer? It's not.Anhinga
@C Johnson: this is the correct answer if the order of the remaining elements doesn't matter. If it does matter, then you need more work to retain the order.Coleridge
@AndreasBrinck The issue due to which this could cause significant trouble is that some may attempt to use this with an index iterator. If you "erase" the last element, you can't increment your index iterator that cycle and instead need to repeat the process with the same index.Ophicleide
The method 1 in this answer has a bug!! (When the last item is marked to be deleted, it will be switched to the previous element's position, and then the total length of vector is shortened). For example, you have a vector 0,3,2,4,1,5,6, you want to delete 1,3,5 (indecies) element (which would be 3,4,5), however, the bug occurs that "5" will not be deleted!Omer
@Omer Deleting the last element using method 1 works too. This was discussed in the first four comments, please have a look.Ciscaucasia
@PeterG. It does not work in my example though. Please work through my example. Number 4 would be scheduled for deletion before the last element at that time, which is number 5. Therefore 5 is switched to the position of 4, before it gets scheduled for deletion.Omer
@Omer When deleting multiple elements by index, the indices to be deleted have to be processed in descending order so that indices that are yet to be deleted are still valid. This was meant by "If you start erasing from the back, all precomputed indexes will be correct." in the description of method 1.Ciscaucasia
Is there detailed performance data on remove_if vs. a "swap + pop"? In my limited testing (GCC -02) I only saw ~1-3% difference in performance - but perhaps I made some kind of mistake.Chasse
@TylerShellberg I did a benchmark just now, using Hayai and a 21 length vector, media performance: swap+pop: 1397k it/s, erase+remove: 912k it/s (I mark the values to be deleted with std::numeric_limits<int>::lowest()), erase+remove_if: 836k it/s.Bask
Note that I only deleted 3 items in the benchmark, if I delete 9, then the performance is a bit closer: 1079k vs 777k vs 630k. So, 72% the speed with remove and it keeps the order, and the deletion order doesn't matter.Bask
S
15

Using a predicate and the algorithm remove_if you can achieve what you want : see http://www.cplusplus.com/reference/algorithm/remove_if/

Don't forget to erase the item (see remove-erase idiom).

Your predicate will simply hold the idx of each value to remove and decrease all indexes it keeps each time it returns true.

That said if you can afford just removing each object using the remove-erase idiom, just make your life simple by doing it.

Shelled answered 15/8, 2010 at 14:18 Comment(0)
C
11

Erase the items backwards. In other words erase the highest index first, then next highest etc. You won't invalidate any previous iterators or indexes so you can just use the obvious approach of multiple erase calls.

Clerical answered 15/8, 2010 at 14:21 Comment(2)
I wrote though: (sorting then linearly erasing with an offset is not an option)Brower
@Milo: unless there's a good reason to arbitrarily reject one of the better solutions, then it certainly is an option. Why can't you sort the indexes?Coleridge
W
6

I would move the elements which you don't want to erase to a temporary vector and then replace the original vector with this.

Weepy answered 15/8, 2010 at 14:20 Comment(1)
I don't think that is any faster than the remove-if idiom which moves retained elements to the front is a single pass and then truncates the vector. No extra memory allocation needed -- everything is done in a single pass.Kwasi
A
6

While this answer by Peter G. in variant one (the swap-and-pop technique) is the fastest when you do not need to preserve the order, here is the unmentioned alternative which maintains the order.

With C++17 and C++20 the removal of multiple elements from a vector is possible with standard algorithms. The run time is O(N * Log(N)) due to std::stable_partition. There are no external helper arrays, no excessive copying, everything is done inplace. Code is a "one-liner":

template <class T>
inline void erase_selected(std::vector<T>& v, const std::vector<int>& selection)
{
    v.resize(std::distance(
        v.begin(),
        std::stable_partition(v.begin(), v.end(),
             [&selection, &v](const T& item) {
                  return !std::binary_search(
                      selection.begin(),
                      selection.end(),
                      static_cast<int>(static_cast<const T*>(&item) - &v[0]));
        })));
}

The code above assumes that selection vector is sorted (if it is not the case, std::sort over it does the job, obviously).

To break this down, let us declare a number of temporaries:

// We need an explicit item index of an element
// to see if it should be in the output or not
int itemIndex = 0;
// The checker lambda returns `true` if the element is in `selection`
auto filter = [&itemIndex, &sorted_sel](const T& item) {
    return !std::binary_search(
                      selection.begin(),
                      selection.end(),
                      itemIndex++);
};

This checker lambda is then fed to std::stable_partition algorithm which is guaranteed to call this lambda only once for each element in the original (unpermuted !) array v.

auto end_of_selected = std::stable_partition(
                           v.begin(),
                           v.end(),
                           filter);

The end_of_selected iterator points right after the last element which should remain in the output array, so we now can resize v down. To calculate the number of elements we use the std::distance to get size_t from two iterators.

v.resize(std::distance(v.begin(), end_of_selected));

This is different from the code at the top (it uses itemIndex to keep track of the array element). To get rid of the itemIndex, we capture the reference to source array v and use pointer arithmetic to calculate itemIndex internally.

Over the years (on this and other similar sites) multiple solutions have been proposed, but usually they employ multiple "raw loops" with conditions and some erase/insert/push_back calls. The idea behind stable_partition is explained beautifully in this talk by Sean Parent.

This link provides a similar solution (and it does not assume that selection is sorted - std::find_if instead of std::binary_search is used), but it also employs a helper (incremented) variable which disables the possibility to parallelize processing on larger arrays.

Starting from C++17, there is a new first argument to std::stable_partition (the ExecutionPolicy) which allows auto-parallelization of the algorithm, further reducing the run-time for big arrays. To make yourself believe this parallelization actually works, there is another talk by Hartmut Kaiser explaining the internals.

Arachnoid answered 6/10, 2020 at 8:37 Comment(0)
T
1

Would this work:

void DeleteAll(vector<int>& data, const vector<int>& deleteIndices)
{
    vector<bool> markedElements(data.size(), false);
    vector<int> tempBuffer;
    tempBuffer.reserve(data.size()-deleteIndices.size());

    for (vector<int>::const_iterator itDel = deleteIndices.begin(); itDel != deleteIndices.end(); itDel++)
        markedElements[*itDel] = true;

    for (size_t i=0; i<data.size(); i++)
    {
        if (!markedElements[i])
            tempBuffer.push_back(data[i]);
    }
    data = tempBuffer;
}

It's an O(n) operation, no matter how many elements you delete. You could gain some efficiency by reordering the vector inline (but I think this way it's more readable).

Tolerable answered 15/8, 2010 at 14:38 Comment(0)
A
1

This is non-trival because as you delete elements from the vector, the indexes change.

[0] hi
[1] you
[2] foo

>> delete [1]
[0] hi
[1] foo

If you keep a counter of times you delete an element and if you have a list of indexes you want to delete in sorted order then:

int counter = 0;
for (int k : IndexesToDelete) {
  events.erase(events.begin()+ k + counter);
  counter -= 1;
}
Anode answered 26/2, 2021 at 0:9 Comment(0)
G
0

You can use this method, if the order of the remaining elements doesn't matter

#include <iostream> 
#include <vector>

using namespace std;
int main()
{
    vector< int> vec;
    vec.push_back(1);
    vec.push_back(-6);
    vec.push_back(3);
    vec.push_back(4);
    vec.push_back(7);
    vec.push_back(9);
    vec.push_back(14);
    vec.push_back(25);
    cout << "The elements befor " << endl;
    for(int i = 0; i < vec.size(); i++) cout << vec[i] <<endl;
    vector< bool> toDeleted;
    int YesOrNo = 0;
    for(int i = 0; i<vec.size(); i++)
    {
        
        cout<<"You need to delete this element? "<<vec[i]<<", if yes enter 1 else enter 0"<<endl;
        cin>>YesOrNo;
        if(YesOrNo)
            toDeleted.push_back(true);
        else
            toDeleted.push_back(false);
    }
    //Deleting, beginning from the last element to the first one
    for(int i = toDeleted.size()-1; i>=0; i--)
    {
        if(toDeleted[i])
        {
            vec[i] = vec.back();
            vec.pop_back();
        }
    }
    cout << "The elements after" << endl;
    for(int i = 0; i < vec.size(); i++) cout << vec[i] <<endl;
    return 0;
}
Gabardine answered 28/9, 2020 at 18:24 Comment(0)
B
0

Here's an elegant solution in case you want to preserve the indices, the idea is to replace the values you want to delete with a special value that is guaranteed not be used anywhere, and then at the very end, you perform the erase itself:

std::vector<int> vec = {1, 2, 3, 4, 5, 6, 7, 8, 9};

// marking 3 elements to be deleted
vec[2] = std::numeric_limits<int>::lowest();
vec[5] = std::numeric_limits<int>::lowest();
vec[3] = std::numeric_limits<int>::lowest();

// erase
vec.erase(std::remove(vec.begin(), vec.end(), std::numeric_limits<int>::lowest()), vec.end());

// print values => 1 2 5 7 8 9
for (const auto& value : vec) std::cout << ' ' << value;
std::cout << std::endl;

It's very quick if you delete a lot of elements because the deletion itself is happening only once. Items can also be deleted in any order that way.

If you use a a struct instead of an int, then you can still mark an element of that struct, for ex dead=true and then use remove_if instead of remove =>

struct MyObj
{
    int  x;
    bool dead = false;
};

std::vector<MyObj> objs = {{1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}};
objs[2].dead = true;
objs[5].dead = true;
objs[3].dead = true;
objs.erase(std::remove_if(objs.begin(), objs.end(), [](const MyObj& obj) { return obj.dead; }), objs.end());

// print values => 1 2 5 7 8 9
for (const auto& obj : objs) std::cout << ' ' << obj.x;
std::cout << std::endl;

This one is a bit slower, around 80% the speed of the remove.

Bask answered 8/8, 2022 at 16:28 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.