Elegant way to remove all elements of a vector that are contained in another vector?
While looking over some code I found a loopy and algorithmically slow implementation of std::set_difference:

 for(int i = 0; i < a.size(); i++)
 {
  iter = std::find(b.begin(),b.end(),a[i]);
  if(iter != b.end())
  {
     b.erase(iter);
  }
 }

It could easily be replaced with sort (the vectors are not sorted) + set_difference, but that requires allocation of new memory (see my recent Q "Can output of set difference be stored in first input?" for why it can't be done in place).
So my solution would be something like:

sort(a.begin(), a.end());
for(size_t i = 0; i < b.size(); i++)
{
 if (binary_search(a.begin(), a.end(), b[i]))
 {
     swap(b[i], b[b.size()-1]); //remove current element by swapping with last
     b.pop_back();     // and removing new last by shrinking
 }
}

Can it be done more elegantly?
"Elegant" is subjective, so within the scope of this Q it is defined as clearer code (ideally something from the STL algorithms, though I think it can't be done), but with no memory allocation and no increase in algorithmic complexity.

Babylon answered 17/1, 2014 at 20:25 Comment(8)
Define "elegance." Less code? Less execution time? Less memory? – Guerin
"see my recent Q 21191631" — where? – Framing
@JohnDibling defined in the last paragraph – Babylon
@NoSenseEtAl: Thanks, missed that on first read-through. – Guerin
@Babylon In the second example, is b one of the inputs to set_difference? If so, you're modifying the range that has been passed to you, which the caller may not be thrilled about. Of course, you could make a local copy, but now that solution is starting to look less attractive. – Minard
@pretorian it is an in-class member in the real code (and order is not important), so not really a big problem, but yes, in the generic case this is evil. Still, it is faster to allocate M elements and do N log M operations than to allocate 0 and do N*M operations – Babylon
@Babylon - maybe it will be faster to just make a list of references to all the "hits", then copy the data to the target vector from that list sequentially, and do only a single resize of the source vector instead of one per hit, restoring the vector by using the hits list to skip the removed values. I think the allocation and reallocation is the bottleneck here, so it might be a good idea to minimize or even avoid it; after all you are shrinking, so as long as you can restore the correct order you don't need to reallocate, except maybe for a trim at the end. – Islek
@ddriver the resizes are shrinks, so they don't cause reallocations – Babylon

This one does it in O(N+M), assuming both arrays are sorted.

  auto ib = std::begin(two);
  auto iter = std::remove_if(
      std::begin(one), std::end(one),
      [&ib, &two](int x) -> bool {
          while (ib != std::end(two) && *ib < x) ++ib;
          return (ib != std::end(two) && *ib == x);
      });
  one.erase(iter, std::end(one)); // shrink one to the remaining elements
Framing answered 17/1, 2014 at 21:9 Comment(5)
I think this requires the enhanced return type deduction of C++1y. – Visitant
@Visitant why? Can't see it (and it works in both gcc and clang with -std=c++11). – Framing
IIRC both do implement the return type deduction of C++1y as an extension in their C++11 modes. However, [expr.prim.lambda]/4 states that your lambda should have return type void. – Visitant
I must say I usually hate code that mixes looping inside a lambda that serves as the function for an STL algorithm + capture by reference, but your code is quite pretty. :) In my case the arrays aren't sorted, but the change is trivial, just sort both of them... again, nice code. :) – Babylon
In my opinion it will not compile; you need to capture [&ib, &two] – Boutonniere

Sort b so you can binary search it in order to reduce time complexity. Then use the erase-remove idiom in order to throw away all elements from a that are contained in b:

sort( begin(b), end(b) );
a.erase( remove_if( begin(a),end(a),
    [&](auto x){return binary_search(begin(b),end(b),x);}), end(a) );

Of course, you can still sacrifice time complexity for simplicity and reduce your code by removing the sort() and replacing binary_search() by find():

a.erase( remove_if( begin(a),end(a),
    [&](auto x){return find(begin(b),end(b),x)!=end(b);}), end(a) );

This is a matter of taste. In both cases you don't need heap allocations. By the way, I'm using lambda auto parameters, which are a C++14 feature. Some compilers, such as clang, already implement it. If you don't have such a compiler but only C++11, then replace auto with the element type of the container.

By the way, this code does not mention any types! You can write a template function so it works for all kinds of types. The first variant requires random-access iteration over b, while the second piece of code does not.

Kusin answered 18/1, 2014 at 0:13 Comment(3)
I would prefer &b in that capture list, but that is my personal preference. +1 – Babylon
@Babylon For maintenance it is most often best practice to write just [&] or [=], because you don't have to change that part if other code changes. Explicit captures make a lot of sense if there's a particular reason that using other variables should not be allowed and hence should result in helpful compile-time errors. The case where you specify a default capture and add some variables to be captured differently is also an important use case. – Kusin
In terms of time complexity, is this the most optimal way? If a has many elements and b has few elements, isn't it better to sort a copy of a and binary search it? (tag the indices of the elements of the copied a to each element before sorting). Just my brief thought, so any advice would be helpful! – Canticle

One solution that comes to mind is combining remove_if and binary_search. It's effectively the same as your manual looping solution, but might be a bit more "elegant" as it uses more STL features.

sort(begin(b), end(b));
auto iter = remove_if(begin(a), end(a),
                      [&](auto x) {
                          return binary_search(begin(b), end(b), x);
                      });
// Now [begin(a), iter) defines a new range, and you can erase them however
// you see fit, based on the type of a.
Reglet answered 17/1, 2014 at 20:33 Comment(5)
Erase-remove strikes again... I hate it from a readability point of view, but still a nice A – Babylon
I agree, but considering that different containers may have different specializations for removing elements, it's a fair trade-off, imo. – Reglet
The remove_if and binary_search are operating over different ranges. – Reglet
@Minard remove_if is touching a, not b, if I understand it correctly – Babylon
You understand correctly. binary_search only looks at the elements: it is a non-modifying algorithm, and remove_if never touches b at all. – Reglet

The current code is quite clear, in that it should be obvious to any programmer what's going on.

The current performance is O(a.size() * b.size()), which may be pretty bad depending upon the actual sizes.

A more concise and STL-like way to describe it is to use remove_if with a predicate that tells you whether a value is in a.

b.erase(std::remove_if(b.begin(), b.end(), [&](const auto& x) {
  return std::find(a.begin(), a.end(), x) != a.end();
}), b.end());

(Not tested, so I might have made a syntax error.) I used a lambda with an auto parameter, which is C++14; in C++11 spell out the element type, and if you're not using a C++11 compiler at all you can create a functor instead.

Note that the original code removes just one instance of a value in b that's also in a. My solution will remove all instances of such a value from b.

Note that the find operation happens again and again, so it's probably better to do that on the smaller vector for better locality of reference.

Magog answered 17/1, 2014 at 21:0 Comment(0)

After thinking for a while I thought of this
(note: by answering my own Q I'm not claiming this is superior to the offered answers):

vector<int64_t> a{3,2,7,5,11,13}, b{2,3,13,5};
set<int64_t> bs(b.begin(), b.end());
for (const auto& num: bs)
    cout << num << " ";
cout  << endl;
for (const auto& num: a)
    bs.erase(num);
vector<int64_t> result(bs.begin(), bs.end());
for (const auto& num: result)
    cout << num << " ";
Babylon answered 20/1, 2014 at 14:31 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.