How to check whether a vector is a subset of another in c++
Asked Answered
P

4

21

I am trying to find a simple way to check whether a vector is a subset of another without sorting the order of elements in the vector. Both the vectors contain random number elements in them.

std::includes seems to work only for sorted ranges. How can I accomplish this?

Parsimony answered 2/8, 2011 at 1:54 Comment(10)
What if vector #1 has duplicates? Does vector #2 have to have similar duplicates as well?Enlargement
Why do you need to do this? What's so bad about sorting them?Waves
Sounds like you should have picked a std::set in the first place, possibly. Are you sure that std::vector is what you want?Salot
Related: #4068641 (though the only useful answer is about std::includes, about which you are correct: it assumes sorted input).Salot
there are no duplicates on either of the vectorsParsimony
so, what's wrong with sorting?Porphyrin
@Gene: Maybe the order matters?Salot
@Tomalak -- in what sense the order matters?Porphyrin
@Gene: Ask the OP. He hasn't specified whether it does or not; you just jumped to the conclusion that it doesn't.Salot
If order matters, it doesn't make sense to talk about sets or subsets in the first place. Mathematically, a set is defined as order being irrelevant. Otherwise, it would be a tuple. If order matters then this operation isn't really a subset at all, but something else. From what I can infer, it seems like he can't sort it because he can't modify the data be used.Snailpaced
F
36

Copy the vectors. Sort the copies. Then use std::includes on the copies.

template <typename T>
bool IsSubset(std::vector<T> A, std::vector<T> B)
{
    std::sort(A.begin(), A.end());
    std::sort(B.begin(), B.end());
    return std::includes(A.begin(), A.end(), B.begin(), B.end());
}
Fantoccini answered 2/8, 2011 at 2:3 Comment(8)
"I am trying to find a simple way to check whether a vector is a subset of another without sorting the order of elements in the vector." If sorting is the answer, your question was very, very broken.Salot
@Benjamin: Agreed, there's no reason why this should be downvoted. This is the smartest way of doing this out of all the answers given. If the OP truly wants to check if two things are subsets of each other without modifying the data, this is the way to go. This is literally probably about 5 lines of code in all. If the exact requirement was "I need to check if one is a subset of the other, with O(1) memory usage and without displacing the data of either vectors" the answer would be one of the two below.Snailpaced
@Benjamin: Because you didn't answer the question of "find... without sorting..." as Tomalak stated.Enlargement
@arasmussen: It seems like the "without sorting" requirement is a symptom of an underlying issue that the OP never quite got at. He described something he wanted to do but not what he was actually trying to accomplish. I've seen questions (and I've been guilty of asking such things) where the poster asked for one thing thinking it was the solution but where there was a much simpler solution that didn't require strange requirements.Snailpaced
@arasmussen: The question wasn't "find... without sorting...". If it was, I would have voted to close the question because it is an arbitrary requirement that is not applicable to real life programming. The requirement was that the OP didn't want his own vector sorted. That is a requirement that could be applicable to real life programming, so I answered that question.Fantoccini
@Benjamin: If you're going to transparently modify the question that you're answering, at least mention that in your answer as a disclaimer.Salot
@Tomalak: I didn't modify the question, I just read it differently than you.Fantoccini
Agree with Benjamin. There's really only one good reason why you wouldn't want to sort the original vectors: if the original order of the vectors had business relevance. So you sort a copy.Unlearned
S
9

My answer assumes that when you say "subset", you are really searching more for the equivalent of a "substring"; that is, maintaining order of elements during the search.


Ultimately, I can't see how you could do this in anything less than O(n*m). Given that, you can just roll your own pretty simply:

template <typename T1, typename T2>
bool contains(std::vector<T1> const& a, std::vector<T2> const& b) {
   for (typename std::vector<T1>::const_iterator i = a.begin(), y = a.end(); i != y; ++i) {
      bool match = true;

      typename std::vector<T1>::const_iterator ii = i;
      for (typename std::vector<T2>::const_iterator j = b.begin(), z = b.end(); j != z; ++j) {
          if (ii == a.end() || *j != *ii) {
              match = false;
              break;
          }
          ii++;
      }

      if (match)
         return true;
   }

   return false;
}

Live demo.

(You could probably be more creative with the template parameters.)

Salot answered 2/8, 2011 at 2:2 Comment(1)
2011 was a scary time for C++!Phototypography
E
2

This is assuming duplicates do NOT matter. So if you have two instances of the number 99 in vector a, then as long as vector b has at least one instance of the number 99, then it will be declared as a subset.

bool isSubset(const std::vector<int>& a, const std::vector<int>& b)
{
    for (std::vector<int>::const_iterator i = a.begin(); i != a.end(); i++)
    {
        bool found = false;

        for (std::vector<int>::const_iterator j = b.begin(); j != b.end(); j++)
        {
            if (*i == *j)
            {
                found = true;
                break;
            }
        }

        if (!found)
        {
            return false;
        }
    }

    return true;
}
Enlargement answered 2/8, 2011 at 1:59 Comment(15)
so instead of n*log(n) complexity the proposed solution has n^2 complexity (and it's also a bad c++)Porphyrin
He didn't mention complexity, and this is a very simple-to-understand solution in my opinionEnlargement
@Gene: How are you going to get O(n log m)? The input is not sorted. The best you can hope for is amortised O(n*m), I'm sure.Salot
i know i can check for elements of one vector on the other and tell if it is a subset. I was wondering if there was another method.Parsimony
Why do you want another method? You don't want to sort it, you don't want to iterate through each of them... Is there a larger problem that you aren't telling us?Enlargement
@Tomalak: sorting has n*log(n) complexity, there isn't faster way in general.Porphyrin
With sorting it would just be O(n log n), where n is the larger size. Comparison would just be something like O(m) (I think) so it's still O(n log n).Snailpaced
no there is no larger problem. i was just thinking if the others had a simple solution as in using std::includes but modifying it so that it works for unsorted elements as wellParsimony
@Gene, why do you call that bad C++... Did you want me to use indices instead of iterators?Enlargement
@Gene: Who's sorting? This question is about searching in an unsorted range.Salot
@Tomalak -- and I'm asking again, what's wrong with sorting them, that's the best way to solve the problemPorphyrin
@Gene, Such as? I'm asking as a student.Enlargement
@arasmussen -- unnecessary copies, uninitialized variables, with unnecessary long life-time, generally c-style codePorphyrin
@arasmussen: Accept the vectors by const reference, and declare your iterators within the loops are the major offenses.Unhealthy
Have the above modifications fixed all "bad" C++ offenses? Thanks for your input.Enlargement
E
0

With no sorting...

template <typename T>
bool isSubsetOrEqual(std::vector<T> const& a, std::vector<T> const& b) {
   for(auto const& av:a){
      if(std::find(b.begin(),b.end(),av)==b.end())
          return false;
   }
   return true;
}
Expedient answered 20/12, 2017 at 22:48 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.