Check std::vector has duplicates
Asked Answered
O

8

26

I want to check if a vector of integers has any duplicates or not, and have to return true if it does. So I try to do something like this:

vector<int> uGuess = {1,2,3,3,4,5}
vector<int> a = uGuess;
sort(a.begin(), a.end());
bool d = unique(a.begin(), a.end());

And this will not work since unqiue cannot be assigned as a bool value. How should I proceed towards this? If I were to write a for loop to perform the same action, how should I do that?

Olnek answered 28/9, 2017 at 20:27 Comment(10)
unique doesn't do what you are describing you want to doManageable
auto it = std::unique(a.begin(), a.end()); bool b = it == a.end();Hermanhermann
Check a reference.Provenance
@Hermanhermann doesn't work on non-sorted vectorsManageable
@patatahooligan: In the example it is sorted as initialized, then it gets sorted. I think it's safe to say it's sorted.Materiel
@FredLarson True, but OP's statement of the problem does not mention that the array is sorted. OP should clarify.Manageable
@Manageable It doesn't matter, he calls sort() in his code, so then it's sorted.Pion
I kid you not, I somehow skipped over that line before. In that case @Hermanhermann 's comment could be the accepted answer.Manageable
"How should I proceed towards this" depends on what this is. Don't post code that doesn't work and ask people to guess at what it's intended to do; say it. What should the value in d mean?Disinclination
Seems a duplicate of Determining if an unordered vector<T> has all unique elements and Determine if there are duplicates in vector.Butterfly
W
16

Looking at google for std::unique I found this page std::unique. I looked at what it did:

Eliminates all except the first element from every consecutive group of equivalent elements from the range [first, last)

So it looks like it does what you want - removes the duplicates.

I then looked at what it returns...

... returns a past-the-end iterator for the new logical end of the range

So the result from std::unique is a sequence which is not necessary the same as the whole vector.

If nothing was removed, the return value would be the end of the vector.

So you want:

vector<int>::iterator it = std::unique(a.begin(), a.end());
bool wasUnique = (it == a.end());

Or for C++11:

auto it = std::unique(a.begin(), a.end());
bool wasUnique = (it == a.end());

Finally for the unique function to work, the vector needs to be sorted, so the complete code would be:

sort(a.begin(), a.end());
auto it = std::unique(a.begin(), a.end());
bool wasUnique = (it == a.end());
Winner answered 28/9, 2017 at 20:37 Comment(4)
"So it looks like it does what you want - removes the duplicates." What?! Where in the world does the OP ask for removing duplicates? This answer is wrong as std::unique modifies the vector.Artistry
@Artistry Modifying a duplicate vector seems fine – QA tried that her-/himself, too, but didn't manage to apply the correct checks afterwards...Sobersided
@Sobersided My comment was about removing the duplicates. That was not what the OP asked for.Artistry
@Artistry Not directly – but the initial idea has been since right from the start removing duplicates from a copy and then compare sizes – if differing, removal took place thus duplicates are still available in original unchanged vector – solely the appropriate test for removal detection has been lacking...Sobersided
G
33

The algorithm you're looking for is std::adjacent_find.

// The container must be sorted!
const std::vector<int> sortedVector = {1,2,3,3,4,5};
const bool hasDuplicates = std::adjacent_find(sortedVector.begin(), sortedVector.end()) != sortedVector.end();

Unlike std::unique, std::adjacent_find doesn't modify the container.

As a bonus, std::adjacent_find returns an iterator to the first element in the duplicate "pair":

const auto duplicate = std::adjacent_find(sortedVector.begin(), sortedVector.end());

if (duplicate != sortedVector.end())
    std::cout << "Duplicate element = " << *duplicate << "\n";
Guarded answered 3/1, 2019 at 18:51 Comment(1)
Only works for sorted containersAntechoir
A
23

You should using set

set<int> s(a.begin(), a.end());
return s.size() != a.size();
Airscrew answered 17/2, 2019 at 12:26 Comment(4)
You mean return s.size()==a.size();, right? I like this approach because I do something similar in python, but do you have any idea how fast this is vs sorting?Arron
@Arron This should be as fast as sorting but if you use std::unordered_set in place of std::set it can be faster as it is a hash set.Papagena
why does it not have O(n) complexity where n = a.size() ?Chiller
@ShubhamGarg Late reply for late readers: Insertion complexity for std::set is O(log(s)) with s being current size of the set – there are n elements, and O(sum(i = 0 -> n) { log(i) }) (is that understandable?) remains O(n*log(n))...Sobersided
W
16

Looking at google for std::unique I found this page std::unique. I looked at what it did:

Eliminates all except the first element from every consecutive group of equivalent elements from the range [first, last)

So it looks like it does what you want - removes the duplicates.

I then looked at what it returns...

... returns a past-the-end iterator for the new logical end of the range

So the result from std::unique is a sequence which is not necessary the same as the whole vector.

If nothing was removed, the return value would be the end of the vector.

So you want:

vector<int>::iterator it = std::unique(a.begin(), a.end());
bool wasUnique = (it == a.end());

Or for C++11:

auto it = std::unique(a.begin(), a.end());
bool wasUnique = (it == a.end());

Finally for the unique function to work, the vector needs to be sorted, so the complete code would be:

sort(a.begin(), a.end());
auto it = std::unique(a.begin(), a.end());
bool wasUnique = (it == a.end());
Winner answered 28/9, 2017 at 20:37 Comment(4)
"So it looks like it does what you want - removes the duplicates." What?! Where in the world does the OP ask for removing duplicates? This answer is wrong as std::unique modifies the vector.Artistry
@Artistry Modifying a duplicate vector seems fine – QA tried that her-/himself, too, but didn't manage to apply the correct checks afterwards...Sobersided
@Sobersided My comment was about removing the duplicates. That was not what the OP asked for.Artistry
@Artistry Not directly – but the initial idea has been since right from the start removing duplicates from a copy and then compare sizes – if differing, removal took place thus duplicates are still available in original unchanged vector – solely the appropriate test for removal detection has been lacking...Sobersided
B
4

If someone is forced to write own algorithm:

bool hasDuplicates(const std::vector<int>& arr) {
    for (std::size_t i = 0; i < arr.size(); ++i) {
        for (std::size_t j = i + 1; j < arr.size(); ++j) {
            if (arr[i] == arr[j])
                return true;
        }
    }
    return false;
}

But in real code you should use things that already exist, and in the standard library.

Binnie answered 28/9, 2017 at 21:12 Comment(2)
That leaves the question: what function in the STL is the best fit for this problem?Glimmering
I believe this will have bad performance for large vectors due to caches. If you use for (std::size_t j = 0; j < i; ++j) you test the small ranges first, which fits in cache, and only at the end do slow searches outside of cache. It's likely to terminate early if there is a duplicate and never reach the slow part.Hrutkay
A
2

Sort the vector if it's not already sorted, and then use std::unique(), like this:

#include <iostream>
#include <vector>
#include <algorithm>

int main() {
    std::vector<int> v = {3, 1, 3, 4, 5};
    sort(v.begin(), v.end());
    auto it = std::unique(v.begin(), v.end());
    std::cout << ((it == v.end()) ? "Unique\n" : "Duplicate(s)\n");
    return 0;
}

Output:

Duplicate(s)

Andersonandert answered 28/9, 2017 at 21:18 Comment(3)
Here, the original vector gets changed. So, this can't be the useful method.Excitability
@Andersonandert The input vector has to be sorted. Am I right?Jessalyn
Yes the vector needs to be sorted, answer updated! Thanks @Jessalyn for the comment and the upvote!Andersonandert
S
2

So far all these solutions either modify the container or have O(n²) complexity. You can put a std::map to much better use:

#include <algorithm>
#include <iterator>
#include <map>

template <typename Iterator>
bool has_duplicates( Iterator first, Iterator last )
{
  std::map <typename std::iterator_traits <Iterator> ::value_type, std::size_t> histogram;

  while (first != last)
    if (++histogram[ *first++ ] > 1) 
      return true;

  return false;
}

#include <iostream>
#include <vector>

int main()
{
  using std::begin;
  using std::end;

  int a[] = { 2, 3, 5, 7, 11 };
  int b[] = { 2, 3, 5, 5, 7 };

  std::vector <int> c( begin(a), end(a) );
  std::vector <int> d( begin(b), end(b) );

  std::cout << std::boolalpha;
  std::cout << "a has duplicates false : " << has_duplicates( begin(a), end(a) ) << "\n";
  std::cout << "b has duplicates true  : " << has_duplicates( begin(b), end(b) ) << "\n";
  std::cout << "c has duplicates false : " << has_duplicates( begin(c), end(c) ) << "\n";
  std::cout << "d has duplicates true  : " << has_duplicates( begin(d), end(d) ) << "\n";
}
Spitsbergen answered 28/9, 2017 at 21:28 Comment(5)
It will use more memory and not be faster than sorting a vector.Scummy
Yes, but sorting a vector modifies the original content.Hundredfold
Of course. Look, I don’t think you understand the algorithm here and/or have completely ignored its termination conditions. For a large vector of completely unique elements, the memory requirements are not quite as friendly, but there is a trade-off entirely dependent on OP’s data and his operating requirements. Why not give OP all the options instead of blindly clunking along ‘sort a copy and write your own test’ for a very-likely minor memory issue when you could be using the power of the Standard Library to efficiently histogram for you?Hundredfold
Then you should have specified in the answer that this solution is only good if the number of unique elements is relatively small (or less probabilistically, if a duplicate is close to the beginning of the array).Scummy
No, it is only bad if the source data is particularly large and unlikely to have any duplicates. Please quit making unprovable generalized statements about probability and requirements. (If your data set is more likely to have a duplicate near the end, then reverse-walk your source, LOL.)Hundredfold
H
1

If your vector is small, say < 32 objects, or if copying and sorting the objects is expensive or impossible due to lack of move or copy constructor/assignment then a straight O(n^2) compare everything against everything else algorithm is the way to go.

Here is my solution:

template <typename Iterator>
bool has_duplicates( Iterator first, Iterator end ) {
    for (auto i = first; i != end; ++i) {
        for (auto j = first; i != j; ++j) {
            if (*i == *j) return true;
        }
    }
    return false;
}

template <typename Container>
bool has_duplicates(const Container &v) {
    for (const auto & i : v) {
        for (const auto & j : v) {
            if (&i == &j) break;
            if (i == j) return true;
        }
    }
    return false;
}
Hrutkay answered 29/6, 2022 at 10:51 Comment(0)
A
0

If you have a more complex vector and want to know which value is duplicated, you can do something like this:

#include <vector>
#include <unordered_set>
#include <string>

template <typename InputContainer_t, typename GetCompareValueLambda_t>
bool HasDuplicates(const InputContainer_t& container, GetCompareValueLambda_t compare)
{
    using CompareValue_t = decltype(compare(*container.begin()));
    std::unordered_set<CompareValue_t> setUniqueValues;
    bool hasDuplicates = false;

    for (const auto & element : container)
    {
        CompareValue_t compareValue = compare(element);

        auto [_, successful] = setUniqueValues.insert(compareValue);

        if (!successful)
        {
            printf("Value '%s' is not unique.\n", std::to_string(compareValue).c_str());
            hasDuplicates = true;
        }
    }
    return hasDuplicates;
}

int main()
{
    std::vector<int> a {1,2,2};
    HasDuplicates(a, [](auto x){ return x; });
    
    std::vector<S> b {{1, 0},{2, 1}, {2, 0}};
    HasDuplicates(b, [](auto x){ return x.idx; });

    return 0;
}

see also: https://onlinegdb.com/bvOzbptnR

Antechoir answered 13/10, 2023 at 11:11 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.