Check std::vector has duplicates

Asked 28/9, 2017 at 20:27 Answered 13/10, 2023 at 11:11

Solved c++algorithm for-loop vector duplicates

I want to check if a vector of integers has any duplicates or not, and have to return true if it does. So I try to do something like this:

vector<int> uGuess = {1,2,3,3,4,5}
vector<int> a = uGuess;
sort(a.begin(), a.end());
bool d = unique(a.begin(), a.end());

And this will not work since unqiue cannot be assigned as a bool value. How should I proceed towards this? If I were to write a for loop to perform the same action, how should I do that?

Olnek answered 28/9, 2017 at 20:27 Comment(10)

unique doesn't do what you are describing you want to do – Manageable 28/9, 2017 at 20:30

auto it = std::unique(a.begin(), a.end()); bool b = it == a.end(); – Hermanhermann 28/9, 2017 at 20:30

Check a reference. – Provenance 28/9, 2017 at 20:30

@Hermanhermann doesn't work on non-sorted vectors – Manageable 28/9, 2017 at 20:33

@patatahooligan: In the example it is sorted as initialized, then it gets sorted. I think it's safe to say it's sorted. – Materiel 28/9, 2017 at 20:34

@FredLarson True, but OP's statement of the problem does not mention that the array is sorted. OP should clarify. – Manageable 28/9, 2017 at 20:35

@Manageable It doesn't matter, he calls sort() in his code, so then it's sorted. – Pion 28/9, 2017 at 20:38

I kid you not, I somehow skipped over that line before. In that case @Hermanhermann 's comment could be the accepted answer. – Manageable 28/9, 2017 at 20:40

"How should I proceed towards this" depends on what this is. Don't post code that doesn't work and ask people to guess at what it's intended to do; say it. What should the value in d mean? – Disinclination 28/9, 2017 at 20:48

Seems a duplicate of Determining if an unordered vector<T> has all unique elements and Determine if there are duplicates in vector. – Butterfly 20/4, 2020 at 22:47

Looking at google for std::unique I found this page std::unique. I looked at what it did:

Eliminates all except the first element from every consecutive group of equivalent elements from the range [first, last)

So it looks like it does what you want - removes the duplicates.

I then looked at what it returns...

... returns a past-the-end iterator for the new logical end of the range

So the result from std::unique is a sequence which is not necessary the same as the whole vector.

If nothing was removed, the return value would be the end of the vector.

So you want:

vector<int>::iterator it = std::unique(a.begin(), a.end());
bool wasUnique = (it == a.end());

Or for C++11:

auto it = std::unique(a.begin(), a.end());
bool wasUnique = (it == a.end());

Finally for the unique function to work, the vector needs to be sorted, so the complete code would be:

sort(a.begin(), a.end());
auto it = std::unique(a.begin(), a.end());
bool wasUnique = (it == a.end());

Winner answered 28/9, 2017 at 20:37 Comment(4)

"So it looks like it does what you want - removes the duplicates." What?! Where in the world does the OP ask for removing duplicates? This answer is wrong as std::unique modifies the vector. – Artistry 29/6, 2022 at 10:10

@Artistry Modifying a duplicate vector seems fine – QA tried that her-/himself, too, but didn't manage to apply the correct checks afterwards... – Sobersided 29/6, 2022 at 11:4

@Sobersided My comment was about removing the duplicates. That was not what the OP asked for. – Artistry 29/6, 2022 at 15:11

@Artistry Not directly – but the initial idea has been since right from the start removing duplicates from a copy and then compare sizes – if differing, removal took place thus duplicates are still available in original unchanged vector – solely the appropriate test for removal detection has been lacking... – Sobersided 30/6, 2022 at 7:26

The algorithm you're looking for is std::adjacent_find.

// The container must be sorted!
const std::vector<int> sortedVector = {1,2,3,3,4,5};
const bool hasDuplicates = std::adjacent_find(sortedVector.begin(), sortedVector.end()) != sortedVector.end();

Unlike std::unique, std::adjacent_find doesn't modify the container.

As a bonus, std::adjacent_find returns an iterator to the first element in the duplicate "pair":

const auto duplicate = std::adjacent_find(sortedVector.begin(), sortedVector.end());

if (duplicate != sortedVector.end())
    std::cout << "Duplicate element = " << *duplicate << "\n";

Guarded answered 3/1, 2019 at 18:51 Comment(1)

Only works for sorted containers – Antechoir 13/10, 2023 at 6:42

You should using set

set<int> s(a.begin(), a.end());
return s.size() != a.size();

Airscrew answered 17/2, 2019 at 12:26 Comment(4)

You mean return s.size()==a.size();, right? I like this approach because I do something similar in python, but do you have any idea how fast this is vs sorting? – Arron 17/4, 2019 at 16:18

@Arron This should be as fast as sorting but if you use std::unordered_set in place of std::set it can be faster as it is a hash set. – Papagena 18/4, 2021 at 9:20

why does it not have O(n) complexity where n = a.size() ? – Chiller 14/10, 2021 at 10:41

@ShubhamGarg Late reply for late readers: Insertion complexity for std::set is O(log(s)) with s being current size of the set – there are n elements, and O(sum(i = 0 -> n) { log(i) }) (is that understandable?) remains O(n*log(n))... – Sobersided 29/6, 2022 at 10:58

Looking at google for std::unique I found this page std::unique. I looked at what it did:

Eliminates all except the first element from every consecutive group of equivalent elements from the range [first, last)

So it looks like it does what you want - removes the duplicates.

I then looked at what it returns...

... returns a past-the-end iterator for the new logical end of the range

So the result from std::unique is a sequence which is not necessary the same as the whole vector.

If nothing was removed, the return value would be the end of the vector.

So you want:

vector<int>::iterator it = std::unique(a.begin(), a.end());
bool wasUnique = (it == a.end());

Or for C++11:

auto it = std::unique(a.begin(), a.end());
bool wasUnique = (it == a.end());

Finally for the unique function to work, the vector needs to be sorted, so the complete code would be:

sort(a.begin(), a.end());
auto it = std::unique(a.begin(), a.end());
bool wasUnique = (it == a.end());

Winner answered 28/9, 2017 at 20:37 Comment(4)

@Artistry Modifying a duplicate vector seems fine – QA tried that her-/himself, too, but didn't manage to apply the correct checks afterwards... – Sobersided 29/6, 2022 at 11:4

@Sobersided My comment was about removing the duplicates. That was not what the OP asked for. – Artistry 29/6, 2022 at 15:11

If someone is forced to write own algorithm:

bool hasDuplicates(const std::vector<int>& arr) {
    for (std::size_t i = 0; i < arr.size(); ++i) {
        for (std::size_t j = i + 1; j < arr.size(); ++j) {
            if (arr[i] == arr[j])
                return true;
        }
    }
    return false;
}

But in real code you should use things that already exist, and in the standard library.

Binnie answered 28/9, 2017 at 21:12 Comment(2)

That leaves the question: what function in the STL is the best fit for this problem? – Glimmering 27/1, 2021 at 22:0

I believe this will have bad performance for large vectors due to caches. If you use for (std::size_t j = 0; j < i; ++j) you test the small ranges first, which fits in cache, and only at the end do slow searches outside of cache. It's likely to terminate early if there is a duplicate and never reach the slow part. – Hrutkay 29/6, 2022 at 10:44

Sort the vector if it's not already sorted, and then use std::unique(), like this:

#include <iostream>
#include <vector>
#include <algorithm>

int main() {
    std::vector<int> v = {3, 1, 3, 4, 5};
    sort(v.begin(), v.end());
    auto it = std::unique(v.begin(), v.end());
    std::cout << ((it == v.end()) ? "Unique\n" : "Duplicate(s)\n");
    return 0;
}

Output:

Duplicate(s)

Andersonandert answered 28/9, 2017 at 21:18 Comment(3)

Here, the original vector gets changed. So, this can't be the useful method. – Excitability 10/2, 2021 at 6:42

@Andersonandert The input vector has to be sorted. Am I right? – Jessalyn 31/7, 2021 at 9:6

Yes the vector needs to be sorted, answer updated! Thanks @Jessalyn for the comment and the upvote! – Andersonandert 31/7, 2021 at 15:20

So far all these solutions either modify the container or have O(n²) complexity. You can put a std::map to much better use:

#include <algorithm>
#include <iterator>
#include <map>

template <typename Iterator>
bool has_duplicates( Iterator first, Iterator last )
{
  std::map <typename std::iterator_traits <Iterator> ::value_type, std::size_t> histogram;

  while (first != last)
    if (++histogram[ *first++ ] > 1) 
      return true;

  return false;
}

#include <iostream>
#include <vector>

int main()
{
  using std::begin;
  using std::end;

  int a[] = { 2, 3, 5, 7, 11 };
  int b[] = { 2, 3, 5, 5, 7 };

  std::vector <int> c( begin(a), end(a) );
  std::vector <int> d( begin(b), end(b) );

  std::cout << std::boolalpha;
  std::cout << "a has duplicates false : " << has_duplicates( begin(a), end(a) ) << "\n";
  std::cout << "b has duplicates true  : " << has_duplicates( begin(b), end(b) ) << "\n";
  std::cout << "c has duplicates false : " << has_duplicates( begin(c), end(c) ) << "\n";
  std::cout << "d has duplicates true  : " << has_duplicates( begin(d), end(d) ) << "\n";
}

Spitsbergen answered 28/9, 2017 at 21:28 Comment(5)

It will use more memory and not be faster than sorting a vector. – Scummy 6/10, 2017 at 17:4

Yes, but sorting a vector modifies the original content. – Hundredfold 6/10, 2017 at 21:14

Of course. Look, I don’t think you understand the algorithm here and/or have completely ignored its termination conditions. For a large vector of completely unique elements, the memory requirements are not quite as friendly, but there is a trade-off entirely dependent on OP’s data and his operating requirements. Why not give OP all the options instead of blindly clunking along ‘sort a copy and write your own test’ for a very-likely minor memory issue when you could be using the power of the Standard Library to efficiently histogram for you? – Hundredfold 7/10, 2017 at 12:3

Then you should have specified in the answer that this solution is only good if the number of unique elements is relatively small (or less probabilistically, if a duplicate is close to the beginning of the array). – Scummy 7/10, 2017 at 12:12

No, it is only bad if the source data is particularly large and unlikely to have any duplicates. Please quit making unprovable generalized statements about probability and requirements. (If your data set is more likely to have a duplicate near the end, then reverse-walk your source, LOL.) – Hundredfold 7/10, 2017 at 12:26

If your vector is small, say < 32 objects, or if copying and sorting the objects is expensive or impossible due to lack of move or copy constructor/assignment then a straight O(n^2) compare everything against everything else algorithm is the way to go.

Here is my solution:

template <typename Iterator>
bool has_duplicates( Iterator first, Iterator end ) {
    for (auto i = first; i != end; ++i) {
        for (auto j = first; i != j; ++j) {
            if (*i == *j) return true;
        }
    }
    return false;
}

template <typename Container>
bool has_duplicates(const Container &v) {
    for (const auto & i : v) {
        for (const auto & j : v) {
            if (&i == &j) break;
            if (i == j) return true;
        }
    }
    return false;
}

Hrutkay answered 29/6, 2022 at 10:51 Comment(0)

If you have a more complex vector and want to know which value is duplicated, you can do something like this:

#include <vector>
#include <unordered_set>
#include <string>

template <typename InputContainer_t, typename GetCompareValueLambda_t>
bool HasDuplicates(const InputContainer_t& container, GetCompareValueLambda_t compare)
{
    using CompareValue_t = decltype(compare(*container.begin()));
    std::unordered_set<CompareValue_t> setUniqueValues;
    bool hasDuplicates = false;

    for (const auto & element : container)
    {
        CompareValue_t compareValue = compare(element);

        auto [_, successful] = setUniqueValues.insert(compareValue);

        if (!successful)
        {
            printf("Value '%s' is not unique.\n", std::to_string(compareValue).c_str());
            hasDuplicates = true;
        }
    }
    return hasDuplicates;
}

int main()
{
    std::vector<int> a {1,2,2};
    HasDuplicates(a, [](auto x){ return x; });
    
    std::vector<S> b {{1, 0},{2, 1}, {2, 0}};
    HasDuplicates(b, [](auto x){ return x.idx; });

    return 0;
}

Antechoir answered 13/10, 2023 at 11:11 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags