I was solving the basic problem of finding the number of distinct integers in a given array. My idea was to declare an std::unordered_set, insert all of the given integers into the set, and then output the size of the set. Here's my code implementing this strategy:
#include <iostream>
#include <unordered_set>
using namespace std;

int main()
{
    int N;
    cin >> N;                  // number of integers in the array
    int input;
    unordered_set<int> S;
    for(int i = 0; i < N; ++i){
        cin >> input;
        S.insert(input);       // duplicates are discarded automatically
    }
    cout << S.size() << endl;  // number of distinct integers
    return 0;
}
This strategy worked for almost every input, but on a few input cases it timed out.
I was curious to see why my program was timing out, so I added a cout << i << endl; line inside the for-loop. What I found was that on the failing input, the first 53000 or so iterations of the loop passed nearly instantly, but after that only a few hundred iterations occurred each second.
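For reference, here is roughly what the instrumented version of the loop looked like. The chrono-based timer is a sketch added here to make the per-iteration rate measurable; my actual test only printed i:

#include <iostream>
#include <chrono>
#include <unordered_set>
using namespace std;

int main()
{
    int N;
    cin >> N;
    int input;
    unordered_set<int> S;
    auto start = chrono::steady_clock::now();
    for(int i = 0; i < N; ++i){
        cin >> input;
        S.insert(input);
        // Report progress every 1000 insertions so the point where
        // the slowdown begins is visible on stderr.
        if(i % 1000 == 0){
            auto now = chrono::steady_clock::now();
            double secs = chrono::duration<double>(now - start).count();
            cerr << i << " insertions after " << secs << " s\n";
        }
    }
    cout << S.size() << endl;
    return 0;
}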
I've read up on how a hash set can end up with O(N) insertion if a lot of collisions occur, so I thought the input was causing a lot of collisions within the std::unordered_set.
However, this shouldn't be possible. The hash function that std::unordered_set uses for integers maps them to themselves (at least on my machine), so two distinct integers can never hash to the same value. I accessed the hash function using the code written on this link.
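In case the link ever dies: a minimal self-contained check of the integer hash (not the exact code from the link) looks like this, and on my machine every value prints back unchanged:

#include <iostream>
#include <functional>
using namespace std;

int main()
{
    hash<int> h;
    // On libstdc++ (and most implementations) the integer hash is the
    // identity function, so each value should hash to itself.
    for(int x : {0, 1, 42, 53000, 1000000007})
        cout << x << " hashes to " << h(x) << '\n';
    return 0;
}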
My question is: is it possible that the input itself is causing the std::unordered_set to slow down after around 53000 elements have been inserted? If so, why?
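In case it helps with diagnosing this: std::unordered_set exposes its bucket interface, so the distribution of elements across buckets can be inspected directly while the input is being read. A sketch along these lines would show whether everything is piling into a handful of buckets:

#include <iostream>
#include <algorithm>
#include <unordered_set>
using namespace std;

int main()
{
    int N;
    cin >> N;
    int input;
    unordered_set<int> S;
    for(int i = 0; i < N; ++i){
        cin >> input;
        S.insert(input);
        // Every 10000 insertions, dump the bucket statistics: if distinct
        // integers are colliding, the largest bucket will be far bigger
        // than the average load factor suggests.
        if(i % 10000 == 0){
            size_t largest = 0;
            for(size_t b = 0; b < S.bucket_count(); ++b)
                largest = max(largest, S.bucket_size(b));
            cerr << i << ": buckets=" << S.bucket_count()
                 << " load=" << S.load_factor()
                 << " largest bucket=" << largest << '\n';
        }
    }
    cout << S.size() << endl;
    return 0;
}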
Here is the input case that I tested my program on. It's pretty large, so it might lag a bit.