How can I make an unordered set of pairs of integers in C++?
Asked Answered
F

10

87

The following program does not compile an unordered set of pairs of integers, but it does for integers. Can unordered_set and its member functions be used on user-defined types, and how can I define it?

#include <unordered_set>
...

class A{
...
private: 
    std::unordered_set< std::pair<int, int> > u_edge_;
};

Compiler error:

error: no matching function for call to 'std::unordered_set >::unordered_set()'

Funchal answered 1/3, 2013 at 15:11 Comment(0)
S
36

Your code compiles on VS2010 SP1 (VC10), but it fails to compile with GCC g++ 4.7.2.

However, you may want to consider boost::hash from Boost.Functional to hash a std::pair (with this addition, your code compiles also with g++).

#include <unordered_set>
#include <boost/functional/hash.hpp>

class A
{
private: 
    std::unordered_set< 
        std::pair<int, int>, 
        boost::hash< std::pair<int, int> > 
    > u_edge_;
};
Stand answered 1/3, 2013 at 15:25 Comment(0)
S
77

There is no standard way of computing a hash on a pair. Add this definition to your file:

struct pair_hash {
    inline std::size_t operator()(const std::pair<int,int> & v) const {
        return v.first*31+v.second;
    }
};

Now you can use it like this:

std::unordered_set< std::pair<int, int>,  pair_hash> u_edge_;

This works, because pair<T1,T2> defines equality. For custom classes that do not provide a way to test equality you may need to provide a separate function to test if two instances are equal to each other.

Of course this solution is limited to a pair of two integers. Here is a link to an answer that helps you define a more general way of making hash for multiple objects.

Schoen answered 1/3, 2013 at 15:18 Comment(3)
Comments are not for extended discussion; this conversation has been moved to chat.Tiaratibbetts
Can you explain what does the hash need? Like I tried with (1,0) and (0,31) which give same hash but the code still worked. So what's happening?Duiker
@Duiker It is OK for hashes to be equal even when the two objects are not equal to each other. The requirement is that when two objects are equal, their hashes would be the same.Schoen
S
36

Your code compiles on VS2010 SP1 (VC10), but it fails to compile with GCC g++ 4.7.2.

However, you may want to consider boost::hash from Boost.Functional to hash a std::pair (with this addition, your code compiles also with g++).

#include <unordered_set>
#include <boost/functional/hash.hpp>

class A
{
private: 
    std::unordered_set< 
        std::pair<int, int>, 
        boost::hash< std::pair<int, int> > 
    > u_edge_;
};
Stand answered 1/3, 2013 at 15:25 Comment(0)
P
22

The problem is that std::unordered_set is using std::hash template to compute hashes for its entries and there is no std::hash specialization for pairs. So you will have to do two things:

  1. Decide what hash function you want to use.
  2. Specialize std::hash for your key type (std::pair<int, int>) using that function.

Here is a simple example:

#include <unordered_set>

namespace std {
template <> struct hash<std::pair<int, int>> {
    inline size_t operator()(const std::pair<int, int> &v) const {
        std::hash<int> int_hasher;
        return int_hasher(v.first) ^ int_hasher(v.second);
    }
};

}

int main()
{
    std::unordered_set< std::pair<int, int> > edge;
}
Puke answered 1/3, 2013 at 15:18 Comment(1)
@WenzelJakob Andy Prowl's answer also add to std namespace. Besides it's not illegal, just undefined behavior. On GCC of a specific version it always work fine.Inalterable
S
12

As already mentioned in most of the other answers on this question, you need to provide a hash function for std::pair<int, int>. However, since C++11, you can also use a lambda expression instead of defining a hash function. The following code takes the solution given by Sergey as basis:

auto hash = [](const std::pair<int, int>& p){ return p.first * 31 + p.second; };
std::unordered_set<std::pair<int, int>, decltype(hash)> u_edge_(8, hash);

Code on Ideone

I'd like repeat Sergey's disclaimer: This solution is limited to a pair of two integers. This answer provides the idea for a more general solution.

Sanches answered 6/2, 2019 at 15:20 Comment(0)
E
7

OK here is a simple solution with guaranteed non collisions. Simply reduce your problem to an existing solution i.e. convert your pair of int to string like so:

 auto stringify = [](const pair<int, int>& p, string sep = "-")-> string{
    return to_string(p.first) + sep + to_string(p.second);
 }

 unordered_set<string> myset;
 myset.insert(stringify(make_pair(1, 2)));
 myset.insert(stringify(make_pair(3, 4)));
 myset.insert(stringify(make_pair(5, 6)));

Enjoy!

Ethiopic answered 10/9, 2019 at 17:36 Comment(0)
M
6

You need to provide a specialization for std::hash<> that works with std::pair<int, int>. Here is a very simple example of how you could define the specialization:

#include <utility>
#include <unordered_set>

namespace std
{
    template<>
    struct hash<std::pair<int, int>>
    {
        size_t operator () (std::pair<int, int> const& p)
        {
            // A bad example of computing the hash, 
            // rather replace with something more clever
            return (std::hash<int>()(p.first) + std::hash<int>()(p.second));
        }
    };
}

class A
{
private:
    // This won't give you problems anymore
    std::unordered_set< std::pair<int, int> > u_edge_;
};
Macaroon answered 1/3, 2013 at 15:18 Comment(0)
K
3

The other answers here all suggest building a hash function that somehow combines your two integers.

This will work, but produces non-unique hashes. Though this is fine for your use of unordered_set, for some applications it may be unacceptable. In your case, if you happen to choose a bad hash function, it may lead to many unnecessary collisions.

But you can produce unique hashes!

int is usually 4 bytes. You could make this explicit by using int32_t.

The hash's datatype is std::size_t. On most machines, this is 8 bytes. You can check this upon compilation.

Since a pair consists of two int32_t types, you can put both numbers into an std::size_t to make a unique hash.

That looks like this (I can't recall offhandedly how to force the compiler to treat a signed value as though it were unsigned for bit-manipulation, so I've written the following for uint32_t.):

#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <utility>


struct IntPairHash {
  std::size_t operator()(const std::pair<uint32_t, uint32_t> &p) const {
    assert(sizeof(std::size_t)>=8);  //Ensure that std::size_t, the type of the hash, is large enough
    //Shift first integer over to make room for the second integer. The two are
    //then packed side by side.
    return (((uint64_t)p.first)<<32) | ((uint64_t)p.second);
  }
};

int main(){
  std::unordered_set< std::pair<uint32_t, uint32_t>, IntPairHash> uset;
  uset.emplace(10,20);
  uset.emplace(20,30);
  uset.emplace(10,20);
  assert(uset.size()==2);
}
Karakalpak answered 1/9, 2018 at 5:19 Comment(0)
D
1

You are missing a hash function for std::pair<int, int>>. For example,

struct bad_hash
{
  std::size_t operator()(const std::pair<int,int>& p) const
  {
    return 42;
  }
};

....

std::unordered_set< std::pair<int, int>, bad_hash> u_edge_;

You can also specialize std::hash<T> for std::hash<std::pair<int,int>>, in which case you can omit the second template parameter.

Doronicum answered 1/3, 2013 at 15:14 Comment(6)
Your functor should exten std::unary_functionAnywise
@AlexChamberlain I thought that stuff was deprecated.Doronicum
@AlexChamberlain: Why should it do that? This class meets all the hash requirements.Armidaarmiger
@Doronicum Fair enough, though I'm still working with C++03 at work.Anywise
@MikeSeymour Because you don't know who else will use your functors.Anywise
@AlexChamberlain I know the feeling! But since the question deals with std::unordered_set I think it is safe to assume OP wants an C++11 solution.Doronicum
F
0

To make a unordered_set of pairs, you can either create a custom hash function or you can make an unordered_set of strings.

  1. Create custom hash function: Creating the custom hash depends on the data. So there is no one size fits all hash function. A good hash function must have fewer collisions, so you need to consider the collision count while making the hash function.

  2. Using Strings: Using string is very simple and takes less time. It also guarantees few or no collisions. Instead of using an unordered_set<pair<int, int>> we use an unordered_set. We can represent the pair by separating the numbers with a separator (character or string). The example given below shows how you can insert pair of integers with the separator (";").

    auto StringPair = [](const pair<int, int>& x){return to_string(x.first) + ";" + to_string(x.second);}; unordered_set Set;

    vector<pair<int, int>> Nums = {{1,2}, {2, 3}, {4, 5}, {1,2}};

    for(auto & pair: Nums) { Set.insert(StringPair(pair)); }

Fallacy answered 15/3, 2022 at 16:16 Comment(0)
P
0

Just to add my 2 cents here, it's weird that to use unordered_set you need to specify an external hash function. Encapsulation principle would prefer that inside your class you would have an 'hash()' function that returns the hash, and the unordered_set would call that. You should have an Hashable interface and your class, in this case std::pair, would implement that interface. I think this is the approach followed by languages like Java. Unfortunately C++ doesn't follow this logic. The closest you can get to mimic that is:

  1. derive a class from std::pair (this allows you to have more readable code anyway)
  2. pass the hash function to the unordered_set template

Code Sample

class Point : public pair<int, int> {
   public:
   Point() {};
   Point(int first, int second) : pair{first, second}{};
   class Hash {
      public:
      auto operator()(const Point &p) const -> size_t {
         return ((size_t)p.first) << 32 | ((size_t)p.second);
      }
   };
 };

int main()
{
    unordered_set< Point, Point::Hash > us;
    Point mypoint(1000000000,1);
    size_t res = Point::Hash()(mypoint);
    cout<<"Hello World " << res << " " << mypoint.first;

    return 0;
}

The simple hash function used works if size_t is 64bit and int is 32bit, in this case this hash function guarantees no collisions and it's ideal.

Pouf answered 13/2, 2023 at 18:46 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.