Generic Hash function for all STL-containers
Asked Answered
I

1

14

I'm using an std::unordered_map<key,value> in my implementation. i will be using any of the STL containers as the key. I was wondering if it is possible to create a generic hash function for any container being used.

This question in SO offers generic print function for all STL containers. While you can have that, why cant you have something like a Hash function that defines everything ? And yeah, a big concern is also that it needs to fast and efficient.

I was considering doing a simple hash function that converts the values of the key to a size_t and do a simple function like this.

Can this be done ?

PS : Please don't use boost libraries. Thanks.

Illogic answered 1/8, 2011 at 13:52 Comment(3)
What will be the content of the containers used as keys?Spall
When would you consider two keys to be equal? What if they had a different order of elements? What if the elements would not be comparable? How would you efficiently compare elements that by definition are only supposed to be less-then-comparable at best?Luisaluise
That is a good question. That was supposed to be my next question. Thanks for pointing it out. Now I assume that all my elements are ordered. If they are not ordered except for key type std::set<int> it is a problem. Since it is an integer can we make a "smart" hash function that mimicks std::set? For example, <1,2,3> should yield the same hash as <3,2,1>.Illogic
B
18

We can get an answer by mimicking Boost and combining hashes.

Warning: Combining hashes, i.e. computing a hash of many things from many hashes of the things, is not a good idea generally, since the resulting hash function is not "good" in the statistical sense. A proper hash of many things should be build from the entire raw data of all the constituents, not from intermediate hashes. But there currently isn't a good standard way of doing this.

Anyway:

First off, we need the hash_combine function. For reasons beyond my understanding it's not been included in the standard library, but it's the centrepiece for everything else:

template <class T>
inline void hash_combine(std::size_t & seed, const T & v)
{
  std::hash<T> hasher;
  seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

Using this, we can hash everything that's made up from hashable elements, in particular pairs and tuples (exercise for the reader).

However, we can also use this to hash containers by hashing their elements. This is precisely what Boost's "range hash" does, but it's straight-forward to make that yourself by using the combine function.

Once you're done writing your range hasher, just specialize std::hash and you're good to go:

namespace std
{
  template <typename T, class Comp, class Alloc>
  struct hash<std::set<T, Comp, Alloc>>
  {
    inline std::size_t operator()(const std::set<T, Comp, Alloc> & s) const
    {
      return my_range_hash(s.begin(), s.end());
    }
  };

  /* ... ditto for other containers */
}

If you want to mimic the pretty printer, you could even do something more extreme and specialize std::hash for all containers, but I'd probably be more careful with that and make an explicit hash object for containers:

template <typename C> struct ContainerHasher
{
  typedef typename C::value_type value_type;
  inline size_t operator()(const C & c) const
  {
    size_t seed = 0;
    for (typename C::const_iterator it = c.begin(), end = c.end(); it != end; ++it)
    {
      hash_combine<value_type>(seed, *it);
    }
    return seed;
  }
};

Usage:

std::unordered_map<std::set<int>, std::string, ContainerHasher<std::set<int>>> x;
Brusquerie answered 1/8, 2011 at 13:56 Comment(10)
This is excellent. I've a doubt in your second piece of code. You used return my_range_hash(s.begin(), s.end());. Where have you defined the my_range_hash function ? Sorry if this is a dumb question ?Illogic
Another thing about the second piece of code is that you cannot add a specialization to namespace std for all types, you have to have some user defined type in here. The code is good, but too general.Baluster
@Sunil: I've left my_range_hash for you to implement -- it'll look just like the operator() in my ContainerHasher!Brusquerie
@Bo: What's "too general"? Do you mean that it's not possible to define a partial specialization that will capture all containers that match the is_container typetrait, or do you mean that std::set<T,Comp,Alloc> is too general? I'm actually worrying now that the pretty printer is illegally adding overloads to std, perhaps that should have been done with ADL, too...Brusquerie
@Kerrek - You can only add specializations for user defined types to namespace std. If you have typename T it matches not only your types, but all types already in namespace std and also all my types, which I might not want.Baluster
The (boost independent) c++11 code to use tuples of hashable elements in unordered_map and unordered_set can be found here: generic-hash-for-tuples-in-unordered-map-unordered-setCassowary
Given the question's "Generic Hash function for all STL-containers", it's worthy of prominent mention that this code won't work when you want containers with the same elements in another order to be deemed equal (because hash_combine() for a then b != hash_combine for b then a), e.g. for unordered_map, multimap, and when an application doesn't happen to consistently order content in containers like std::vector and std::list (where a -> b may or may not be considered equal to b -> a depending on the app's needs).Kev
@TonyD: Sure. What matters is that the hash value is compatible with equality. For sequence containers, the sequence order is relevant, so this is fine, but for unordered containers you'll have to come up with something else. A "range hash" is not the same as an "unordered collection hash", indeed.Brusquerie
@TonyD: Actually what's much worse is that hash combining is not a good idea in general, since the statistical properties of the resulting hash value are poor and don't meet the requirements of a "good hash function".Brusquerie
@KerrekSB: "compatible with equality" is a very reasonable choice of for vector et al but may not be what's wanted and would be better documented. Re hash_combine - many "hash(string)" functions begin by effectively hashing a fixed number of characters (1, 2, 4...?) followed by repetitive combining with the next N or a hash thereof... it's hard to choose an optimal combine algorithm if you're not also aware of the hashing logic producing the values it's combining, but the better the latter the easier combine is. VC++ std::hash<std::string>-style's so bad it's easy to surpass anyway.Kev

© 2022 - 2024 — McMap. All rights reserved.