Generic hash for tuples in unordered_map / unordered_set
Asked Answered
S

5

64

Why doesn't std::unordered_map<tuple<int, int>, string> just work out of the box? It is tedious to have to define a hash function for tuple<int, int>, e.g.

template<> struct do_hash<tuple<int, int>>                               
{   size_t operator()(std::tuple<int, int> const& tt) const {...}  }; 

Building an unordered map with tuples as keys (Matthieu M.) shows how to automate this for boost::tuple. Is there anyway of doing this for c++0x tuples without using variadic templates?

Surely this should be in the standard :(

Saltworks answered 18/8, 2011 at 15:49 Comment(3)
If you're using C++0x why not use variadic templates? (or is there an implementation with tuples, but no varidic templates?)Deviltry
@RMartinho : VC++ 2010 matches that description (and it seems the next version of VC++ is likely to as well).Cardwell
If std::tuple is implemented by manually writing out templates of 1 to N-arity (like boost::tuple), then I guess the std::hash specialisation necessary to hash the tuples will also have to be manually written out (using too clever preprocessing?) the same way. Aaargh!Saltworks
S
47

This works on gcc 4.5 allowing all c++0x tuples containing standard hashable types to be members of unordered_map and unordered_set without further ado. (I put the code in a header file and just include it.)

The function has to live in the std namespace so that it is picked up by argument-dependent name lookup (ADL).

Is there a simpler solution?

#include <tuple>
namespace std{
    namespace
    {

        // Code from boost
        // Reciprocal of the golden ratio helps spread entropy
        //     and handles duplicates.
        // See Mike Seymour in magic-numbers-in-boosthash-combine:
        //     http://stackoverflow.com/questions/4948780

        template <class T>
        inline void hash_combine(std::size_t& seed, T const& v)
        {
            seed ^= std::hash<T>()(v) + 0x9e3779b9 + (seed<<6) + (seed>>2);
        }

        // Recursive template code derived from Matthieu M.
        template <class Tuple, size_t Index = std::tuple_size<Tuple>::value - 1>
        struct HashValueImpl
        {
          static void apply(size_t& seed, Tuple const& tuple)
          {
            HashValueImpl<Tuple, Index-1>::apply(seed, tuple);
            hash_combine(seed, std::get<Index>(tuple));
          }
        };

        template <class Tuple>
        struct HashValueImpl<Tuple,0>
        {
          static void apply(size_t& seed, Tuple const& tuple)
          {
            hash_combine(seed, std::get<0>(tuple));
          }
        };
    }

    template <typename ... TT>
    struct hash<std::tuple<TT...>> 
    {
        size_t
        operator()(std::tuple<TT...> const& tt) const
        {                                              
            size_t seed = 0;                             
            HashValueImpl<std::tuple<TT...> >::apply(seed, tt);    
            return seed;                                 
        }                                              

    };
}

Standard Conformant code

Yakk points out that specialising things in the std namespace is actually undefined behaviour. If you wish to have a standards conforming solution, then you need to move all of this code into your own namespace and give up any idea of ADL finding the right hash implementation automatically. Instead of :

unordered_set<tuple<double, int> > test_set;

You need:

unordered_set<tuple<double, int>, hash_tuple::hash<tuple<double, int>>> test2;

where hash_tuple is your own namespace rather than std::.

To do this, you first have to declare a hash implementation inside the hash_tuple namespace. This will forward all non tuple types to the std::hash:

namespace hash_tuple{

template <typename TT>
struct hash
{
    size_t
    operator()(TT const& tt) const
    {                                              
        return std::hash<TT>()(tt);                                 
    }                                              
};
}

Make sure that hash_combine calls hash_tuple::hash and not std::hash

namespace hash_tuple{

namespace
    {
    template <class T>
    inline void hash_combine(std::size_t& seed, T const& v)
    {
        seed ^= hash_tuple::hash<T>()(v) + 0x9e3779b9 + (seed<<6) + (seed>>2);
    }
}

Then include all the other previous code but put it inside namespace hash_tuple and not std::

namespace hash_tuple{

    namespace
    {
        // Recursive template code derived from Matthieu M.
        template <class Tuple, size_t Index = std::tuple_size<Tuple>::value - 1>
        struct HashValueImpl
        {
          static void apply(size_t& seed, Tuple const& tuple)
          {
            HashValueImpl<Tuple, Index-1>::apply(seed, tuple);
            hash_combine(seed, std::get<Index>(tuple));
          }
        };

        template <class Tuple>
        struct HashValueImpl<Tuple,0>
        {
          static void apply(size_t& seed, Tuple const& tuple)
          {
            hash_combine(seed, std::get<0>(tuple));
          }
        };
    }

    template <typename ... TT>
    struct hash<std::tuple<TT...>> 
    {
        size_t
        operator()(std::tuple<TT...> const& tt) const
        {                                              
            size_t seed = 0;                             
            HashValueImpl<std::tuple<TT...> >::apply(seed, tt);    
            return seed;                                 
        }                                              
    };

}
Saltworks answered 18/8, 2011 at 23:57 Comment(12)
Assuming you don't want to use boost libraries shouldn't you have std::hash_combine instead of boost::hash_combine?Vincentvincenta
Is there a std::hash_combine?Saltworks
Your template compiled using std::hash_combine with gcc 4.6.3. I made the switch to avoid using boostVincentvincenta
@LeoGoodstadt: +1. Your hash is almost certainly better than bare XOR. I've been told that seed += hash(v); seed *= m, where m is a large odd number, is a good choice, but I don't know whether its better or worse than yours.Spate
@Bo Lu: I was wondering what you were talking about since I include the hash_combine explicitly to avoid boost dependencies. Then I noticed I forgot to remove the boost namespace qualifier from my code... Hope it still works.Saltworks
Not worth the undefined behavior: don't specialize things in std:: involving things you do not own, and you don't own std::tuple<TT...>. As a concrete exmaple of how this could horribly break code, what happens when a new iteration of the standard introduces its own hash specialization? What happens when someone else who has your bright idea introduces a narrow hash<tuple<int>> specialization, which is visible from some but not all places where hash<tuple<int>> is used? Those are concrete examples, but the UB is not bounded by them. Your program is ill formed.Cerell
Thanks @Yakk. Answer updated to suggest avoiding the std:: namespace.Saltworks
Doesn't touching std result in undefined behavior? Does using an anon namespace in std get around that??Averyaveryl
@Averyaveryl This is old, but adding specializations to the std namespace is actually recommended. Adding classes, functions, or other definitions is explicitly forbidden. en.cppreference.com/w/cpp/language/extending_stdRhiamon
@AlexanderHuszagh You can add specializations into std namespace only for custom types (therefore not for std::tuple). This is what Yakk mentioned in his/her comment.Chellman
@DanielLangr Neither mine nor @Yakk's comments are contradictory. You are expressly encouraged to add specializations in the std namespace, doing so for types already in the standard library is bound to cause problems.Rhiamon
rather than having hash_tuple::hash be a template, you could have a single hash_tuple object with a template operator()Indo
T
15
#include <boost/functional/hash.hpp>
#include <tuple>

namespace std
{

template<typename... T>
struct hash<tuple<T...>>
{
    size_t operator()(tuple<T...> const& arg) const noexcept
    {
        return boost::hash_value(arg);
    }
};

}
These answered 26/3, 2013 at 11:50 Comment(0)
K
12

With C++20, it is possible to use fold expressions and generic lambdas to compute hash of a tuple without recursion. I prefer to rely on std::hash<uintmax_t> instead of manually combining hashes:

#include <cinttypes>
#include <cstddef>
#include <functional>
#include <tuple>

class hash_tuple {
    template<class T>
    struct component {
        const T& value;
        component(const T& value) : value(value) {}
        uintmax_t operator,(uintmax_t n) const {
            n ^= std::hash<T>()(value);
            n ^= n << (sizeof(uintmax_t) * 4 - 1);
            return n ^ std::hash<uintmax_t>()(n);
        }
    };

public:
    template<class Tuple>
    size_t operator()(const Tuple& tuple) const {
        return std::hash<uintmax_t>()(
            std::apply([](const auto& ... xs) { return (component(xs), ..., 0); }, tuple));
    }
};

- 1 in sizeof(uintmax_t) * 4 - 1 is optional, but appears to slightly improve hash distribution. This class can be used both with std::tuple and std::pair.

Knowling answered 12/3, 2019 at 2:56 Comment(3)
n ^= n << (sizeof(uintmax_t) * 4 - 1);Bouzoun
Note that the line "n ^ std::hash<uintmax_t>()(n);" will only return 0 on platforms where the default hash function is identity, making this hash function useless.Windcheater
Usage of operator, vs. operator+ or operator^, the latter two seem more appropriate of course subjectively speaking.Sefton
K
10

In my C++0x draft, 20.8.15 says hash is specialized for built-in types (including pointers, but doesn't seem to imply dereferencing them). It also appears to be specialized for error_code, bitset<N>, unique_ptr<T, D>, shared_ptr<T>, typeindex, string, u16string, u32string, wstring, vector<bool, Allocator>, and thread::id. (facinating list!)

I've not used C++0x variadics, so my formatting is probably way off, but something along these lines might work for all tuples.

size_t hash_combiner(size_t left, size_t right) //replacable
{ return left + 0x9e3779b9 + (right<<6) + (right>>2);}

template<int index, class...types>
struct hash_impl {
    size_t operator()(size_t a, const std::tuple<types...>& t) const {
        typedef typename std::tuple_element<index, std::tuple<types...>>::type nexttype;
        hash_impl<index-1, types...> next;
        size_t b = std::hash<nexttype>()(std::get<index>(t));
        return next(hash_combiner(a, b), t); 
    }
};
template<class...types>
struct hash_impl<0, types...> {
    size_t operator()(size_t a, const std::tuple<types...>& t) const {
        typedef typename std::tuple_element<0, std::tuple<types...>>::type nexttype;
        size_t b = std::hash<nexttype>()(std::get<0>(t));
        return hash_combiner(a, b); 
    }
};

template<class...types>
struct tuple_hash<std::tuple<types...>> {
    size_t operator()(const std::tuple<types...>& t) {
        const size_t begin = std::tuple_size<std::tuple<types...>>::value-1;
        return hash_impl<begin, types...>()(0, t);
    }
}

This version actually compiles and runs

Yakk has observed that specializing std::hash directly is technically not allowed, since we're specializing a standard library template with a declaration that does not depend on a user-defined type.

Kristlekristo answered 18/8, 2011 at 17:19 Comment(7)
Use left() ^ right() if you don't know what to do. See #5889738 . But note that XOR is not always the right choice. Using plain addition can turn out to be better if you expect the tuples to contain duplicate members. This is probably the reason why there is no standard hash for tuples.Condottiere
It must be an oversight that tuple is not hashable. Too late for the standard though :(Saltworks
@Leo oddly enough, no container except for vector<bool>, and basic_string is hashable. (is bitset technically a container?)Kristlekristo
oh, I just now noticed: without using variadic templates? whoops. The answer is no, can't be done for all tuples, since a tuple is a variadic template type.Kristlekristo
@Alexandre Turns out this revised code looks almost exactly like my answer (after reformatting for whitespace) except that (1) I use a different (more robust?) way of dealing with entropy when combining hash values and (2) I don't have to deduce the type of the "next" tuple value explicitly using std::tuple_element because my hash_combine function is parameterised on its type.Saltworks
@AlexandreC.: ^ and + are both commutative, and are thus poor choices for combining hashes. Consider how std::unordered_set<std::tuple<int, int, …>> would handle permutations of {1, 2, …, 10}. Instead, use a non-commutative combiner such as m * left + right, where m is a large odd number.Spate
Specializing hash for a type you do not own means your program is ill formed. And you don't own all of hash<tuple<Ts...>>. And ^ is a horrible hash combiner.Cerell
M
4

If you want to do it in a simple way. Just do

std::unordered_map<std::tuple<int, int>, std::string, boost::hash<std::tuple<int, int>>> mp;
Mellophone answered 21/9, 2022 at 12:54 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.