Hashing types at compile-time in C++17/C++2a
Asked Answered
C

6

12

Consider the following code:

#include <iostream>
#include <type_traits>

template <class T>
constexpr std::size_t type_hash(T) noexcept 
{
    // Compute a hash for the type
    // DO SOMETHING SMART HERE
}

int main(int argc, char* argv[])
{
    auto x = []{};
    auto y = []{};
    auto z = x;
    std::cout << std::is_same_v<decltype(x), decltype(y)> << std::endl; // 0
    std::cout << std::is_same_v<decltype(x), decltype(z)> << std::endl; // 1
    constexpr std::size_t xhash = type_hash(x);
    constexpr std::size_t yhash = type_hash(y);
    constexpr std::size_t zhash = type_hash(z);
    std::cout << (xhash == yhash) << std::endl; // should be 0
    std::cout << (yhash == zhash) << std::endl; // should be 1
    return 0;
}

I would like the type_hash function to return a hash key unique to the type, at compile-time. Is there a way to do that in C++17, or in C++2a (ideally only relying on the standard and without relying compiler intrinsics)?

Cahill answered 24/5, 2019 at 11:50 Comment(9)
(disclaimer: non fancy way) If you know types already you can just specialize the type_hash function for types of interest, make sure you return unique values for each type.Glamorous
@Glamorous Unfortunately, I don't know the types in advanceCahill
On some implementations, &typeid(T) will always be the same for the same type, but this isn't guaranteed. And unfortunately std::type_info::hash_code() is not constexpr.Joe
@Cahill please check this if it helps wandbox.org/permlink/yp1VWQe1BmyNCcO5Glamorous
@Glamorous As said, std::type_info::hash_code() is not constexpr. And I need these hashes at compile-time.Cahill
It's small thing, but you probably wanted to compare y with z and not x with z in second cout, as later there is yhash == zhash.Ezraezri
@Cahill - Please, could you confirm that you want 1 from yhash == zhash? Or maybe from xhash == zhash?Avellaneda
@aschepler: My first thought, too. Unfortunately typeid() behaves polymorphically for pointer types, so it cannot be constexprHertz
Why should yhash be the same as zhash? Should xhash be the same as zhash?Hertz
A
8

I don't know a way to obtain a std::size_t for the hash.

But if you accept a pointer to something, maybe you can take the address of a static member in a template class.

I mean... something as follows

#include <iostream>
#include <type_traits>

template <typename>
struct type_hash
 {
   static constexpr int          i     { };
   static constexpr int const *  value { &i };
 };

template <typename T>
static constexpr auto type_hash_v = type_hash<T>::value;


int main ()
 {
   auto x = []{};
   auto y = []{};
   auto z = x;
   std::cout << std::is_same_v<decltype(x), decltype(y)> << std::endl; // 0
   std::cout << std::is_same_v<decltype(x), decltype(z)> << std::endl; // 1
   constexpr auto xhash = type_hash_v<decltype(x)>;
   constexpr auto yhash = type_hash_v<decltype(y)>;
   constexpr auto zhash = type_hash_v<decltype(z)>;
   std::cout << (xhash == yhash) << std::endl; // should be 0
   std::cout << (xhash == zhash) << std::endl; // should be 1
 } // ...........^^^^^  xhash, not yhash

If you really want type_hash as a function, I suppose you could simply create a function that return the type_hash_v<T> of the type received.

Avellaneda answered 24/5, 2019 at 12:43 Comment(5)
I like the idea, but it doesn't seem to work : wandbox.org/permlink/HVAoJWlmj7onyapgMurcia
@MartinMorterol - Thanks for the report: I didn't noticed before. It seems to me that the behavior is correct but the last cout (from the OP) is wrong: the second cout (if we want obtain a 1) should compare xhash and zhash because x and z are from the same type. I've corrected my answer.Avellaneda
xD didn't see that either. Nice solution btw !Murcia
This is a clever approach. std::any implementations tend to use similar ideas to avoid RTTI. It's a pity that a pointer cannot be reinterpret_casted to an integer in constant evaluation.Marble
The extra static variable i can be avoided by initializing value with its own address. e.g.: template<class T> inline constexpr const void* type_hash = &type_hash<T>;(godbolt example)Consideration
E
10

I doubt that's possible with purely the standard C++.


But there is a solution that will work on most major compilers (at least GCC, Clang, and MSVC). You could hash strings returned by the following function:

template <typename T> constexpr const char *foo()
{
    #ifdef _MSC_VER
    return __FUNCSIG__;
    #else
    return __PRETTY_FUNCTION__;
    #endif
}
Enmity answered 24/5, 2019 at 23:48 Comment(1)
unfortunately, it will not work with unnamed structures: gcc, for instance, would print constexpr const char *foo() [with <template-parameter-1-1> = main()::<unnamed struct>]:Dotty
A
8

I don't know a way to obtain a std::size_t for the hash.

But if you accept a pointer to something, maybe you can take the address of a static member in a template class.

I mean... something as follows

#include <iostream>
#include <type_traits>

template <typename>
struct type_hash
 {
   static constexpr int          i     { };
   static constexpr int const *  value { &i };
 };

template <typename T>
static constexpr auto type_hash_v = type_hash<T>::value;


int main ()
 {
   auto x = []{};
   auto y = []{};
   auto z = x;
   std::cout << std::is_same_v<decltype(x), decltype(y)> << std::endl; // 0
   std::cout << std::is_same_v<decltype(x), decltype(z)> << std::endl; // 1
   constexpr auto xhash = type_hash_v<decltype(x)>;
   constexpr auto yhash = type_hash_v<decltype(y)>;
   constexpr auto zhash = type_hash_v<decltype(z)>;
   std::cout << (xhash == yhash) << std::endl; // should be 0
   std::cout << (xhash == zhash) << std::endl; // should be 1
 } // ...........^^^^^  xhash, not yhash

If you really want type_hash as a function, I suppose you could simply create a function that return the type_hash_v<T> of the type received.

Avellaneda answered 24/5, 2019 at 12:43 Comment(5)
I like the idea, but it doesn't seem to work : wandbox.org/permlink/HVAoJWlmj7onyapgMurcia
@MartinMorterol - Thanks for the report: I didn't noticed before. It seems to me that the behavior is correct but the last cout (from the OP) is wrong: the second cout (if we want obtain a 1) should compare xhash and zhash because x and z are from the same type. I've corrected my answer.Avellaneda
xD didn't see that either. Nice solution btw !Murcia
This is a clever approach. std::any implementations tend to use similar ideas to avoid RTTI. It's a pity that a pointer cannot be reinterpret_casted to an integer in constant evaluation.Marble
The extra static variable i can be avoided by initializing value with its own address. e.g.: template<class T> inline constexpr const void* type_hash = &type_hash<T>;(godbolt example)Consideration
A
3

Based on HolyBlackCat answer, a constexpr template variable which is a (naive) implementation of the hash of a type:

template <typename T>
constexpr std::size_t Hash()
{
    std::size_t result{};

#ifdef _MSC_VER
#define F __FUNCSIG__
#else
#define F __PRETTY_FUNCTION__
#endif

    for (const auto &c : F)
        (result ^= c) <<= 1;

    return result;
}

template <typename T>
constexpr std::size_t constexpr_hash = Hash<T>();

Can be used as shown below:

constexpr auto f = constexpr_hash<float>;
constexpr auto i = constexpr_hash<int>;

Check on godbolt that the values are indeed, computed at compile time.

Antimacassar answered 27/5, 2019 at 12:39 Comment(2)
On my 32b arch (microblaze, Xilinx) I had to change all size_t & auto to uint64_t and now I can even do an enum type with those constexpr valuesIntelligencer
doesn't work for unnamed structs gcc.godbolt.org/z/caq4TfrnTDotty
H
2

I will agree with the other answers that it's not generally possible as-stated in standard C++ yet, but we may solve a constrained version of the problem.

Since this is all compile-time programming, we cannot have mutable state, so if you're willing to use a new variable for each state change, then something like this is possible:

  • hash_state1 = hash(type1)
  • hash_state2 = hash(type2, hash_state1)
  • hash_state3 = hash(type3, hash_state2)

Where "hash_state" is really just a unique typelist of all the types we've hashed so far. It can also provide a size_t value as a result of hashing a new type. If a type that we seek to hash is already present in the typelist, we return the index of that type.

This requires quite a bit of boilerplate:

  1. Ensuring types are unique within a typelist: I used @Deduplicator's answer here: https://mcmap.net/q/1007419/-how-to-define-a-variant-lt-x-y-z-gt-extracting-subtypes-of-a-template-parameter
  2. Finding a type in a unique typelist
  3. Using if constexpr to check if a type is in the typelist (C++17)

Live Demo


Part 1: a unique typelist:

Again, all credit to @Deduplicator's answer here on this part. The following code saves compile-time performance by doing lookups on a typelist in O(log N) time thanks to leaning on the implementation of tuple-cat.

The code is written almost frustratingly generically, but the nice part is that it allows you to work with any generic typelist (tuple, variant, something custom).

namespace detail {
    template <template <class...> class TT, template <class...> class UU, class... Us>
    auto pack(UU<Us...>)
    -> std::tuple<TT<Us>...>;

    template <template <class...> class TT, class... Ts>
    auto unpack(std::tuple<TT<Ts>...>)
    -> TT<Ts...>;

    template <std::size_t N, class T>
    using TET = std::tuple_element_t<N, T>;

    template <std::size_t N, class T, std::size_t... Is>
    auto remove_duplicates_pack_first(T, std::index_sequence<Is...>)
    -> std::conditional_t<(... || (N > Is && std::is_same_v<TET<N, T>, TET<Is, T>>)), std::tuple<>, std::tuple<TET<N, T>>>;

    template <template <class...> class TT, class... Ts, std::size_t... Is>
    auto remove_duplicates(std::tuple<TT<Ts>...> t, std::index_sequence<Is...> is)
    -> decltype(std::tuple_cat(remove_duplicates_pack_first<Is>(t, is)...));

    template <template <class...> class TT, class... Ts>
    auto remove_duplicates(TT<Ts...> t)
    -> decltype(unpack<TT>(remove_duplicates<TT>(pack<TT>(t), std::make_index_sequence<sizeof...(Ts)>())));
}

template <class T>
using remove_duplicates_t = decltype(detail::remove_duplicates(std::declval<T>()));

Next, I declare my own custom typelist for using the above code. A pretty straightforward empty struct that most of you have seen before:

template<class...> struct typelist{};

Part 2: our "hash_state"

"hash_state", which I'm calling hash_token:

template<size_t N, class...Ts>
struct hash_token
{
    template<size_t M, class... Us>
    constexpr bool operator ==(const hash_token<M, Us...>&)const{return N == M;}
    constexpr size_t value() const{return N;}
};

Simply encapsulates a size_t for the hash value (which you can also access via the value() function) and a comparator to check if two hash_tokens are identical (because you can have two different type lists but the same hash value. e.g., if you hash int to get a token and then compare that token to one where you've hashed (int, float, char, int)).

Part 3: type_hash function

Finally our type_hash function:

template<class T, size_t N, class... Ts>
constexpr auto type_hash(T, hash_token<N, Ts...>) noexcept
{
    if constexpr(std::is_same_v<remove_duplicates_t<typelist<Ts..., T>>, typelist<Ts...>>)
    {
        return hash_token<detail::index_of<T, Ts...>(), Ts...>{};
    }
    else
    {
        return hash_token<N+1, Ts..., T>{};
    }
}

template<class T>
constexpr auto type_hash(T) noexcept
{
    return hash_token<0, T>{};
}

The first overload is for the generic case; you've already "hashed" a number of types, and you want to hash yet another one. It checks to see if the type you're hashing has already been hashed, and if so, it returns the index of the type in the unique type list.

To accomplish getting the index of a type in a typelist, I used simple template expansion to save some compile time template instantiations (avoiding a recursive lookup):

// find the first index of T in Ts (assuming T is in Ts)
template<class T, class... Ts>
constexpr size_t index_of()
{
    size_t index = 0;
    size_t toReturn = 0;
    using swallow = size_t[];
    (void)swallow{0, (void(std::is_same_v<T, Ts> ? toReturn = index : index), ++index)...};
    
    return toReturn;
}

The second overload of type_hash is for creating an initial hash_token starting at 0.

Usage:

int main()
{
    auto x = []{};
    auto y = []{};
    auto z = x;
    std::cout << std::is_same_v<decltype(x), decltype(y)> << std::endl; // 0
    std::cout << std::is_same_v<decltype(x), decltype(z)> << std::endl; // 1
 
    constexpr auto xtoken = type_hash(x);
    constexpr auto xytoken = type_hash(y, xtoken);
    constexpr auto xyztoken = type_hash(z, xytoken);
    std::cout << (xtoken == xytoken) << std::endl; // 0
    std::cout << (xtoken == xyztoken) << std::endl; // 1
}

Conclusion:

Not really useful in a lot of code, but this may help solve some constrained meta-programming problems.

Hertz answered 29/5, 2019 at 15:51 Comment(0)
K
1

I don't think it is possible. "hash key unique to the type" sounds like you are looking for a perfect hash (no collisions). Even if we ignore that size_t has a finite number of possible values, in general we can't know all the types because of things like shared libraries.

Do you need it to persist between runs? If not, you can set up a registration scheme.

Koumis answered 24/5, 2019 at 23:42 Comment(0)
C
1

I would like to improve upon @max66's answer (which is absolutely brilliant and simple by the way, I wish I thought of it)

<TL;DR>

Put inline on the static variables in max66's answer and it will make sure the same static value is present across all translation units.

</TL;DR>

If you have multiple translation units, then, typically, multiple instances of a static variable (in max66's case i) will be created, changing the value of the hash, this could be a problem, say I have a function getHash that I want to use to check if two types are the same

#include <iostream>
#include <type_traits>
#include <memory>

// <From max66's answer>


template <typename>
struct type_hash
 {
   static constexpr int          i     { };
   static constexpr int const *  value { &i };
 };

template <typename T>
static constexpr auto type_hash_v = type_hash<T>::value;


// </From max66's answer>

struct AnyBase {
    using PointerType = std::unique_ptr<AnyBase>;
    constexpr virtual bool equal(const PointerType& other) const noexcept = 0;
    constexpr virtual const int* typeHash() const noexcept = 0;
};

template<typename ParameterType>
struct Any : public AnyBase
{
    using BasePointerType = std::unique_ptr<AnyBase>;
    using Type = ParameterType;
    Type value;
    constexpr Any(Type value) noexcept : value(value) {}
    constexpr virtual bool equal(const BasePointerType& other) const noexcept final
    {
        if(other->typeHash() == typeHash()) {
            const auto unwrapped = dynamic_cast<const Any<Type>*>(other.get()); 
            return unwrapped->value == value;
        }
        return false;
    }
    constexpr virtual const int* typeHash() const noexcept final {
        return type_hash_v<Type>;
    }

};

using AnyType = std::unique_ptr<AnyBase>;

template<typename ParameterType>
AnyType makeAny(auto... initializationValues) {
    return static_cast<AnyType>(std::make_unique<Any<ParameterType>>(initializationValues...));
}

int main()
{
    AnyType any0 = makeAny<int>(4);
    AnyType any1 = makeAny<int>(2);
    AnyType any2 = makeAny<int>(4);
    AnyType any3 = makeAny<char>('A');
    std::cout << "Hash Codes: " 
            << any0->typeHash() << "\n"
            << any1->typeHash() << "\n"
            << any2->typeHash() << "\n"
            << any3->typeHash() << "\n";
    std::cout 
            << "any0 == any1? " << any0->equal(any1) << "\n" // False within translation unit
            << "any0 == any2? " << any0->equal(any2) << "\n" // True within translation unit
            << "any0 == any3? " << any0->equal(any3) << "\n"; // False within translation unit
    return 0;
}

If I instantiate two Any<int>'s in two different translation units, they may have two different hashes, because i is static and is likely to have a different addresses across the translation units, therefore when I try to compare Any<int>'s it will fail even if they have the same type and value.

I learned from Daisy Hollman's presentation at CppNow (slides here) that the C++ standard guarantees a single persistent instantiation of an object across translation units

Unfortunately the way she does type registration at the beginning of the presentation is not constexprable because it relies on a static mutable variable within a function. However using this knowledge we can tweak max66's approach, lets modify the code from before

#include <iostream>
#include <type_traits>
#include <memory>
#include <bit>

template<typename ParameterType>
struct Registrar
{
    constexpr inline static const uintptr_t getHash() { // ACCORDING TOO C++ STANDARD INLINE GUARANTEES ONE COPY ACROSS ALL TRANSLATION UNITS
        return std::bit_cast<uintptr_t>(&hashObject);
    }
    protected: 
        constinit inline static const size_t hashObject = 0; // ACCORDING TOO C++ STANDARD INLINE GUARANTEES ONE COPY ACROSS ALL TRANSLATION UNITS
};




struct AnyBase {
    using PointerType = std::unique_ptr<AnyBase>;
    constexpr virtual bool equal(const PointerType& other) const noexcept = 0;
    constexpr virtual const uintptr_t typeHash() const noexcept = 0;
};

template<typename ParameterType>
struct Any : public AnyBase
{
    using BasePointerType = std::unique_ptr<AnyBase>;
    using Type = ParameterType;
    Type value;
    constexpr Any(Type value) noexcept : value(value) {}
    constexpr virtual bool equal(const BasePointerType& other) const noexcept final
    {
        if(other->typeHash() == typeHash()) {
            const auto unwrapped = dynamic_cast<const Any<Type>*>(other.get()); 
            return unwrapped->value == value;
        }
        return false;
    }
    constexpr virtual const uintptr_t typeHash() const noexcept final {
        return Registrar<Type>::getHash();
    }

};

using AnyType = std::unique_ptr<AnyBase>;

template<typename ParameterType>
AnyType makeAny(auto... initializationValues) {
    return static_cast<AnyType>(std::make_unique<Any<ParameterType>>(initializationValues...));
}

int main()
{
    AnyType any0 = makeAny<int>(4);
    AnyType any1 = makeAny<int>(2);
    AnyType any2 = makeAny<int>(4);
    AnyType any3 = makeAny<char>('A');
    std::cout << "Hash Codes: " 
            << any0->typeHash() << "\n"
            << any1->typeHash() << "\n"
            << any2->typeHash() << "\n"
            << any3->typeHash() << "\n";
    std::cout 
            << "any0 == any1? " << any0->equal(any1) << "\n" // False GUARANTEED across translation units
            << "any0 == any2? " << any0->equal(any2) << "\n" // True GUARANTEED across translation units
            << "any0 == any3? " << any0->equal(any3) << "\n"; // False GUARANTEED across translation units
    return 0;
}

Now our hash is guaranteed across translation units (as I stated in all caps :) )

Thanks to @max66 and Daisy Hollman

And a note, I think you could further static_cast to size_t or something if you want from uintptr_t, both examples compile with gcc 12.2 with -std=c++23

Checkbook answered 23/10, 2022 at 15:27 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.