Defining a c++20 concept for hash functions
Asked Answered
L

2

8

The CppCoreGuidelines state that concepts should be specified for all template arguments. (see:T.10: Specify concepts for all template arguments) As practice for defining concepts, I am trying to build a hashtable with concepts defined for the Hash function and Key template arguments.

I want my hashtable to use two template arguments, HashFunc and Key. HashFunc should be a function object and Key should be the argument for the function object HashFunc.

That is, HashFunc(Key) should return a type convertible to size_t.

On cppreference, there is an example defining the concept Hashable. I replicated the example below:

template<typename T>
concept Hashable = requires(T a) {
    { std::hash<T>{}(a) } -> std::convertible_to<std::size_t>;
};

This Hashable concept makes sense for many uses. In these cases, hash functions on object of type T are specializations std::hash<T>. However, for my purposes, I don't want to assume that the Hash will be std::hash<Key>. I would like the user to be able to provide a different hash function.

Since HashFunc and Key are so tightly bound, I don't think I can define separate concepts for HashFunc and Key. Is that correct? So I would like to define a concept HashConcept that deals with HashFunc and Key simultaneously.

So I define one concept Hash that deals with both. I try my best to define the concept so that it matches the named requirement for Hash here. The goal then is to satisfy 4 conditions. Below this list, I talk about trying to enforce these conditions.

  1. The return type is convertible to std::size_t.
  2. The hash function displays equality preservation (h(k1) == h(k1) for the duration of the program. see C++ Extensions for Ranges section 19.1.1)
  3. If u is an lvalue Key, then h(u) does not modify u.
  4. "The probability of h(a)==h(b) for a!=b should approach 1.0/std::numeric_limits<std::size_t>::max()."

Does this list appear complete?

I don't believe concepts can enforce (4), and (4) will just need to be indicated in comments/documentation. I believe that concepts might be able to enforce (2) and (3), but I'm not sure how. C++ Extensions for Ranges section 19.5 defines the concepts Callable and RegularCallable, then says "Note: The distinction between Callable and RegularCallable is purely semantic. — end note", suggesting that (2) cannot be enforced. That leaves (1) and (3).

I define a concept that enforces (1).

template<typename HashFunc, typename Key>
concept Hash = requires(HashFunc h, Key k) {
    { std::invoke(h, k) } -> std::convertible_to<std::size_t>;
};

Is this concept correct? (e.g., should I have used requires or returned a bool?) Can my concept be extended to address other requirements for hash functions, such as (2)-(4)?

Below is some example code that uses the concept. The result is to print 3 to the std::cout.

#include <functional>
#include <concepts>
#include <iostream>

template<typename HashFunc, typename Key>
concept HashConcept = requires(HashFunc h, Key k) {
    { std::invoke(h, k) } -> std::convertible_to<std::size_t>;
};


class HashFunc {
public:
    std::size_t operator()(int i) {
        return static_cast<size_t>(i);
    }
};

template<typename Hash, typename Key>
    requires HashConcept<Hash, Key>
size_t HashConceptUser(Hash h, Key k) {
    return h(k);
}

int main() {
    std::cout << HashConceptUser< HashFunc, int >(HashFunc{}, 3); 

}
Louden answered 3/12, 2020 at 14:29 Comment(13)
HashFunc's call operator is private.Siphonophore
@Siphonophore Thanks, I have edited the question to remove the error to prevent confusion.Louden
Not reproductible since your call operator is public: godbolt.org/z/f777cPCurassow
The (3) could be enforceable by taking const Key& k instead of Key k. If someone breaks this precondition (by legal or illegal const_cast) then, well, I guess it is almost (or actual) UB on their part.Rubbico
There is no reason to explicitly provide template parameters. It's a function template, let deduction do its thing.Siphonophore
explicitly providing template parameters 1. makes it easier to find errors in code and 2. documents for others what my code should be doing.Louden
@Rubbico take const Key& k where? In the HashConceptUser by size_t HashConceptUser(Hash h, const Key& k)?Louden
@Louden I disagree strongly with both statements. It certainly makes it easier to introduce errors by getting the types wrong.Siphonophore
if I get the types wrong, then that means I probably didn't understand what my code is doing. If I get a compiler error, then I am happy.Louden
@Louden requires(HashFunc h, Key k) -> requires(HashFunc h, const Key& k). This will either ensure that HashFunc takes its parameter by const& or by value - either way, it cannot (unless const_cast, but if the programmer really wants to break something, they will) change the passed argument.Rubbico
@Rubbico I'm confused by how cv-qualification works with requires, but your suggestion appears to work. I made the change you suggested and changed the operator() function to std::size_t operator()(int& i) { i += 1; return static_cast<size_t>(i); }. The compiler caught the error as you said.Louden
@Louden requires is "given" the arguments and tests some conditions given those exact arguments. When given just the const&, it will, well, require those tests to pass when given const&.Rubbico
""The probability of h(a)==h(b) for a!=b should approach 1.0 / std::numeric_limits<std::size_t>::max()" - this may be impractical to enforce with concepts, but anyway - your identity hash has probability 0, and the desirable quality is really that the probability after % bucket_count() - for any possible bucket_count() - doesn't significantly exceed 1.0/table_size. That can vary with implementation (e.g. GCC standard hash containers use prime bucket counts, Visual C++ uses powers of 2).Snowinsummer
S
7

Does this list appear complete?

The list is missing arguably the single most important criteria for a hash function: that if a == b then h(a) == h(b).

The 4th criterion on the list is something you want for good hash functions, and is itself somewhat incomplete - you don't just want the likelihood of collision to be small, you also want random dispersion. The hash function h(i) = i satisfies the 4th criterion, but not a good hash function. On the flip side, h(i) = 0 is a terrible hash function but should be considered valid.


That said, C++ language concepts cannot enforce any of these things - you cannot enforce that the hash function is equality-preserving, you cannot enforce that it doesn't modify its inputs, and you cannot enforce anything about the distribution of its results. Those are what we would call semantic constraints rather than syntactic ones (the C++ standard speaks of satsifying a concept if the syntactic constraints are met and modeling a concept if the syntactic and semantic ones are met). The semantic constraints are documented requirements (in comments, or just documentation) rather than coded ones.

The best you can do the syntax is just verify that the hash function is invocable and gives you an integer:

template <typename F, typename T>
concept HashFor = std::regular_invocable<F, T>
               && std::convertible_to<std::invoke_result_t<F, T>, size_t>;

I am using regular_invocable here because that concept adds semantic constraints that you want: that the function call is equality-preserving and does not modify the function object or its arguments. You could also write it this way:

template <typename F, typename T>
concept HashFor = std::regular_invocable<F, T>
    && requires(F f, T t) {
        { std::invoke(f, t) } -> std::convertible_to<size_t>;
    };

But I would keep the regular_invocable part.

Siphonophore answered 3/12, 2020 at 15:24 Comment(0)
K
0

You're right that C++20 Concepts can't deal with semantic constraints, only syntactic constraints. So your criteria (2) and (4) are right out. (And Barry is right that your (2) was way too weak; the correct stronger criterion, h(i)==h(j) if i==j, is also right out.) So really your question is, "What is syntactically required of a hasher in C++?"

As Eric Niebler once said, you don't find concepts by looking at types (the implementors of concepts), you find concepts by looking at algorithms (the users of concepts). So this question boils down to, "What does unordered_set syntactically require of its hasher?"

  • H must be copyable, assignable, etc., or else your unordered_set<K, H> won't be copyable, assignable, etc. (Arguably a move-only hasher could produce a move-only unordered_set, but in practice that's not how it works. E.g. s.hash_function() requires copyability.)
  • H must be default-constructible, or else your unordered_set won't be default-constructible.
  • H must be both callable and const-callable, with both const and non-const lvalues and rvalues of type K. (If H isn't const-callable, you won't be able to look anything up in a const unordered_set.)

(As of 2023, libc++ encodes some of these requirements in __check_hash_requirements. You might look there for inspiration.)

So you might make a concept something like

template<class F, class... Args>
concept SizeTPredicate =
  std::regular_invocable<F, Args...> &&
  std::convertible_to<std::invoke_result_t<F, Args...>, size_t>;

template<class H, class K>
concept HashFor =
  std::semiregular<H> &&
  SizeTPredicate<H&, K&> &&
  SizeTPredicate<H&, const K&> &&
  SizeTPredicate<const H&, K&> &&
  SizeTPredicate<const H&, const K&>;

However, in practical terms, the four calls to SizeTPredicate are both overkill and arguably insufficient — arguably we might also want to encode that H& is callable with K&&, for example, or that H&& is callable with const K& (e.g. s.hash_function()(*s.begin()) should be well-formed). I say "arguably" because here we're drifting away from what the algorithm actually requires and into "stuff we just thought up." That is usually a very bad idea when it comes to concept design.

So maybe we should mentally classify "H& must be callable" and "H must be callable with K& as well as const K&" as basically semantic constraints — we'll pick the one syntactic constraint that best expresses our intent and catches the most user error, and leave the corner cases up to the user's judgment. The result:

template<class H, class K>
concept HashFor =
  std::semiregular<H> &&
  std::regular_invocable<const H&, const K&> &&
  std::convertible_to<std::invoke_result_t<const H&, const K&>, size_t>;

Finally, you might wonder whether we can just omit std::regular_invocable<X,Y> from the concept definition — isn't that mechanically implied by our use of std::invoke_result_t<X,Y>? (That is, if X doesn't satisfy invocable<Y>, then invoke_result_t<X,Y> will SFINAE away, correctly making our whole concept unsatisfied.) There are two reasons, neither of which is a slam dunk:

  • Putting invocable right in the constraint, at the top level, allows subsumption to work. Godbolt:

      template<class T> requires HashFor<T, int>
      int f(T) { return 1; }  // "more constrained"
    
      template<class T> requires std::invocable<const T&, const int&>
      int f(T) { return 2; }  // "less constrained"
    
  • Using std::regular_invocable instead of std::invocable (or nothing) communicates to the human reader that we intend this thing to be regularly invocable, i.e., it shouldn't modify its arguments or be nondeterministic; it should be a pure function of its inputs.

Of course the former code is highly unrealistic, and we could do the latter with a // code comment instead.


I feel like there's one more missing semantic criterion, to do with heterogeneous/transparent hashers. Right now we have

static_assert(HashFor<std::hash<std::string_view>, std::string>>);

which is fine and good — the STL even goes out of its way to guarantee that std::hash<std::string_view>()(s) == std::hash<std::string>()(s) for all strings s. But we also have

static_assert(HashFor<std::hash<std::string_view>, const char*>>);

even though std::hash<std::string_view>()(p) != std::hash<const char*>()(p) for most pointers p, and even though std::hash<std::string_view>()(nullptr) has undefined behavior. This feels like a problem. But, at the same time, we obviously do want to permit people to create unordered_set<const char*, hash<string_view>, equal_to<string_view>> — there's nothing catastrophic about that! So I think this is simply a semantic constraint on H: ordinary usage of H shouldn't result in UB or otherwise do anything "unexpected" by the user. All we can do with our concept is encode what usages we expect; we can't encode exactly what we mean by "unexpected behavior," except in ordinary language via comments and documentation. This is exactly analogous to how

static_assert(std::totally_ordered<float>);

is true, even though float(std::nan("")) isn't actually part of that "total" order. Likewise, we'll let

static_assert(HashFor<std::hash<std::string_view>, const char*>>);

be true, even though std::hash<std::string_view> can't actually hash (const char*)nullptr. As long as the user plays along and excludes the special value from their domain, we'll be okay.

Kittrell answered 24/9, 2023 at 14:37 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.