C++11: Are there reasons why some Regular Types should not have `std::hash` specialised?
With a Regular Type, I mean the definition of Stepanov in Elements of Programming, basically, that there's the concept of equality and that objects which are copies of each other compare equal.

So when you have a Regular Type T, and the equality relation is transitive (a == b && b == c => a == c), you can define a (non-trivial) hash function which is consistent with the definition of equality (a == b => h(a) == h(b)). Always.

But the standard doesn't include many std::hash specialisations. E.g. std::complex doesn't have one, and neither do the containers, with the notable exceptions of vector<bool> and bitset.

So I'm wondering what the design principle is here.

Or, asked differently: Are there reasons not to provide std::hash specialisations for your own types, provided they are regular and equality is transitive?

Eastward answered 30/4, 2015 at 13:40 Comment(7)
Sure, you can always define a hash function which is consistent: size_t operator()(const T&) const { return 0; } The question is, can you always define one that is good for an arbitrary type?Unscrew
vector<bool> is not implemented as vector of bools. It's a template specialization that relies on int to save several bools (I assume 32). It's invariant to any template parameters (and the underlying types' std::hash), I think that's why a specialization is provided.Celloidin
There are a number of types in the standard library that have standing requests open to specialize std::hashBronchoscope
"So when you have a Regular Type T, and the equality relation is transitive [...]" This sounds like a lemma. Care to elaborate? I've read EoP, but I can't remember anything in it that proves it.Whim
@dyp: No hashing in EoP, but if equality is an equivalence relation (which includes transitivity), then a hash function can be constructed that maps each equivalence class into a natural number. QED. But you don't actually need that equality is an equivalence relation. E.g. equality for floating-point types is not reflexive (nan != nan), yet a consistent hash function exists (auto h(auto x) { return x == 0 ? 0 : hash_bits(&x, sizeof x); }). But when you drop transitivity, then you cannot define a consistent hash function (at least in the cases I saw, e.g. a == b <=> abs(a-b) < 1e-6).Eastward
I don't quite see why the set of equivalence classes should be countable. For a finite set of values, this is obvious, but I don't think that's required for regular types. Maybe the countability of the set of values is a property of the representation? Additionally, even if it is possible to enumerate the equivalence classes, it might be impossible to find them. For example, if you write a wrapper type that wraps some opaque API which only exposes certain functions (the regular computational basis) for opaque handle types.Whim
The set of equivalence classes is finite, thus countable, because your RAM is finite. EoP, not EoinfiniteVectorSpaces. As for proxies: sure, if you want to hash proxies, you need to put a hash function into its computational basis. But the same is true for equality, and I assume equality exists (regular type). As a constructive approach: if you hash the subobjects which you compare for equality in the equality operation, possibly recursively, and in the same order (to handle unordered_set), then you get a consistent hash function.Eastward
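The floating-point hash sketched in the comments above can be written out compilably like this (my own fleshing-out; hash_bits in the comment is replaced by hashing the object representation via memcpy):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>

// Sketch: +0.0 and -0.0 compare equal but have different bit patterns,
// so they must be special-cased before hashing the raw bits. NaN needs
// no care: since nan != nan, the requirement a == b => h(a) == h(b)
// is vacuously satisfied for it.
std::size_t hash_double(double x) {
    if (x == 0.0) return 0;  // covers both +0.0 and -0.0
    std::size_t bits = 0;
    static_assert(sizeof bits >= sizeof x, "sketch assumes 64-bit size_t");
    std::memcpy(&bits, &x, sizeof x);  // hash the object representation
    return std::hash<std::size_t>{}(bits);
}
```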
Yes.

When a type has the following two properties I do not think you should define std::hash:

  • There is no efficient way to consistently create a quality hash that covers all the data used to describe equality.

  • There is no efficient and/or intuitive way to select a consistent subset of data to hash.

Scanties answered 30/4, 2015 at 14:13 Comment(2)
Actually, std::hash<std::unordered_*> can be implemented in linear time, whereas operator== has worst-case quadratic time. Just hash individual items and use a commutative hash combiner (unsigned addition, not xor).Eastward
@MarcMutz-mmutz Never mind my first comment - if the hash function creates uniform values addition is fine. I'll see if I can find a better example type - I think my properties are fine though.Scanties
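The order-independent combination mentioned in this comment thread can be sketched as follows (my own illustration for unordered_set<int>; the same shape works for any unordered container whose elements are hashable):

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>

// Sketch: hash each element individually and combine with unsigned
// addition. Because addition is commutative, the result is independent
// of iteration order, which may differ between two equal sets.
std::size_t hash_unordered(const std::unordered_set<int>& s) {
    std::size_t h = 0;
    for (int v : s)
        h += std::hash<int>{}(v);  // commutative combiner
    return h;
}
```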
Providing a specialization for your own types makes no sense if they're template classes, since the hash function (with a very high probability) also depends on the template parameters' types, which are unknown.

If there are no template parameters, or the template parameters are restricted to certain types

// declare but do not define, to disallow all other types
template<typename T> class container;

// int is allowed
template<> class container<int> {
    ...
};

// double is allowed
template<> class container<double> {
    ...
};

providing a specialization of std::hash is possible, since the implementations of the classes (or the instantiated template classes) are known, as is the case for vector<bool> but not for complex<T>.

Celloidin answered 30/4, 2015 at 14:8 Comment(5)
What's wrong with namespace std { template <typename T> struct hash<complex<T>> { using result_type = size_t; using argument_type = complex<T>; result_type operator()(const argument_type &a) const { std::hash<T> hash; size_t result = hash(a.real()) + 0x9e3779b9; result ^= hash(a.imag()) + 0x9e3779b9 + (result << 6) + (result >> 2); return result; } }; }?Eastward
the line std::hash<T> hash; requires that there is a std::hash specialization for T, it uses this specialization to create a specialization for complex<T>, but how can one know that this would yield an effective implementation?Celloidin
complex is only allowed to be instantiated with float, double and long double. Even if it wasn't, failure to provide a hash<Custom> when requesting instantiation of hash<complex<Custom>> would just yield the expected error. No problem there. And the combiner (with the magic number) is a pretty good one, from Boost, regardless of the types involved.Eastward
Sry, I rather meant efficient. Of course your proposal is effective, at least as effective as hash<Custom>.Celloidin
Why would a hash<complex<Custom>> be at least as efficient as hash<Custom>? That makes no sense. It should be roughly 2x slower, a bit more to allow for the combining operation, a bit less to allow for instruction-level parallelism. Likewise, whether I define my hash<complex<T>> generically or whether I define three hash<complex<FP>>, FP in {float, double, long double}, there should be no difference in efficiency, should there?Eastward
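The specialisation proposed in the first comment of this thread, written out as a compilable sketch. The combining step is Boost's hash_combine (0x9e3779b9 is 2^32 divided by the golden ratio); note that, strictly speaking, programs are only permitted to specialise std::hash for program-defined types, so treat this as an illustration rather than something to ship:

```cpp
#include <complex>
#include <cstddef>
#include <functional>

namespace std {
template <typename T> struct hash<complex<T>> {
    size_t operator()(const complex<T>& a) const {
        std::hash<T> h;
        // Boost-style hash_combine of the real and imaginary parts.
        size_t result = h(a.real()) + 0x9e3779b9;
        result ^= h(a.imag()) + 0x9e3779b9 + (result << 6) + (result >> 2);
        return result;
    }
};
} // namespace std
```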

© 2022 - 2024 — McMap. All rights reserved.