Does std::hash guarantee equal hashes for "equal" floating point numbers?

Is the floating point specialisation of std::hash (say, for doubles or floats) reliable regarding almost-equality? That is, if two values (such as (1./std::sqrt(5.)/std::sqrt(5.)) and .2) should compare equal but will not do so with the == operator, how will std::hash behave?
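
A minimal check illustrating the concern (the exact output is of course implementation dependent):

#include <cmath>
#include <functional>
#include <iostream>

int main()
{
    double a = 1. / std::sqrt(5.) / std::sqrt(5.);
    double b = .2;

    std::cout << std::boolalpha
              << "a == b : " << (a == b) << '\n'                  // often false, due to rounding
              << "hash(a): " << std::hash<double>()(a) << '\n'
              << "hash(b): " << std::hash<double>()(b) << '\n';
}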

So, can I rely on a double as an std::unordered_map key to work as expected?


I have seen "Hashing floating point values" but that asks about boost; I'm asking about the C++11 guarantees.

Toluene answered 18/2, 2013 at 19:25 Comment(9)
"should compare equal" wut?Arboreal
They had better have different hashes, otherwise the hashing is broken.Oestradiol
An answer to this question says it's implementation dependent: #7758893Remotion
@DavidSchwartz Um, unequal objects having the same hash value is pretty much unavoidable, and the source of 99% of what goes into a hash table.Hampshire
@delnan: Of course. But it's critical that it not be possible for a simple, common algorithm to trivially produce large numbers of values with the same hash. (Except, of course, one maliciously constructed specifically to collide.) It's trivial and typical to produce "almost equal" floats and want to store them in the same collection.Oestradiol
@Toluene The reason that your two expressions don't compare as equal is NOT that there are two binary expressions of 0.2, but that there is NO exact (finite) binary representation of 0.2, or sqrt(5) !Solidago
The requirement is that the same value produce the same hash value. A good hash function produces different hash values for different values to the extent possible. And, no, 1./sqrt(5.)/sqrt(5.) is not necessarily equal to 0.2, and if it's not, hashing it is not required to produce the same value as hashing 0.2.Cymene
@DavidSchwartz I wouldn't expect a large number of "almost equal" floats in most cases, at least for reasonably small epsilon (just cutting off after is likely awful, I agree, but cutting off after the sixth decimal digit or something may be perfectly okay).Hampshire
@delnan: It's not a matter of "most cases", it's a matter of every reasonable case. A large number of almost equal floats is a reasonable case. In fact, it's one of the most common test cases for hash functions.Oestradiol

std::hash has the same guarantee for all types over which it can be instantiated: if two objects compare equal, their hash codes will be equal. Otherwise, there's a very large probability that they won't. So you can rely on a double as a key in an unordered_map to work as expected: if two doubles are not equal (as defined by ==), they will probably have different hashes (and even if they don't, they're still different keys, because unordered_map also checks for equality).

Obviously, if your values are the results of inexact calculations, they aren't appropriate keys for unordered_map (nor perhaps for any map).
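
A small sketch of the consequence (assuming the two expressions from the question round to different doubles, which is typical but not guaranteed): unordered_map simply treats them as two distinct keys.

#include <cmath>
#include <iostream>
#include <unordered_map>

int main()
{
    std::unordered_map<double, int> counts;
    ++counts[1. / std::sqrt(5.) / std::sqrt(5.)];
    ++counts[.2];

    // If the two expressions round to different doubles, they are two
    // distinct keys: they compare unequal and almost certainly hash
    // differently, so this prints 2.
    std::cout << counts.size() << '\n';
}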

Askew answered 18/2, 2013 at 19:42 Comment(4)
Ah, your point about unordered_map checking for strict equality is a very good one (I neglected that). So even if the std::hash would behave as I intended, I wouldn't have gained anything as == would "break" it again. Thanks.Toluene
Can you give an example number that would be a problematic key that could actually occur? Which combination of bits of a double is ambiguous? NaN's with unusual mantissa bit patterns? This answer reinforces fear of floating point, doesn't it?Thorner
@Thorner I gave one in the question.Toluene
@Toluene If you used integers between -(2^53) and +(2^53) for the key it will work perfectly; double can represent that entire range exactly. It can represent more than that exactly: anything with about 50 significant binary digits will be fine. I don't see a problem with two differently rounded calculations being not equal in your question, what did you expect? 1/3 can't be written down in base ten either (0.33333....); that doesn't mean everything is wrong. I can write 1/3 exactly in base 3, though: 0.1. Same issue between base 10 and base 2.Thorner

Multiple problems with this question:

  • The reason that your two expressions don't compare as equal is NOT that there are two binary representations of 0.2, but that there is NO exact (finite) binary representation of 0.2, or of sqrt(5)! So in fact, while (1./std::sqrt(5.)/std::sqrt(5.)) and .2 should be the same algebraically, they may well not be the same in computer-precision arithmetic. (They aren't even in pen-and-paper arithmetic with finite precision. Say you are working with 10 digits after the decimal point. Write out sqrt(5) with 10 digits and calculate your first expression. It will not be .2.)

  • Of course you have a sensible concept of two numbers being close. In fact you have at least two: one absolute (|a-b| < eps), one relative. But that doesn't translate into sensible hashes. If you want all numbers within eps of each other to have the same hash, then 1, 1+eps, 1+2*eps, ... would all have the same hash, and therefore ALL numbers would have the same hash. That is a valid but useless hash function. Yet it is the only one that satisfies your requirement of mapping nearby values to the same hash!
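
To make the chain argument concrete, here is a minimal sketch (assuming a hypothetical tolerance eps = 1e-6): every consecutive step differs by less than eps, so a hash that respected almost-equality would have to give the whole chain, and hence 1.0 and 2.0, the same value.

#include <iostream>

int main()
{
    const double eps = 1e-6;   // hypothetical tolerance for "almost equal"
    double x = 1.0;
    long steps = 0;

    // Every consecutive pair in this walk satisfies |a - b| < eps, i.e.
    // is "almost equal". A hash respecting almost-equality would have to
    // assign the same value to every element of the chain, and therefore
    // to 1.0 and 2.0 as well.
    while (x < 2.0) {
        x += eps / 2;
        ++steps;
    }
    std::cout << steps << " almost-equal steps link 1.0 to 2.0\n";
}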

Solidago answered 18/2, 2013 at 19:43 Comment(3)
You have convinced me: every double should hash to 7.Arboreal
I am under the impression that the author is aware of these things, and the answer is only around-, not on-topic.Fatsoluble
@Fatsoluble That's for the author to say. Reading his post and comments, I was under the impression that these are points that required elaborating on. Both you and the author are free to downvote my answer if you deem it irrelevant.Solidago

Behind the default hashing of an unordered_map there is an std::hash struct whose operator() computes the hash of a given value.

A set of default specializations of this template is available, including std::hash<float> and std::hash<double>.

On my machine (LLVM+clang) these are defined as

template <>
struct hash<float> : public __scalar_hash<float>
{
    size_t operator()(float __v) const _NOEXCEPT
    {
        // -0.0 and 0.0 should return same hash
        if (__v == 0)
            return 0;
        return __scalar_hash<float>::operator()(__v);
    }
};

where __scalar_hash is defined as:

template <class _Tp>
struct __scalar_hash<_Tp, 0> : public unary_function<_Tp, size_t>
{
    size_t operator()(_Tp __v) const _NOEXCEPT
    {
        union
        {
            _Tp    __t;
            size_t __a;
        } __u;
        __u.__a = 0;
        __u.__t = __v;
        return __u.__a;
    }
};

Essentially, the hash is built by writing the source value into a union and then reading back a piece that is as large as a size_t.

So the value either gets padded or truncated, but that doesn't really matter, because as you can see the raw bits of the number are used to compute the hash. This means it behaves exactly like the == operator: for two floating point numbers to have the same hash (excluding collisions caused by truncation), they must be the same value.
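
A minimal portable sketch of the same idea (assuming size_t is at least as wide as double, as on typical 64-bit platforms; memcpy is used instead of the union, since reading an inactive union member is technically undefined behaviour in portable C++, even though the library can rely on its own compiler):

#include <cstddef>
#include <cstring>
#include <iostream>

// Hash a double by its raw bit pattern, copied out with memcpy.
std::size_t bit_hash(double v)
{
    static_assert(sizeof(double) <= sizeof(std::size_t),
                  "sketch assumes size_t can hold all the bits of a double");
    if (v == 0.0)              // fold -0.0 and 0.0 together, as the library does
        return 0;
    std::size_t h = 0;
    std::memcpy(&h, &v, sizeof v);
    return h;
}

int main()
{
    // 1./5. rounds to the same double as the literal .2, so the two
    // printed hashes are identical: equal values, equal bits, equal hash.
    std::cout << bit_hash(.2) << ' ' << bit_hash(1. / 5.) << '\n';
}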

Tweak answered 12/12, 2014 at 4:52 Comment(0)

There is no rigorous concept of "almost equality". So behavior can't be guaranteed in principle. If you want to define your own concept of "almost equal" and construct a hash function such that two "almost equal" floats have the same hash, you can. But then it will only be true for your particular notion of "almost equal" floats.
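
One concrete (and purely hypothetical) way to do that, along the lines of the partitioning idea discussed in the comments below: quantise each value to a fixed grid and key the map on the bucket index rather than the double itself. Nearby values usually share a bucket, but values on opposite sides of a bucket boundary never do, however close they are.

#include <cmath>
#include <iostream>
#include <unordered_map>

// Hypothetical helper: partition the doubles into buckets of width 1e-6
// and use the bucket index as the key. Values inside the same bucket
// compare and hash equal; values that straddle a bucket boundary do not,
// no matter how close they are. That is the price of any such scheme.
long long bucket_of(double v)
{
    return static_cast<long long>(std::floor(v / 1e-6));
}

int main()
{
    std::unordered_map<long long, int> counts;
    ++counts[bucket_of(1. / std::sqrt(5.) / std::sqrt(5.))];
    ++counts[bucket_of(.2)];
    std::cout << counts.size() << '\n';   // 1 if both values fall in the same bucket
}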

Oestradiol answered 18/2, 2013 at 19:29 Comment(14)
I'd say that it's a mathematical impossibility to define a sensible notion of "almost equal" that will allow a well-behaved hash function respecting that notion.Solidago
It's hard to imagine a concept of "almost equal" for which this actually could work, unless we just use the same hash for all values. I mean, if 1.00000 is "almost equal" to 1.00001 is "almost equal" to 1.00002 and so on, then any two numbers will be linked by a chain of "almost equal" pairs.Ultra
Well, there are different representations of the exact same value in floating point numbers, 0 is one example, 0.2 is another (see OP). So I would expect these to compare equal, if a hash function is provided. Otherwise the implementation basically performs a bit by bit comparison, which is useless for floating points.Toluene
@Toluene - no, there is only one representation for 0.0, a different representation for -0.0, and only one representation for 0.2. What the OP is talking about is a computation that, because of finite precision, ends up with a value that is not 0.2, even though algebraically it would be.Cymene
@ruakh: Yeah, no. You can compare values with a granularity of epsilon to decide if they are equal or not. As far as I can tell, that would separate 1.00000 and 1.00001 even transitively.Toluene
@PeteBecker: For starters, I am the OP, so I should better know what I'm asking :) Second, -0 and 0 are equal values, but as you say they don't compare equal. As does 0.2 and that fraction.Toluene
@Toluene - -0 and 0 are "equal values" for some definitions of "equal" and not for others. 1.0/-0.0 is negative infinity; 1.0/0.0 is positive infinity. This is a pretty good indication that they are not "equal" (for some definitions of "equal"). And 0.2 and that fraction are not necessarily the same value, because the calculation of that fraction uses truncated values. It's just like int a = 1/3; int b = a * 3; not setting b to 1, but a little less obvious.Cymene
@bitmask: No, that would not separate 1.00000 and 1.00001 transitively. Think about it -- at each stage going from 1.00000 to 1.00001, you're allowed to add at least the full value of epsilon. You'll never hit, say, an impassible barrier that you can only approach asymptotically.Ultra
@Solidago It's actually not very difficult. Just partition the set of all values in some way, convert each value to a normalized (representative) value of the partition its in, and use this normalized value for hashing and comparison. But you have to partition; you cannot use +/-epsilon.Askew
@Toluene 0.0 and -0.0 do compare equal. And the results of your expression are not guaranteed to be 0.2---for that matter, on most machines, they are guaranteed not to be 0.2, because 0.2 isn't representable.Askew
@James I am aware of that. But once you partition your range, you get useless behaviour at the boundaries. Two values on different sides of a boundary will then have different hashes, no matter how close they are. That defeats the entire point of the construction.Solidago
@Solidago Whether you get useless behavior at the boundaries depends on what you're doing (or have done) with the values. If the operations are such that you should come out with an integral value, for example, partitioning can make a lot of sense.Askew
@PeteBecker: My point was that double is an approximation of the real numbers. And you can very easily show that the additive inverse of zero is zero, hence 0 == -0.Toluene
@Toluene - analyzing real numbers is a very bad way of understanding how double works precisely because double is an approximation of real numbers. Yes, 0.0 == -0.0, because that's convenient. But they have different representations, and can give different results when plugged into the same calculation.Cymene
