Good way to hash a float vector?
Asked Answered
O

8

31

I am well aware of all the problems involved in comparing floats. This is exactly the reason for this question.
I'm looking to create a fast hash table for values that are 3D vectors (3 floats - x,y,z). It can be assumed that the length of the vector is always 1.0 (sqrt(x*x+y*y+z*z) is 1.0)

Essentially this means that I'm looking for a hash function that takes values that are almost equal to the same unsigned int value and a corresponding equality operator that is true if the hash values are equal (not not necessarily only if they are equal)

Edit -
False positives (i.e. vectors that are different but map to the same bucket) are a given since this is a hash table.
False negatives (i.e. Vectors that are close but map to different buckets) are undesirable but it seems that there is no way to avoid them. In my case, they will not causes total breakage, just some data duplication which is something I'll have to live with.

Ostrom answered 16/3, 2009 at 12:18 Comment(3)
What an interesting question!Renfrow
Have you considered using one or more of the following general purpose hash functions: partow.net/programming/hashfunctions/index.html they are extremely fast and efficient.Dongdonga
Related: How do I find hash value of a 3D vector?Tenebrous
R
16

I think what you're looking for is not directly possible. One important property of equality is that it is transitive. (ie. If a == b and b == c, then a == c). With a distance measure, though, you really don't want this property. Example:

Take a single float (for simplicity). Suppose we want to hash each float so that floats less than 1e-3 away are the same value. Now, suppose we add to this hash table 1000 float values all 1e-4 apart. Any neighboring 2 values should hash to the same float, since they are closer than 1e-3. However, because of transitivity, the neighbors of those values should also has to the same value, and their neighbors and so on. As a result, all 1000 values, including pairs farther than 1e-3 apart, would hash to the same integer. If you were to draw these points on a picture:

A  B  C  D  E  F  G  H ... Y Z

Suppose all the gaps are < 1e-3 apart, but A and Z are > 1e-3 apart (not to scale!). This can't be satisfied because if hash(A) == hash(B) and hash(B) == hash(C) and so on for all pairs, (since they are < 1e-3 apart) than hash(A) must == hash(Z).

One possible option is to define regions of your vector space in which all vectors would hash to the same value (ie round them before hashing them), but you could still get 2 vectors on the edges of their respective spaces that are close together but hash to a different value. You could get around that by searching all neighboring spaces for a vector. (ie in the 1-d case above, you'd round all vectors to the nearest multiple of 1e-3, and then search the neighbors, so 5.3e-3 would search 5e-3, 4e-3 and 6-e3. In higher-dimensional cases, you'd have to search neighbors in all dimensions.)

Renfrow answered 16/3, 2009 at 12:28 Comment(2)
Related: Hash function for floatsTenebrous
Solution: Hash everything to the same value. Transitivity guaranteed!Felty
C
3

I'd convert the float values into integers like this:

unsigned int IntValue = (int)(floatValue * MULT) + MULT;

so you get some of the first digits and then use

const MULT1 = (MULT << 1) + 1;
unsigned long long HashValue = (xIntValue * MULT1  * MULT1) + (yIntValue * MULT1) + zIntValue;

as a hash value (using (MULT * 2) + 1 because the IntValues will be between 0 and MULT * 2 inclusive).

The memory needed will be depending on the multiplicator MULT. For example, using 32 you'll get a hashtable using 64 * 64 * 64 * (Hash item size) = 262144 * (Hash item size) bytes.

Critical answered 16/3, 2009 at 12:25 Comment(4)
Using this scheme, you'd still get vectors that are very very close together but hash to different buckets - right at the edge of the rounding you're doing to compute IntValue.Renfrow
Of course, but I think the OP wants a quick way to compare vectors, not an exact way, or am I wrong?Critical
I think (and please correct me, if I'm wrong), that he wants a way that will only have false positives (saying they're equal when they're far away), not false negatives (saying they're different when they're close). At least, it sounds that way because he wants a hash table.Renfrow
Ah, now I get your point. You can't specify some "minimum same bucket distance" for my method (f.e. say all vectors that only differ by 1e-5 get into the same bucket).Critical
C
3

Some languages (C, Java 5) allow you to access the binary value of your floats. This way, you can extract the first N bits of the mantissa (ignoring the last few bits which cause the trouble during compare) and calculate the hash from that.

Crayton answered 16/3, 2009 at 12:32 Comment(0)
C
3

I think you are effectively trying to solving the K nearest problem. I believe what you are looking for is locality sensitive hashing. Also you can use quad tree structures to achieve the same result.

Calmas answered 31/5, 2012 at 15:8 Comment(0)
K
1

Can you ellaborate on your problem?

Assuming that you are using a hashmap to map some additional data to specific vectors, you could just use the XOR of the binary representations of the components (if this is possible in your language of choice). Then use as many LSBs (to reduce collisions) as you need for the hash map. This would of course have the property that two equal (by floating point comparison) vectors might not have the same hash (e.g. IEEE floating point 0 equals to -0, but they have different sign bit).

However, if you are planning on using vectors that are results of different computations to do hash lookup, you are setting yourself up to the possibility of not having matching hash codes due to rounding errors and you should probably be using something else anyway.

Katusha answered 16/3, 2009 at 12:49 Comment(0)
B
1

Yes, it is possible. I have written an article how to hash float vectors and its implementation in Golang (Github link).

The method works by quantization of floats to integers, BUT it makes sure that values near or at quantization borders are also comparable. To do that, several hashes may be generated for one value. Then hash-tables can be used. Hashes in the method are hypercube numbers in n-dimensional space.

Since comparison of two real numbers assumes an equality-radius (else they would be incomparable unless precisely equal), quantization is applicable to solve such problem.

Bealle answered 7/12, 2021 at 21:22 Comment(0)
D
0

don't know how fast this might be, but since you have unit vectors, they all lie on the surface of a sphere. convert to a http://en.wikipedia.org/wiki/Spherical_coordinate_system. then use phi and theta to choose a bucket. there will be no false positives. you can look in the neighboring cells for false negatives.

Deicide answered 2/7, 2010 at 8:18 Comment(1)
Performing the conversion will introduce more rounding error. This might lead to some vectors ending up in the wrong bucket, depending on the bucket size.Commissariat
B
0

Do you need it to be a fast hash table or would a tree structure do?

It seems to me that it would be easier to look up matching floats in a search tree of some sort. A B-Tree minimizes the number of cache misses, assuming that you choose the right node size. That ought to make it pretty fast in practice.

Booklover answered 2/7, 2010 at 8:34 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.