Java hashCode for a Point class
Asked Answered
G

8

10

I have a simple custom Point class as follows and I would like to know if my hashCode implemention could be improved or if this is the best it's going to get.

public class Point 
{
    private final int x, y;

    public Point(int x, int y)
    {
        this.x = x;
        this.y = y;
    }

    public int getX() 
    {
        return x;
    }

    public int getY()
    {
        return y;
    }

    @Override
    public boolean equals(Object other) 
    {
        if (this == other)
          return true;

        if (!(other instanceof Point))
          return false;

        Point otherPoint = (Point) other;
        return otherPoint.x == x && otherPoint.y == y;
    }


    @Override
    public int hashCode()
    {
        return (Integer.toString(x) + "," + Integer.toString(y)).hashCode();
    }

}
Gabelle answered 3/2, 2012 at 21:30 Comment(3)
How are you trying to improve it? Do you want to try to make it faster?Inattention
you want to guarantee uniqueness? speed?Cytaster
I'd like to guarantee both :)Gabelle
A
13

Please do not use Strings. There's a lot of theory behind this and several implementations (division method, multiplication one, etc...). If you have about a hour you can watch this MIT-Class

This being said, here is what Netbeans 7.1 suggests:

@Override
public int hashCode() {
    int hash = 7;
    hash = 71 * hash + this.x;
    hash = 71 * hash + this.y;
    return hash;
}

October 2015 Edit

I started using IntelliJ a while back, I live happier now. This is what its automatic hashCode generation produces. It's a little less verbose. Note the use of prime numbers as well.

@Override
public int hashCode() {
    int result = x;
    result = 31 * result + y;
    return result;
}
Applause answered 3/2, 2012 at 21:48 Comment(8)
+1 Interesting that Netbeans suggests something different than eclipse, but the basis for the implementation is similar and solid.Wept
I'm confused, how is the second implementation a good implementation? You get collisions even with very small numbers, like (0,31),(1,0). That seems very detrimental, no?Comforter
@ChristopherShroba yours is a very interesting comment and I'll look into that once I'm back from vacation! The main issue is that result is initialized with 0 given your sample input. Still, that's what IntelliJ 2016 does...Applause
Right, and there'd be a collision for any points of the form (a, 31b),(b,31a). Thanks for the response! I'd love to hear your take on this when you get the chance!Comforter
@ChristopherShroba My assumption that the problem was due to result being initialized with 0 is also wrong... However, the Netbeans implementation is also not immune to the (a, 71b), (b,71a) attack... Thoughts?Applause
Very true. I guess it comes down to the pigeonhole principle - if the algorithm is using such small numbers, there are going to be collisions with small numbers. Personally, I think the algorithm is sound but the constants are too low. The chances of a hashset containing both (0,31) and (1,0) are higher than containing (0,4095) and (1,0) for example. One alternate implementation I can think of would use half the bits for x and half for y: return (x%(1<<16))<<16+y%(1<<16). An even better implementation would mod by the largest prime less than 2^16 instead of 2^16 itself. What do you think?Comforter
I might need some time to think about it. Have you looked at Michael's answer below about the Point Java class implementation? Thought: The simple fact that we cannot spot a pattern does not mean that such a pattern does not exist...Applause
@ChristopherShroba As stated in Java documentation the general contract for the hashcode is 1. If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result. 2. It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results.Peppermint
R
5

The manual multiplication of values of all significant member fields as suggested by Gevorg is probably the most efficient and has a good value distribution. However, if you favour readability, there are nice alternatives available either in Java 7...

import java.util.Objects;

...

@Override
public int hashCode() {
    return Objects.hash(x, y);
}

... or in the Guava library:

import com.google.common.base.Objects;

....

@Override
public int hashCode() {
    return Objects.hashCode(x, y);
}

Both of these varags methods simply delegate to Arrays.hashCode(Object[] a), so there is a slight impact on performance because of the autoboxing of ints and creating an array of object references, but it should be far less significant than using reflection.

And the readability is just great, since you simply see, which fields are used for the hashcode computation and all the multiply and add syntax is just hidden under the hood of Arrays.hashCode(Object[] a):

public static int hashCode(Object a[]) {
    if (a == null)
        return 0;

    int result = 1;

    for (Object element : a)
        result = 31 * result + (element == null ? 0 : element.hashCode());

    return result;
}
Rhodes answered 3/2, 2012 at 22:55 Comment(1)
Still vulnerable to any pairs in the form (x, 31y), (y, 31x) e.g. (0, 31), (1, 0) or (3, 217), (7, 93). I'm trying to spark a question wide discussion here. Is there a way to have a more robust implementation or with just 2 integers with have to deal with this kind of issue (that depends on the prime number used in the hashcode generation)?Applause
E
3

I would recommend using a simpler and more performant method without strings, perhaps Josh Bloch's method from this answer, in your case just:

return 37 * x + y;

EDIT: nybbler is correct. What is actually recommended is:

int result = 373; // Constant can vary, but should be prime
result = 37 * result + x;
result = 37 * result + y;
Encage answered 3/2, 2012 at 21:34 Comment(2)
That's not quite what's recommended in your linked answer. You missed that this should be done for every field individually, not at the same time. This algorithm generates the same result for [0,37] and [1,0]Wept
Note that the new implementation will still generate a collision for any pair of Points in the form (x, 37y), (y, 37x)...Applause
M
1

A really nice way to hash a 2D point into a single integer is to use a number spiral!

http://ulamspiral.com/images/IntegerSpiral.gif

@Override
public int hashCode() {
    int ax = Math.abs(x);
    int ay = Math.abs(y);
    if (ax>ay && x>0) return 4*x*x-3*x+y+1;
    if (ax>ay && x<=0) return 4*x*x-x-y+1;
    if (ax<=ay && y>0) return 4*y*y-y-x+1;
    return 4*y*y-3*y+x+1;
}

While this method requires a few more calculations, there will be no unpredictable collisions. It also has the nice property that points closer to the origin in general will have smaller hash value. (Still can overflow with x or y > sqrt(MAX_VALUE) however)

Mountebank answered 8/5, 2018 at 7:39 Comment(0)
G
0

I used to write my own hash and equals functions then I found this : )

import org.apache.commons.lang.builder.HashCodeBuilder;
import org.apache.commons.lang.builder.EqualsBuilder;

@Override
public boolean equals(Object obj) {
   return EqualsBuilder.reflectionEquals(this, obj);
 }
@Override
public int hashCode() {
   return HashCodeBuilder.reflectionHashCode(this);
 }

of course keep in mind the following:

Because reflection involves types that are dynamically resolved, certain Java virtual machine optimizations can not be performed. Consequently, reflective operations have slower performance than their non-reflective counterparts, and should be avoided in sections of code which are called frequently in performance-sensitive applications. SRC

Gerkman answered 3/2, 2012 at 21:34 Comment(3)
Since there are only two fields, you could also use this library, but explicitly list the fields.Encage
If this class is used significantly in a Collection, a reflective hashCode will be a big performance hit.Romeoromeon
I'd recommend using the HashCodeBuilder.append method instead.Encage
C
0

From the JDK's Point class (inherited from Point2d):

public int hashCode() {
    long bits = java.lang.Double.doubleToLongBits(getX());
    bits ^= java.lang.Double.doubleToLongBits(getY()) * 31;
    return (((int) bits) ^ ((int) (bits >> 32)));
}

That looks slightly better than your implementation.

Covariance answered 3/2, 2012 at 21:36 Comment(0)
B
0

You can have a look into existing Point type classes implementations:

/**
343      * Returns the hashcode for this <code>Point2D</code>.
344      * @return a hash code for this <code>Point2D</code>.
345      */
346     public int hashCode() {
347     long bits = java.lang.Double.doubleToLongBits(getX());
348     bits ^= java.lang.Double.doubleToLongBits(getY()) * 31;
349     return (((int) bits) ^ ((int) (bits >> 32)));
350     }

from: http://kickjava.com/src/java/awt/geom/Point2D.java.htm#ixzz1lMCZCCZw

Simple guide for hashCode implementation can be found here

Bolin answered 3/2, 2012 at 21:40 Comment(1)
Word of warning to all those using this for identification. Hash collisions are a reality... E.g. pastebin.com/6wM3W3WvClisthenes
W
0

By default, Eclipse will use a hashCode() function for your Point class similar to:

@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + getOuterType().hashCode();
    result = prime * result + x;
    result = prime * result + y;
    return result;
}

At the very least, incorporating a prime number into your hashCode algorithm will help with it's "uniqueness".

Wept answered 3/2, 2012 at 21:41 Comment(2)
Perhaps you miswrote "Java" instead of "Eclipse". By default, hashCode "is typically implemented by converting the internal address of the object into an integer."Encage
@MatthewFlaschen Indeed I did. Updated now; thanks for catching that.Wept

© 2022 - 2024 — McMap. All rights reserved.