Java overriding equals() and hashcode() for two interchangeable integers

Asked 8/4, 2013 at 11:30 Answered 5/4, 2022 at 2:10

Solved java equals hashcode hash-code-uniqueness

I'm overriding the equals and hashcode methods for a simple container object for two ints. Each int reflects the index of another object (it doesn't matter what that object is). The point of the class is to represent a connection between the two objects.

The direction of the connection doesn't matter, therefore the equals method should return true regardless of which way round the two ints are in the object E.g.

connectionA = new Connection(1,2);
connectionB = new Connection(1,3);
connectionC = new Connection(2,1);

connectionA.equals(connectionB); // returns false
connectionA.equals(connectionC); // returns true

Here is what I have (modified from the source code for Integer):

public class Connection {
    // Simple container for two numbers which are connected.
    // Two Connection objects are equal regardless of the order of from and to.

    int from;
    int to;

    public Connection(int from, int to) {
        this.from = from;
        this.to = to;
    }

    // Modifed from Integer source code
    @Override
    public boolean equals(Object obj) {
        if (obj instanceof Connection) {
            Connection connectionObj = (Connection) obj;
            return ((from == connectionObj.from && to == connectionObj.to) || (from == connectionObj.to && to == connectionObj.from));
        }
        return false;
    }

    @Override
    public int hashCode() {
        return from*to;
    }
}

This does work however my question is: Is there a better way to achieve this?

My main worry is with the hashcode() method will return the same hashcode for any two integers which multiply to equal the same number. E.g.

3*4 = 12
2*6 = 12 // same!

The documentation, http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/Object.html#hashCode(), states that

It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.

If anyone can see a simple way of reducing the number of matching hashcodes then I would be appreciative of an answer.

Thanks!

Tim

PS I'm aware that there is a java.sql.Connection which could cause some import annoyances. The object actually has a more specific name in my application but for brevity I shortened it to Connection here.

Mcquade answered 8/4, 2013 at 11:30 Comment(0)

This is widely accepted approach:

@Override
public int hashCode() {
    int res = 17;
    res = res * 31 + Math.min(from, to);
    res = res * 31 + Math.max(from, to);
    return res;
}

Beaird answered 8/4, 2013 at 11:32 Comment(7)

This will not work in this case as from and to are not necessarily equal. Instead, maybe sort them first. – Cyperaceous 8/4, 2013 at 11:33

I just tried this in Excel. Pescis is correct, it does not give the same res if you swap the numbers (unless they are the same). – Mcquade 8/4, 2013 at 11:36

@TwiceCircled, have you specified that requirement in the question? – Beaird 8/4, 2013 at 11:38

how about if you identify the smaller number first, then use that for the first line and the other one for the second line? – Schumacher 8/4, 2013 at 11:38

@Nikolay - Yes I believe so, see the second paragraph. – Mcquade 8/4, 2013 at 11:39

Yes he did! "The direction of the connection doesn't matter, therefore the equals method should return true regardless of which way round the two ints are in the object" – Hadst 8/4, 2013 at 11:40

I've chosen this as correct answer as it follows conventions I've seen/used before. I'm sure some of the other answers also work. – Mcquade 8/4, 2013 at 11:48

Three solutions that would "work" have been proposed. (By work, I mean that they satisfy the basic requirement of a hashcode ... that different inputs give different outputs ... and they also satisfy the OP's additional "symmetry" requirement.)

These are:

   # 1
   return from ^ to;

   # 2
   return to*to+from*from;

   # 3
   int res = 17;
   res = res * 31 + Math.min(from, to);
   res = res * 31 + Math.max(from, to);
   return res;

The first one has the problem that the range of the output is bounded by the range of the actual input values. So for instance if we assume that the inputs are both non-negative numbers less or equal to 2ⁱ and 2^j respectively, then the output will be less or equal to 2^max(i,j). That is likely to give you poor "dispersion"¹ in your hash table ... and a higher rate of collisions. (There is also a problem when from == to!)

The second and third ones are better than the first, but you are still liable to get more collisions than is desirable if from and to are small.

I would suggest a 4th alternative if it is critical that you minimize collisions for small values of from and to.

  #4
  int res = Math.max(from, to);
  res = (res << 16) | (res >>> 16);  // exchange top and bottom 16 bits.
  res = res ^ Math.min(from, to);
  return res;

This has the advantage that if from and to are both in the range 0..2¹⁶-1, you get a unique hashcode for each distinct (unordered) pair.

^{1 - I don't know if this is the correct technical term for this ...}

Hadst answered 8/4, 2013 at 12:13 Comment(1)

One slight problem with your #4 is that some hash tables may map hash codes to slot in such a way as to undo the work you did separating the values. I'd suggest computing bigprime1*(from+to)-bigprime2*min(from,to). No need to compute both max and min, since sum-max=min. XOR seems popular in hashing, but in a language with defined overflow semantics I don't know that it has any meaningful advantages over addition. – Bewitch 15/1, 2014 at 4:45

i think, something like

@Override
public int hashCode() {
    return to*to+from*from;
}

is good enough

Hydrogenous answered 8/4, 2013 at 11:38 Comment(2)

for from in (1..1000) and two in (1..from) i got 330159 collisions. Accepted answer has 497477 collisions. – Hydrogenous 8/4, 2013 at 11:56

(And my solution #4 should give ZERO collisions across that range :-) ) – Hadst 8/4, 2013 at 12:25

Typically I use XOR for hashcode method.

@Override
public int hashCode() {
    return from ^ to;
}

Althorn answered 8/4, 2013 at 11:40 Comment(6)

depending on the size of the numbers, and particularly of to, does this not quickly cause overflows? – Schumacher 8/4, 2013 at 11:41

@Schumacher - 1) No. 2) It wouldn't matter if it did ... this is a hash code not a meaningful computation. (The '^' operator is exclusive OR!!) – Hadst 8/4, 2013 at 11:43

@StephenC Fair point. there is however, still the issue that from^to != to^from which is a requirement. – Schumacher 8/4, 2013 at 11:45

@Schumacher - I think you may want to check that "fact". Bitwise XOR is a symmetric operation ... – Hadst 8/4, 2013 at 11:48

All objects where from == to will result in a hashCode of 0 with this solution, probably not desirable. – Broyles 8/4, 2013 at 11:52

@StephenC My bad. I misunderstood ^ – Schumacher 8/4, 2013 at 11:54

I wonder why nobody offered the usually best solution: Normalize your data:

 Connection(int from, int to) {
      this.from = Math.min(from, to);
      this.to = Math.max(from, to);
 }

If it's impossible, then I'd suggest something like

 27644437 * (from+to) + Math.min(from, to)

By a using a multiplier different from 31, you avoid collisions like in this question.
By using a big multiplier you spread the numbers better.
By using an odd multiplier you ensure that the multiplication is bijective (i.e., no information gets lost).
By using a prime you gain nothing at all, but everyone does it and it has no disadvantage.

Rexanna answered 15/6, 2014 at 14:32 Comment(8)

It may also be worth noting that your second indicated structure is bijective with respect to x for all pairs of the form (x, x) and is nearly bijective for pairs of the form (x,x+delta). By contrast, forms that multiply one value by 31 and add the other will always yield a result that's a multiple of 32 when used with two equal values, 64 when used with four, 128 when used with eight, 256 when used with 16, etc. – Bewitch 5/8, 2014 at 16:4

@Bewitch 27644437 * x + x = 0x1A5D216 * x which is always even, so it can't be bijective. But it's as good as it can as it's no multiple of 4. I've been always claiming that 31 is a terrible multiplier while I didn't think about hash(x, x) being a multiple of 32. – Rexanna 5/8, 2014 at 17:33

It ends up being 27644437*(x+x)+x, which is odd. As for the merits of 31, it wouldn't be so bad if each step computed 31*prev-x rather than 31*prev+x. The would of course make every (x,-x) value hash to a multiple of 32, but those are probably much less common that (x,x). What's really horrible is x^y. That maps every (x,x) to zero, and is almost as bad with (x,x+1), mapping half of those to 1, a quarter to 3, an eighth to 7, etc. What's absurd is that x+y is generally no more expensive than x^y, but people have some odd attachment to xor. BTW, from+to might be a decent hash. – Bewitch 5/8, 2014 at 17:45

@Bewitch I see; I was thinking about the 27644437 multiplier rather than about my above formula. Agreed, replacing addition by subtraction would be better; probably similar to replacing 31 by 33. Agreed again, using XOR is only good if you know what you're doing. Using it as in Map.Entry is hard to beat, but an even more stupid example is BitSet. The latter even ignores some bits completelly when i+1 is a power of two. – Rexanna 6/8, 2014 at 4:57

I think the Map.Entry is worse; since Set provides no way of retrieving the particular Value object associated with a key, it's often useful to create a map where every item's key and value are identical. A hash function which expends effort on every item but always ends up returning zero is probably the worst legitimate hash function possible. – Bewitch 6/8, 2014 at 15:9

@Bewitch Right, and sometimes you need its EntrySet and you're toasted. One solution would be wrapping the value into something multiplying the hashCode by 2 or 3 (I guess both x^(2*x) and x^(3*x) behave rather sane; whatever you do, the LSb of something will be constant). Agreed that BitSet#hashCode is mostly harmless as nobody really needs it. – Rexanna 6/8, 2014 at 15:18

I wouldn't say BitSet#hashCode is harmless because nobody needs it, but rather that would seem unlikely to result in massively degenerate behavior. A million-item hash table in which every single item collides with exactly one other should be considered well-hashed even though it has a "100% collision rate". One where a third of the items all yield the same hash value would be poorly hashed, even if all other items had distinct hash values (and it thus had 666,667 distinct values as compared with the 500,000 distinct values in the first table). – Bewitch 6/8, 2014 at 15:37

I would suspect that in typical usage, something like BitSet@hashCode would have many more collisions than would a slightly-tweaked version of the algorithm, but I wouldn't expect collisions to form large clusters the way xor may do in many real-world situations, nor to yield completely degenerate behavior as would occur with a table that mapped each item to an equal item. – Bewitch 6/8, 2014 at 15:43

-1

Java 1.7+ have Objects.hash

@Override
public int hashCode() {
    return Objects.hash(from, to);
}

Drops answered 5/4, 2022 at 2:10 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags