Understanding the workings of equals and hashCode in a HashMap
Asked Answered
G

9

63

I have this test code:

import java.util.*;

class MapEQ {

  public static void main(String[] args) {
   Map<ToDos, String> m = new HashMap<ToDos, String>();
   ToDos t1 = new ToDos("Monday");
   ToDos t2 = new ToDos("Monday");
   ToDos t3 = new ToDos("Tuesday");
   m.put(t1, "doLaundry");
   m.put(t2, "payBills");
   m.put(t3, "cleanAttic");
   System.out.println(m.size());
} }

class ToDos{

  String day;

  ToDos(String d) { day = d; }

  public boolean equals(Object o) {
      return ((ToDos)o).day == this.day;
 }

// public int hashCode() { return 9; }
}

When // public int hashCode() { return 9; } is uncommented m.size() returns 2, when it's left commented it returns three. Why?

Glassworks answered 12/12, 2009 at 19:0 Comment(0)
C
120

HashMap uses hashCode(), == and equals() for entry lookup. The lookup sequence for a given key k is as follows:

  • Use k.hashCode() to determine which bucket the entry is stored, if any
  • If found, for each entry's key k1 in that bucket, if k == k1 || k.equals(k1), then return k1's entry
  • Any other outcomes, no corresponding entry

To demonstrate using an example, assume that we want to create a HashMap where keys are something which is 'logically equivalent' if they have same integer value, represented by AmbiguousInteger class. We then construct a HashMap, put in one entry, then attempt to override its value and retrieve value by key.

class AmbiguousInteger {
    private final int value;

    AmbiguousInteger(int value) {
        this.value = value;
    }
}

HashMap<AmbiguousInteger, Integer> map = new HashMap<>();
// logically equivalent keys
AmbiguousInteger key1 = new AmbiguousInteger(1),
                 key2 = new AmbiguousInteger(1),
                 key3 = new AmbiguousInteger(1);
map.put(key1, 1); // put in value for entry '1'
map.put(key2, 2); // attempt to override value for entry '1'
System.out.println(map.get(key1));
System.out.println(map.get(key2));
System.out.println(map.get(key3));

Expected: 2, 2, 2

Don't override hashCode() and equals(): by default Java generates different hashCode() values for different objects, so HashMap uses these values to map key1 and key2 into different buckets. key3 has no corresponding bucket so it has no value.

class AmbiguousInteger {
    private final int value;

    AmbiguousInteger(int value) {
        this.value = value;
    }
}

map.put(key1, 1); // map to bucket 1, set as entry 1[1]
map.put(key2, 2); // map to bucket 2, set as entry 2[1]
map.get(key1); // map to bucket 1, get as entry 1[1]
map.get(key2); // map to bucket 2, get as entry 2[1]
map.get(key3); // map to no bucket
Expected: 2, 2, 2
Output:   1, 2, null

Override hashCode() only: HashMap maps key1 and key2 into the same bucket, but they remain different entries due to both key1 == key2 and key1.equals(key2) checks fail, as by default equals() uses == check, and they refer to different instances. key3 fails both == and equals() checks against key1 and key2 and thus has no corresponding value.

class AmbiguousInteger {
    private final int value;

    AmbiguousInteger(int value) {
        this.value = value;
    }

    @Override
    public int hashCode() {
        return value;
    }
}

map.put(key1, 1); // map to bucket 1, set as entry 1[1]
map.put(key2, 2); // map to bucket 1, set as entry 1[2]
map.get(key1); // map to bucket 1, get as entry 1[1]
map.get(key2); // map to bucket 1, get as entry 1[2]
map.get(key3); // map to bucket 1, no corresponding entry
Expected: 2, 2, 2
Output:   1, 2, null

Override equals() only: HashMap maps all keys into different buckets because of default different hashCode(). == or equals() check is irrelevant here as HashMap never reaches the point where it needs to use them.

class AmbiguousInteger {
    private final int value;

    AmbiguousInteger(int value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof AmbiguousInteger && value == ((AmbiguousInteger) obj).value;
    }
}

map.put(key1, 1); // map to bucket 1, set as entry 1[1]
map.put(key2, 2); // map to bucket 2, set as entry 2[1]
map.get(key1); // map to bucket 1, get as entry 1[1]
map.get(key2); // map to bucket 2, get as entry 2[1]
map.get(key3); // map to no bucket
Expected: 2, 2, 2
Actual:   1, 2, null

Override both hashCode() and equals(): HashMap maps key1, key2 and key3 into the same bucket. == checks fail when comparing different instances, but equals() checks pass as they all have the same value, and deemed 'logically equivalent' by our logic.

class AmbiguousInteger {
    private final int value;

    AmbiguousInteger(int value) {
        this.value = value;
    }

    @Override
    public int hashCode() {
        return value;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof AmbiguousInteger && value == ((AmbiguousInteger) obj).value;
    }
}

map.put(key1, 1); // map to bucket 1, set as entry 1[1]
map.put(key2, 2); // map to bucket 1, set as entry 1[1], override value
map.get(key1); // map to bucket 1, get as entry 1[1]
map.get(key2); // map to bucket 1, get as entry 1[1]
map.get(key3); // map to bucket 1, get as entry 1[1]
Expected: 2, 2, 2
Actual:   2, 2, 2

What if hashCode() is random?: HashMap will assign a different bucket for each operation, and thus you never find the same entry that you put in earlier.

class AmbiguousInteger {
    private static int staticInt;
    private final int value;

    AmbiguousInteger(int value) {
        this.value = value;
    }

    @Override
    public int hashCode() {
        return ++staticInt; // every subsequent call gets different value
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof AmbiguousInteger && value == ((AmbiguousInteger) obj).value;
    }
}

map.put(key1, 1); // map to bucket 1, set as entry 1[1]
map.put(key2, 2); // map to bucket 2, set as entry 2[1]
map.get(key1); // map to no bucket, no corresponding value
map.get(key2); // map to no bucket, no corresponding value
map.get(key3); // map to no bucket, no corresponding value
Expected: 2, 2, 2
Actual:   null, null, null

What if hashCode() is always the same?: HashMap maps all keys into one big bucket. In this case, your code is functionally correct, but the use of HashMap is practically redundant, as any retrieval would need to iterate through all entries in that single bucket in O(N) time (or O(logN) for Java 8), equivalent to the use of a List.

class AmbiguousInteger {
    private final int value;

    AmbiguousInteger(int value) {
        this.value = value;
    }

    @Override
    public int hashCode() {
        return 0;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof AmbiguousInteger && value == ((AmbiguousInteger) obj).value;
    }
}

map.put(key1, 1); // map to bucket 1, set as entry 1[1]
map.put(key2, 2); // map to bucket 1, set as entry 1[1]
map.get(key1); // map to bucket 1, get as entry 1[1]
map.get(key2); // map to bucket 1, get as entry 1[1]
map.get(key3); // map to bucket 1, get as entry 1[1]
Expected: 2, 2, 2
Actual:   2, 2, 2

And what if equals is always false?: == check passes when we compare the same instance with itself, but fails otherwise, equals checks always fails so key1, key2 and key3 are deemed to be 'logically different', and maps to different entries, though they are still in the same bucket due to same hashCode().

class AmbiguousInteger {
    private final int value;

    AmbiguousInteger(int value) {
        this.value = value;
    }

    @Override
    public int hashCode() {
        return 0;
    }

    @Override
    public boolean equals(Object obj) {
        return false;
    }
}

map.put(key1, 1); // map to bucket 1, set as entry 1[1]
map.put(key2, 2); // map to bucket 1, set as entry 1[2]
map.get(key1); // map to bucket 1, get as entry 1[1]
map.get(key2); // map to bucket 1, get as entry 1[2]
map.get(key3); // map to bucket 1, no corresponding entry
Expected: 2, 2, 2
Actual:   1, 2, null

Okay what if equals is always true now?: you're basically saying that all objects are deemed 'logically equivalent' to another, so they all map to the same bucket (due to same hashCode()), same entry.

class AmbiguousInteger {
    private final int value;

    AmbiguousInteger(int value) {
        this.value = value;
    }

    @Override
    public int hashCode() {
        return 0;
    }

    @Override
    public boolean equals(Object obj) {
        return true;
    }
}

map.put(key1, 1); // map to bucket 1, set as entry 1[1]
map.put(key2, 2); // map to bucket 1, set as entry 1[1], override value
map.put(new AmbiguousInteger(100), 100); // map to bucket 1, set as entry1[1], override value
map.get(key1); // map to bucket 1, get as entry 1[1]
map.get(key2); // map to bucket 1, get as entry 1[1]
map.get(key3); // map to bucket 1, get as entry 1[1]
Expected: 2, 2, 2
Actual:   100, 100, 100
Comedienne answered 25/7, 2016 at 4:22 Comment(3)
Neve came across such a clear explanation . Finally able to understand the difference. CheersTelophase
One of the best explanations I've seen in the whole community. This is like #explainlikeimfiveLifetime
I don't agree with some parts of this answer. 1. Different hash code != different bucket. 2. map.get(key3); // map to no bucket is absolutely wrong, the reason why map.get(key3) return null is that either the bucket is null or there isn't any node in that bucket that equals key3.Merna
R
49

You have overidden equals without overriding hashCode. You must ensure that for all cases where equals returns true for two objects, hashCode returns the same value. The hash code is a code that must be equal if two objects are equal (the converse need not be true). When you put your hard-coded value of 9 in, you satisfy the contract again.

In your hash map, equality is only tested within a hash bucket. Your two Monday objects should be equal, but because they are returning different hash codes, the equals method isn't even called to determine their equality - they are put straight into different buckets, and the possibility that they are equal isn't even considered.

Reunion answered 12/12, 2009 at 19:4 Comment(6)
It is also worth noting that the equals function is implemented incorrectly. It only works on this case because the two "Monday" string literals are interned and therefore have the same reference.Chloromycetin
The equals method should also be able to cope with null and arguments of different classes (note subtypes). If you keep equals (and hashCode) to a standard form, then it is trivial to implement correctly (even a machine could code it!).Sarsen
Even if two entries end up in the same bucket they aren't necessarily compared. The (Sun) implementation keeps a record of the hash code so that it doesn't need to be recomputed. This is also compared as a quick check before calling equals.Sarsen
Does size() express only the number of keys used?Glassworks
With the objects returning the same hash code, and equality defined according to the string entry, the two "Monday" objects are recognised as the same, and the second is not added - it is already in the map. size() returns the number of objects contained, and so in this case returns 2. Hope that makes sense.Reunion
It's worth to take a look at this answer too #2266003Clintclintock
S
8

I cannot emphasize enough that you should read Chapter 3 in Effective Java (warning: pdf link). In that chapter you will learn everything you need to know about overriding methods in Object, and in particular, about the equals contract. Josh Bloch has a great recipe for overriding the equals method that you should follow. And it will help you understand why you should be using equals and not == in your particular implementation of the equals method.

Hope this helps. PLEASE READ IT. (At least the first couple items... and then you will want to read the rest :-).

-Tom

Starbuck answered 12/12, 2009 at 20:8 Comment(2)
Not to mention, a recipe for overriding hashCode() too!Starbuck
new link uet.vnu.edu.vn/~chauttm/e-books/java/… It was quite a long chapter but well written and I totally got it, worth a readUnclinch
S
7

When you don't override the hashCode() method, your ToDos class inherits the default hashCode() method from Object, which gives every object a distinct hash code. This means that t1 and t2 have two different hash codes, even though were you to compare them, they would be equal. Depending on the particular hashmap implementation, the map is free to store them separately (and this is in fact what happens).

When you do correctly override the hashCode() method to ensure that equal objects get equal hash codes, the hashmap is able to find the two equal objects and place them in the same hash bucket.

A better implementation would give objects that are not equal different hash codes, like this:

public int hashCode() {
    return (day != null) ? day.hashCode() : 0;
}
Slype answered 12/12, 2009 at 19:6 Comment(12)
Why do you say it's "better" to give objects that aren't equal different hash codes?Reunion
The hash code is used by the HashMap to quickly find objects without having to compare every object. If all your hashcodes are the same, it will cause everything to go into the same bucket which will slow things down a lot.Chloromycetin
Because that reduces the number of hash collisions, which is vital for good performance. In the extreme case of hashCode() returning the same value for all objects, the HashMap degenerates to a list, and all those nice O(1) apperations are suddenly O(n).Bent
@Michael - But the same is true at the opposite extreme - your list of items in a single has bucket becomes a list of one item hash buckets. You need to get a balance, and there is certainly no rule of thumb saying that the right balance is obtained by making sure that inequality gives different hash codes.Reunion
@David - there is exactly that rule of thumb, by design of a hash table. From the hashCode() docs (java.sun.com/javase/6/docs/api/java/lang/…): the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.Slype
@Slype - the word in that is "may" not "will".Reunion
@David M, @Avi: It's a well known fact that you want your hash functions to have a nice uniform distribution. The reason for the "may" is because distinct hashCodes may map to the same value when you are using modulo arithmetic. For example. Hashcodes of 1 and 101 in a map with 100 buckets will map to the same bucket because 1 and 101 are congruent modulo 100. If however, your key space for hash values is the same as the number of buckets you have... then producing different values for different objects is in fact optimal.Starbuck
In summary, the "may" just comes from the fact that you can always have a set of really bad inputs that have distinct hashcodes and still lead to the worst case scenario of all mapping to the same bucket.Starbuck
does the size() method express the number of keys? or the number of keys-value pairs?Glassworks
@dmindreader: If you mean the size() method of the HashMap, it inherits its contract from java.util.Map, which means it expresses the number of items currently in the map.Slype
what if I have two distinct keys to map three values? what would size() print then?Glassworks
@dmindreader: I'm not sure I understand your question... in a map, the number of distinct keys in the map is always the same as the number of key-value pairs. The only time those numbers will differ is if you are using a MultiMap of some sort (which allows the same key to exist in the map). How can two distinct keys map to three values? Can you give an example?Starbuck
D
4

when you comment, it returns 3;

because hashCode() inherited from the Object is ONLY called which returns 3 different hashcodes for the 3 ToDos objects. The unequal hashcodes means the 3 objects are destined to different buckets and equals() return false as they are the first entrant in their respective buckets. If the hashCodes are different it is understood in advance that the objects are unequal. They will go in different buckets.

when you uncomment, it returns 2;

because here the overridden hashCode() is called which returns the same value for all the ToDos and they all will have to go into one bucket, connected linearly. Equal hashcodes dont promise anything about the equality or inequality of objects.

hashCode() for t3 is 9 and as it is the first entrant, equals() is false and t3 inserted in the bucket- say bucket0.

Then t2 getting the same hashCode() as 9 is destined for the same bucket0, a subsequent equals() on the already residing t3 in bucket0 returns false by the definition of overridden equal().

Now t1 with hashCode() as 9 is also destined for bucket0, and a subsequent equals() call returns true when compared with the pre-existing t2 in the same bucket. t1 fails to enter the map. So the net size of map is 2 -> {ToDos@9=cleanAttic, ToDos@9=payBills}

This explains the importance of implementing both equals() and hashCode(), and in such a way that the fields taken up in determining equals() must also be taken when determining hashCode(). This will guarantee that if two objects are equal they will always have same hashCodes. hashCodes should not be perceived as pseudo-random numbers as they must be consistent with equals()

Denbighshire answered 24/9, 2011 at 6:57 Comment(0)
K
4

According to Effective Java,

Always override hashCode() when you override equals()

well, why? Simple, because different objects (content, not references) should get different hash codes; on the other hand, equal objects should get the same hash code.

According to above, Java associative data structures compare the results obtained by equals() and hashCode() invokations to create the buckets. If both are the same, objects are equals; otherwise not.

In the specific case (i.e. the one presented above), when hashCode() is commented, a random number is generated for each instance (behaviour inherited by Object) as hash, the equals() checks String's references (remember Java String Pool), so the equals() should return true but the hashCode() not, the result is 3 different objects stored. Let's see what happens in case the hashCode() respecting the contract but returning always 9 is uncommented. Well, hashCode() is constantly the same, the equals() returns true for the two Strings in the Pool (i.e. "Monday"), and for them the bucket will be the same resulting in only 2 elements stored.

Therefore, it's definitely needed to be careful in using the hashCode() and equals() overriding, in particular when compound data types are user defined and they are used with Java associative data structures.

Kooima answered 2/8, 2014 at 16:35 Comment(0)
A
0

When hashCode is uncommented, HashMap sees t1 and t2 as being the same thing; thus, t2's value clobbers that of t1. To understand how this works, note that when hashCode returns the same thing for two instances, they end up going to the same HashMap bucket. When you try to insert a second thing into the same bucket (in this case t2 is being inserted when t1 is already present), HashMap scans the bucket for another key that is equals. In your case, t1 and t2 are equals because they have the same day. At that point, "payBills" clobbers "doLaundry". As for whether t2 clobbers t1 as the key, I believe this is undefined; thus, either behavior is allowed.

There are a few important things to think about here:

  1. Are two ToDos instances really equal just because they have the same day of the week?
  2. Whenever you implement equals, you should implement hashCode so that any two objects that are equals also have the same hashCode values. This is a fundamental assumption that HashMap makes. This is probably also true of anything else that relies the hashCode method.
  3. Design your hashCode method so that the hash codes are evenly distributed; otherwise, you won't get the performance benefits of hashing. From this perspective, returning 9 is one of the worst things you can do.
Aubry answered 12/12, 2009 at 19:36 Comment(0)
R
0

Rather than thinking of hashCode in terms of hash-bucket mapping, I think it's more helpful to think somewhat more abstractly: an observation that two objects have different hash codes constitutes an observation that the objects are not equal. As a consequence of that, an observation that none of the objects in a collection have a particular hash code constitutes an observation that none of the objects in a collection are equal to any object which has that hash code. Further, an observation that none of the objects in a collection have a hash code with some trait constitutes an observation that none of them are equal to any object which does.

Hash tables generally work by defining a family of traits, exactly one of of which will be applicable to each object's hash code (e.g. "being congruent to 0 mod 47", "being congruent to 1 mod 47", etc.), and then having a collection of objects with each trait. If one is then given an object and can determine which trait applies to it, one can know that it must be in a collection of things with that trait.

That hash tables generally use a sequence of numbered buckets is an implementation detail; what is essential is that an object's hash code is quickly used to identify many things which it cannot possibly be equal to, and with which it thus will not have to be compared.

Raji answered 17/10, 2013 at 15:17 Comment(0)
W
0

Whenever you create a new object in Java, it will be assigned a unique hashcode by JVM itself. If you wouldn't override hashcode method then object will get unique hascode and hence a unique bucket (Imagine bucket is nothing but a place in memory where JVM will go to find an object).

(you can check uniqueness of an hashcode by calling hashcode method on each object and printing their values on console)

In your case when you are un commentting hashcode method, hashmap firstly look for bucket having same hashcode that method returns. And everytime you are returning same hashcode. Now when hashmap finds that bucket, it will compare current object with the object residing into bucket using euqals method. Here it finds "Monday" and so hashmap implementation do not allow to add it again because there is already an object having same hashcode and same euqality implementation.

When you comment hashcode method, JVM simply returns different hashcode for all the three objects and hence it never even bother about comapring objects using equals method. And so there will be three different objects in Map added by hashmap implementation.

Webbing answered 5/6, 2015 at 18:34 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.