I tried to write a small program to demonstrate hash collisions in java when only equals is overridden and not hashcode() method. This was to prove the theory that two unequal objects can have the same hashcode. This was for an Interview question where the behaviour was asked.
I created 200,000 objects, stored them in an array and then compared them to see which are duplicates. (For this I am using a nested for loop iterating over an array of objects after the object creation phase.) For around 200,000 objects I get 9 collisions. First one being object at index 196 and 121949. I then go on to print these hash codes to show that both values are the same.
However I am getting some very surprising behavior. If I iterate over the nested for loop and print the first collision of hashcodes, I get same hashcode value
1867750575
1867750575
for both object at index 196 and 121949.
But if I comment out the nested for loop for detecting all the collisions and directly print hashcode for elements at index 196 and 121949 I get
1829164700
366712642
Note, I am not commenting out the creation of these elements, just the part where I check for collisions.
Why is this happening, even if I dont iterate over them, shouldn't the hashcode be consistent ?
Addendum 1: Is there a source behind this, as far as I know, going by the Birthday principle, if I create 200,000 objects, I must get a collision, how is iterating over each hascode or not changing anything?
Addendum 2: I tried adding another array of 200000 size just to see if the colliding indexes change, they did not, so apparently making changes to the binary with the loop uncommitted does not make any changes. So the hypothesis that changing binary changes hashcode doesn't hold.
Here is my code
import java.util.HashMap;
public class EmployeeFactory {
private static int counter = 0;
public int id;
public String empName;
EmployeeFactory() {
id = counter;
empName = "employee_" + id;
counter++;
}
@Override
public boolean equals(Object o) {
// If the object is compared with itself then return true
if (o == this) {
return true;
}
if (o == null || o.getClass() != this.getClass()) {
return false;
}
EmployeeFactory emp = (EmployeeFactory) o;
// Compare the data members and return accordingly
return this.id == emp.id;
}
public static void main(String[] args) {
int Obj_Count = 200000;
EmployeeFactory objs[] = new EmployeeFactory[Obj_Count];
for (int i = 0; i < Obj_Count; ++i) {
EmployeeFactory temp = new EmployeeFactory();
objs[i] = temp;
}
//Please try code once un commenting the loop below and once while keeping it commented.
/*
for (int i = 0; i < Obj_Count; ++i)
{
for (int j = i + 1; j < Obj_Count; ++j)
{
if (objs[i].hashCode() == objs[j].hashCode())
{
System.out.println("Objects with IDs " + objs[i].id
+ " and " + objs[j].id + " collided.");
System.out.println("Object Is " + i + "and Obj ID is "+ objs[i].id + " Has Hashcode " + objs[i].hashCode());
System.out.println("Object Is " + j + "and Obj ID is "+ objs[j].id + " Has Hashcode " + objs[j].hashCode());
System.out.println("");
}
}
}
*/
HashMap<EmployeeFactory, EmployeeFactory> hm = new HashMap<EmployeeFactory, EmployeeFactory>();
objs[121949].id = objs[196].id;
hm.put(objs[196], objs[196]);
hm.put(objs[121949], objs[121949]);
System.out.println(hm.get(objs[121949]).empName);
System.out.println(hm.get(objs[196]).empName);
// checking the hashmap
System.out.println(hm.get(objs[121949]).hashCode());
System.out.println(hm.get(objs[196]).hashCode());
// Checking the array
System.out.println(objs[121949].hashCode());
System.out.println(objs[196].hashCode());
}
}
Comemented Output:
employee_121949
employee_196
1829164700
366712642
1829164700
366712642
Un-commented loop Output
Objects with IDs 196 and 121949 collided.
Object Is 196and Obj ID is 196 Has Hashcode 1867750575
Object Is 121949and Obj ID is 121949 Has Hashcode 1867750575
Objects with IDs 62082 and 145472 collided.
Object Is 62082and Obj ID is 62082 Has Hashcode 2038112324
Object Is 145472and Obj ID is 145472 Has Hashcode 2038112324
Objects with IDs 62354 and 105841 collided.
Object Is 62354and Obj ID is 62354 Has Hashcode 2134400190
Object Is 105841and Obj ID is 105841 Has Hashcode 2134400190
Objects with IDs 68579 and 186938 collided.
Object Is 68579and Obj ID is 68579 Has Hashcode 1872358815
Object Is 186938and Obj ID is 186938 Has Hashcode 1872358815
Objects with IDs 105219 and 111288 collided.
Object Is 105219and Obj ID is 105219 Has Hashcode 651156501
Object Is 111288and Obj ID is 111288 Has Hashcode 651156501
Objects with IDs 107634 and 152385 collided.
Object Is 107634and Obj ID is 107634 Has Hashcode 273791087
Object Is 152385and Obj ID is 152385 Has Hashcode 273791087
Objects with IDs 108007 and 146405 collided.
Object Is 108007and Obj ID is 108007 Has Hashcode 1164664992
Object Is 146405and Obj ID is 146405 Has Hashcode 1164664992
Objects with IDs 135275 and 180997 collided.
Object Is 135275and Obj ID is 135275 Has Hashcode 996371445
Object Is 180997and Obj ID is 180997 Has Hashcode 996371445
Objects with IDs 153749 and 184310 collided.
Object Is 153749and Obj ID is 153749 Has Hashcode 254720071
Object Is 184310and Obj ID is 184310 Has Hashcode 254720071
employee_121949
employee_121949
1867750575
1867750575
1867750575
1867750575
equals
and nothashCode
is not collisions. Hash codes are expected to collide. The problem is that you can have equal objects with different hash codes, which means that you will not find them in HashMaps, etc. – Amandy