Are mutable hashmap keys a dangerous practice?
O

10

78

Is it bad practice to use mutable objects as HashMap keys? What happens when you try to retrieve a value from a HashMap using a key that has been modified enough to change its hash code?

For example, given

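class Key
{
    int a; // mutable field
    int b; // mutable field

    Key(int a, int b) { this.a = a; this.b = b; }

    @Override
    public int hashCode() {
        return foo(a, b); // foo is a placeholder for any function combining a and b
    }
    // equals() and the setters setA and setB omitted for brevity
}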

with code

HashMap<Key, Value> map = new HashMap<Key, Value>();

Key key1 = new Key(0, 0);
map.put(key1, value1); // value1 is an instance of Value

key1.setA(5);
key1.setB(10);

What happens if we now call map.get(key1)? Is this safe or advisable? Or is the behavior dependent on the language?

Oogonium answered 20/10, 2011 at 20:48 Comment(1)
I would say, in general, it is inadvisable to use a mutable key. But "safe" is a different question. You can remain "safe" by updating the key-value pair (anytime a key changes). Furthermore, it's absolutely language dependent because behavior is determined by the contract--it's not inconceivable (though unlikely) that a language would define a key to be a specific object or value, i.e. O1 equals O2, yet O1 points to a different value than O2 in a hash table (again, this behavior wouldn't make much sense).Logistic
B
100

Many well-respected developers, such as Brian Goetz and Josh Bloch, have noted that:

If an object’s hashCode() value can change based on its state, then we must be careful when using such objects as keys in hash-based collections to ensure that we don’t allow their state to change when they are being used as hash keys. All hash-based collections assume that an object’s hash value does not change while it is in use as a key in the collection. If a key’s hash code were to change while it was in a collection, some unpredictable and confusing consequences could follow. This is usually not a problem in practice — it is not common practice to use a mutable object like a List as a key in a HashMap.
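To make the quoted warning concrete, here is a minimal sketch of the List-as-key case it mentions. The class and method names are made up for illustration, and the null result reflects how OpenJDK's HashMap behaves; the Map contract itself leaves this unspecified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not from the quoted article.
final class ListKeyDemo {
    // Returns the lookup result before and after the key list is mutated.
    static String[] lookupBeforeAndAfter() {
        Map<List<Integer>, String> map = new HashMap<>();
        List<Integer> key = new ArrayList<>(Arrays.asList(1, 2));

        map.put(key, "value");
        String before = map.get(key); // found: the key has not changed yet

        key.add(3);                   // ArrayList.hashCode() depends on the elements
        String after = map.get(key);  // searched under the new hash code

        return new String[] { before, after };
    }

    public static void main(String[] args) {
        String[] result = lookupBeforeAndAfter();
        System.out.println("before mutation: " + result[0]);
        System.out.println("after mutation:  " + result[1]);
    }
}
```

The entry is still inside the map after the mutation; it just cannot be reached through get() any more.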

Buddha answered 29/10, 2011 at 21:39 Comment(8)
Can you post the source?Excelsior
Probable source: From the series "Java theory and practice" - Hashing it out by Brian Goetz, May 27, 2003Arlenarlena
This is also in the official Java API docs, e.g. for java.util.Map: "Note: great care must be exercised if mutable objects are used as map keys. The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map. ".Unification
@Logistic what is not clear in this sentence: "All hash-based collections assume that an object’s hash value does not change while it is in use as a key in the collection."? This is the explanation of why it is bad to use a mutable object as a key in a hash map. They calculate the hash code at insertion... There is no fallacy here; what is written in that quoted paragraph extracted from the book is simply correct. You are the only one who downvoted the answer, so probably it is you who has issues accepting the truth. Probably you just want to create some polemic without any real reason ...Buddha
@Buddha Also this isn't a complete explanation. It's not just the hash value that's the problem. That's literally only half of the problem. This doesn't address the fact that a hash table will store the key object for equality checking--perhaps that seems like a trivial problem but I (at least) have found it wasn't. In particular, this can become a problem when trying to optimize Java for memory consumption (when mutable objects are a necessity).Logistic
@Buddha so why can't they simply not allow it, like python which doesn't allow mutable objects as keysPaperboy
@Paperboy that is a question for Java Design Committee...Buddha
@Buddha yes, but there should be some justification, would you be aware of the same? For me, it looks like a very bad design choice and relies on users to be mature enough to understand that given Java's inherent philosophy is more secure, more explicit, static. If it was another way round, i.e. python allowing it I would have understood.Paperboy
C
32

This is not safe or advisable. After the mutation, the value mapped to by key1 can, in general, no longer be retrieved with get(). When doing a retrieval, most hash maps will do something like

Object get(Object key) {
    int hash = key.hashCode();
    //simplified, ignores hash collisions,
    Entry entry = getEntry(hash);
    if(entry != null && entry.getKey().equals(key)) {
        return entry.getValue();
    }
    return null;
}

In this example, key1.hashCode() now points to the wrong bucket of the hash table (unless the new hash code happens to land in the same place as the old one), and you will not be able to retrieve value1 with key1.

If you had done something like,

Key key1 = new Key(0, 0);
map.put(key1, value1);
key1.setA(5);
Key key2 = new Key(0, 0);
map.get(key2);

This will also not retrieve value1, as key1 and key2 are no longer equal, so this check

    if(entry != null && entry.getKey().equals(key)) 

will fail.
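Both failure cases above can be reproduced with a small sketch. The Key class below is a stand-in for the one in the question, with Objects.hash assumed in place of the unspecified foo(a, b); the observed values are what OpenJDK's HashMap produces.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical demo; Key mirrors the question's class, with Objects.hash as an assumed foo().
final class MutatedKeyDemo {
    static final class Key {
        int a;
        int b;

        Key(int a, int b) { this.a = a; this.b = b; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && a == ((Key) o).a && b == ((Key) o).b;
        }

        @Override
        public int hashCode() { return Objects.hash(a, b); }
    }

    static String describe() {
        Map<Key, String> map = new HashMap<>();
        Key key1 = new Key(0, 0);
        map.put(key1, "value1");

        key1.a = 5;   // mutate the key while it is inside the map
        key1.b = 10;

        String viaMutatedKey = map.get(key1);        // new hash code: entry not found
        String viaOldState = map.get(new Key(0, 0)); // old hash code matches, equals() fails

        return viaMutatedKey + "," + viaOldState + ","
                + map.size() + "," + map.containsValue("value1");
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints null,null,1,true
    }
}
```

Note that the entry is not gone: size() is still 1 and containsValue() still finds value1. It is only unreachable by key.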

Coletta answered 24/10, 2011 at 5:3 Comment(1)
I like this answer because of the clear, concrete explanation of what actually happens after the key is mutated. Also for giving another case where it would fail.Alienable
D
6

Hash maps use hash code and equality comparisons to identify a certain key-value pair with a given key. If the hash map keeps the key as a reference to the mutable object, a lookup with that same instance can still work, but only as long as the mutation leaves the hash code unchanged (for example, when hashCode() does not depend on the mutated field). Consider, however, the following case:

T keyOne = ...;
T keyTwo = ...;

// At this point keyOne and keyTwo are different instances and 
// keyOne.equals(keyTwo) is true.

HashMap myMap = new HashMap();

myMap.put(keyOne, "Hello");

String s1 = (String) myMap.get(keyOne); // s1 is "Hello"
String s2 = (String) myMap.get(keyTwo); // s2 is "Hello" 
                                        // because keyOne equals keyTwo

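mutate(keyOne);

s1 = (String) myMap.get(keyOne); // returns "Hello" only if the mutation left hashCode() unchanged
s2 = (String) myMap.get(keyTwo); // not found: keyTwo no longer equals the stored key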

The above is true if the key is stored as a reference. In Java this is always the case for object keys. In .NET, for instance, if the key is a value type (which is copied when passed), the map keeps its own copy of the key and the result will be different:

T keyOne = ...;
T keyTwo = ...;

// At this point keyOne and keyTwo are different instances 
// and keyOne.equals(keyTwo) is true.

Dictionary myMap = new Dictionary();

myMap.Add(keyOne, "Hello");

String s1 = (String) myMap[keyOne]; // s1 is "Hello"
String s2 = (String) myMap[keyTwo]; // s2 is "Hello"
                                    // because keyOne equals keyTwo

mutate(keyOne);

s1 = myMap[keyOne]; // not found
s2 = myMap[keyTwo]; // returns "Hello"

Other technologies might behave differently again. However, in almost all of them, using mutable keys leads to results that are hard to predict, which is a very bad situation in an application: hard to debug and even harder to understand.

Dalrymple answered 28/10, 2011 at 18:37 Comment(0)
M
6

If a key's hash code changes after the key-value pair (Entry) is stored in a HashMap, the map will not be able to retrieve the Entry.

A key's hash code can change if the key object is mutable. Mutable keys in a HashMap can therefore result in data loss.

Mo answered 14/9, 2014 at 9:51 Comment(0)
R
5

This will not work. You are changing the key value, so you are basically throwing it away. It's like creating a real-life key and lock, and then changing the key and trying to put it back in the lock.

Resor answered 20/10, 2011 at 20:57 Comment(0)
T
3

As others explained, it is dangerous.

A way to avoid that is to have a const field (final, in Java) giving explicitly the hash in your mutable objects (so you would hash on their "identity", not their "state"). You might even initialize that hash field more or less randomly.

Another trick would be to use the address, e.g. (intptr_t) reinterpret_cast<void*>(this) as a basis for hash.

In all cases, you have to give up hashing the changing state of the object.
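In Java, the "hash on identity, not state" idea could look roughly like the sketch below. Account, NEXT_ID and the other names are invented for illustration; a counter is just one possible way to fill the fixed hash field.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: the hash comes from a final id, never from mutable state.
final class IdentityHashedKeyDemo {
    static final class Account {
        private static final AtomicInteger NEXT_ID = new AtomicInteger();

        private final int id = NEXT_ID.getAndIncrement(); // fixed at construction
        int balance; // mutable state, deliberately left out of hashCode/equals

        @Override
        public int hashCode() { return id; }

        @Override
        public boolean equals(Object o) { return this == o; } // identity equality
    }

    static String lookupAfterMutation() {
        Map<Account, String> owners = new HashMap<>();
        Account account = new Account();
        owners.put(account, "alice");

        account.balance = 100;      // does not affect hashCode() or equals()
        return owners.get(account); // still found
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterMutation()); // prints alice
    }
}
```

The trade-off is that two accounts with identical state are different keys. Leaving hashCode() and equals() unoverridden, or using java.util.IdentityHashMap, gives similar identity-based behavior without the extra field.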

Tereasaterebene answered 28/10, 2011 at 19:27 Comment(2)
Assuming this is Java (judging by the code snippets), the programmer can safely rely on the native Object.hashCode(). It returns an int that does not depend on the object's fields and stays the same for the lifetime of the object (though it is not guaranteed to be unique).Udell
While you can technically do this, I don't really see why you would do it. It seems very convoluted (still a good answer though). Just don't use mutable keys, kids :-).Unification
L
2

There are two very different issues that can arise with a mutable key depending on your expectation of behavior.

First Problem: (probably most trivial--but hell it gave me problems that I didn't think about!)

You are attempting to place key-value pairs into a map by updating and modifying the same key object. You might do something like Map<Integer, String> and simply say:

int key = 0;
loop {
    map.put(key++, newString);
}

I'm reusing the "object" key to create a map. This works fine in Java because of autoboxing where each new value of key gets autoboxed to a new Integer object. What would not work is if I created my own (mutable) Integer object:

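class MyInteger {
   int value;

   MyInteger(int value) {
      this.value = value;
   }

   void plusOne() {
      value++;
   }
}

Then tried the same approach:

MyInteger key = new MyInteger(0);
loop {
   map.put(key, newString);
   key.plusOne(); // mutates the key object that is already inside the map
}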

My expectation is that, for instance, I map 0 -> "a" and 1 -> "b". In the first example, if I set int key = 0 again and look it up, the map will (correctly) give me "a". For simplicity, let's assume MyInteger always returns the same hashCode() (if you can somehow manage to create unique hashCode values for all possible states of an object, this will not be an issue, and you deserve an award). In this case, I put 0 -> "a", so now the map holds my key object and maps it to "a". I then modify the key to 1 and try to put 1 -> "b". We have a problem! The hashCode() is the same, and the only key in the HashMap is my MyInteger key object, which has just been modified to equal 1, so the put overwrites that key's value. Now, instead of a map with 0 -> "a" and 1 -> "b", I have 1 -> "b" only! Even worse, if I change the key back to 0, the hashCode points to the entry 1 -> "b", and since the HashMap's only key is my key object, it satisfies the equality check and the map returns "b", not "a" as expected.

If, like me, you fall prey to this type of issue, it's incredibly difficult to diagnose. Why? Because if you have a decent hashCode() function it will generate (mostly) unique values. The hash value will largely take care of the inequality problem when structuring the map but if you have enough values, eventually you'll get a collision on the hash value and then you get unexpected and largely inexplicable results. The resultant behavior is that it works for small runs but fails for larger ones.

Advice:

To find this type of issue, modify the hashCode() method, even trivially (i.e. = 0--obviously when doing this, keep in mind that the hash values should be the same for two equal objects*), and see if you get the same results--because you should and if you don't, there's likely a semantic error with your implementation that's using a hash table.

*There should be no danger (if there is--you have a semantic problem) in always returning 0 from a hashCode() (although it would defeat the purpose of a Hash Table). But that's sort of the point: the hashCode is a "quick and easy" equality measure that's not exact. So two very different objects could have the same hashCode() yet not be equal. On the other hand, two equal objects must always have the same hashCode() value.

p.s. In Java (since Java 8), from my understanding, if you do such a terrible thing (as have many hashCode() collisions), HashMap will start storing the colliding entries of a bucket in a red-black tree as opposed to a linked list. So when you expect O(1) lookup, you'll get something closer to O(log(n)), which is still better than the O(n) a linked list would give.
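The footnote's claim that a constant hashCode() is slow but still correct can be checked with a short sketch. Id and countCorrectLookups are hypothetical names; the key here is immutable, so only the hash function is being tested.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: every key collides, yet equals() keeps lookups correct.
final class ConstantHashDemo {
    static final class Id {
        final int value;

        Id(int value) { this.value = value; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Id && value == ((Id) o).value;
        }

        @Override
        public int hashCode() { return 0; } // legal, but every key lands in one bucket
    }

    // Inserts n keys and counts how many are found again via a fresh, equal key.
    static int countCorrectLookups(int n) {
        Map<Id, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new Id(i), i);
        }
        int correct = 0;
        for (int i = 0; i < n; i++) {
            Integer found = map.get(new Id(i));
            if (found != null && found == i) {
                correct++;
            }
        }
        return correct;
    }

    public static void main(String[] args) {
        System.out.println(countCorrectLookups(1000)); // prints 1000
    }
}
```

If swapping in a constant hash like this changes the results of your own program, that points to a key-mutation or equals() problem rather than a hashing problem.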

Second Problem:

This is the one that most others seem to be focusing on, so I'll try to be brief. In this use case, I try to map a key-value pair and then I do some work on the key and then want to come back and get my value.

Expectation: key -> value is mapped, I then modify key and try to get(key). I expect that will give me value.

It seems kind of obvious to me that this wouldn't work but I'm not above having tried to use things like Collections as a key before (and quite quickly realizing it doesn't work). It doesn't work because it's quite likely that the hash value of key has changed so you won't even be looking in the correct bucket.

This is why it's very inadvisable to use collections as keys. I would assume, if you were doing this, you're trying to establish a many-to-one relationship. So I have a class (as in teaching) and I want two groups to do two different projects. What I want is that given a group, what is their project? Simple, I divide the class in two, and I have group1 -> project1 and group2 -> project2. But wait! A new student arrives so I place them in group1. The problem is that group1 has now been modified and likely its hash value has changed, therefore trying to do get(group1) is likely to fail because it will look in a wrong or non-existent bucket of the HashMap.

The obvious solution to the above is to chain things--instead of using the groups as keys, give them labels (that don't change) that point to the group and therefore the project: g1 -> group1 and g1 -> project1, etc.
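A rough sketch of that label approach, with made-up student and project names: the immutable String label is the key in both maps, so the group list can change freely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: stable labels as keys, mutable groups only as values.
final class GroupLabelDemo {
    static String projectAfterNewStudent() {
        Map<String, List<String>> groups = new HashMap<>();
        Map<String, String> projects = new HashMap<>();

        List<String> group1 = new ArrayList<>();
        group1.add("Ann");
        group1.add("Bob");
        groups.put("g1", group1);
        projects.put("g1", "project1");

        groups.get("g1").add("Cid"); // the group changes; the key "g1" does not

        return projects.get("g1") + ":" + groups.get("g1").size();
    }

    public static void main(String[] args) {
        System.out.println(projectAfterNewStudent()); // prints project1:3
    }
}
```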

p.s.

Please make sure to define a hashCode() and equals(...) method for any object you expect to use as a key (Eclipse and, I'm assuming, most IDEs can do this for you).

Code Example:

Here is a class which exhibits the two different "problem" behaviors. In this case, I attempt to map 0 -> "a", 1 -> "b", and 2 -> "c" (in each case). In the first problem, I do that by modifying the same object, in the second problem, I use unique objects, and in the second problem "fixed" I clone those unique objects. After that I take one of the "unique" keys (k0) and modify it to attempt to access the map. I expect this will give me a, b, c and null when the key is 3.

However, what happens is the following:

map.get(0) map1: 0 -> null, map2: 0 -> a, map3: 0 -> a
map.get(1) map1: 1 -> null, map2: 1 -> b, map3: 1 -> b
map.get(2) map1: 2 -> c, map2: 2 -> a, map3: 2 -> c
map.get(3) map1: 3 -> null, map2: 3 -> null, map3: 3 -> null

The first map ("first problem") fails because it only holds a single key, which was last updated and placed to equal 2, hence why it correctly returns "c" when k0 = 2 but returns null for the other two (the single key doesn't equal 0 or 1). The second map fails twice: the most obvious is that it returns "b" when I asked for k0 (because it's been modified--that's the "second problem" which seems kind of obvious when you do something like this). It fails a second time when it returns "a" after modifying k0 = 2 (which I would expect to be "c"). This is more due to the "first problem": there's a hash code collision and the tiebreaker is an equality check--but the map holds k0, which it (apparently for me--could theoretically be different for someone else) checked first and thus returned the first value, "a" even though had it kept checking, "c" would have also been a match. Finally, the 3rd map works perfectly because I'm enforcing that the map holds unique keys no matter what else I do (by cloning the object during insertion).

I want to make clear that I agree, cloning is not a solution! I simply added that as an example of why a map needs unique keys and how enforcing unique keys "fixes" the issue.

import java.util.HashMap;
import java.util.stream.IntStream;

public class HashMapProblems {

   private int value = 0;

   public HashMapProblems() {
       this(0);
   }

   public HashMapProblems(final int value) {
       super();
       this.value = value;
   }

   public void setValue(final int i) {
       this.value = i;
   }

   @Override
   public int hashCode() {
       return value % 2;
   }

   @Override
   public boolean equals(final Object o) {
       return o instanceof HashMapProblems
            && value == ((HashMapProblems) o).value;
   }

   @Override
   public Object clone() {
       return new HashMapProblems(value);
   }

   public void reset() {
       this.value = 0;
   }

   public static void main(String[] args) {
       final HashMapProblems k0 = new HashMapProblems(0);
       final HashMapProblems k1 = new HashMapProblems(1);
       final HashMapProblems k2 = new HashMapProblems(2);
       final HashMapProblems k = new HashMapProblems();
       final HashMap<HashMapProblems, String> map1 = firstProblem(k);
       final HashMap<HashMapProblems, String> map2 = secondProblem(k0, k1, k2);
       final HashMap<HashMapProblems, String> map3 = secondProblemFixed(k0, k1, k2);

       for (int i = 0; i < 4; ++i) {
           k0.setValue(i);
           System.out.printf(
                "map.get(%d) map1: %d -> %s, map2: %d -> %s, map3: %d -> %s",
                i, i, map1.get(k0), i, map2.get(k0), i, map3.get(k0));
           System.out.println();
       }
   }

   private static HashMap<HashMapProblems, String> firstProblem(
        final HashMapProblems start) {
       start.reset();
       final HashMap<HashMapProblems, String> map = new HashMap<>();

       map.put(start, "a");
       start.setValue(1);
       map.put(start, "b");
       start.setValue(2);
       map.put(start, "c");
       return map;
   }

   private static HashMap<HashMapProblems, String> secondProblem(
        final HashMapProblems... keys) {
       final HashMap<HashMapProblems, String> map = new HashMap<>();

       IntStream.range(0, keys.length).forEach(
            index -> map.put(keys[index], "" + (char) ('a' + index)));
       return map;
   }

   private static HashMap<HashMapProblems, String> secondProblemFixed(
        final HashMapProblems... keys) {
       final HashMap<HashMapProblems, String> map = new HashMap<>();

       IntStream.range(0, keys.length)
            .forEach(index -> map.put((HashMapProblems) keys[index].clone(),
                    "" + (char) ('a' + index)));
       return map;
   }
}

Some Notes:

In the above it should be noted that map1 only holds two values because of the way I set up the hashCode() function to split odds and evens. k = 0 and k = 2 therefore have the same hashCode of 0. So when I modify k = 2 and attempt to k -> "c" the mapping k -> "a" gets overwritten--k -> "b" is still there because it exists in a different bucket.

Also there are a lot of different ways to examine the maps in the above code and I would encourage people that are curious to do things like print out the values of the map and then the key to value mappings (you may be surprised by the results you get). Do things like play with changing the different "unique" keys (i.e. k0, k1, and k2), try changing the single key k. You could also see how even the secondProblemFixed isn't actually fixed because you could also gain access to the keys (for instance via Map::keySet) and modify them.

Logistic answered 24/12, 2020 at 20:9 Comment(4)
No, the real problem is that mutating a key usually means that it's in the wrong bucket, so things that match the component equality are looking in the wrong spot. Depending on implementation rebucketing the table (by, say, adding more values) will correct this. Note that the usual advice is to avoid Java's clone() method and the Cloneable interface. Not that it would help in all cases, because there are other ways to mutate stored keys (eg, keySet() or entrySet()).Herold
@Herold I think we're talking about two different things. I'm talking about, I have a mutable key object that I use to place into a map; I then expect that if I mutate the key object back to a previous state, it will give back the same entry (value). I think you're talking about a key object that points to a value and then expecting mutating that same key object will still point to the same value; to me this is an unreasonable expectation and not really the problem with mutable keys.Logistic
... If you mutate a key back that has been inserted again after mutating and call get(), you're not guaranteed to get either value back. Adding with such a key is just a special case of calling get(), because the map essentially does so under the covers to find out if it needs replacement. From the map's point of view, it doesn't know that you're using the same (mutated) reference as a key in the map, as opposed to a separate key with value equality.Herold
@Herold right...see my answer (where I explain that exact situation).Logistic
T
1

I won't repeat what others have said. Yes, it's inadvisable. But in my opinion, it's not overly obvious where the documentation states this.

You can find it on the JavaDoc for the Map interface:

Note: great care must be exercised if mutable objects are used as map keys. The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.

Thickknee answered 24/12, 2020 at 10:20 Comment(0)
G
0

The behaviour of a Map is not specified if the value of an object is changed in a manner that affects equals comparisons while the (mutable) object is a key. Using a mutable object as an element of a Set is not a good idea either.

Let's see an example here:

import java.util.HashMap;
import java.util.Map;

public class MapKeyShouldntBeMutable {

/**
 * @param args
 */
public static void main(String[] args) {
    // TODO Auto-generated method stub
    Map<Employee,Integer> map=new HashMap<Employee,Integer>();

    Employee e=new Employee();
    Employee e1=new Employee();
    Employee e2=new Employee();
    Employee e3=new Employee();
    Employee e4=new Employee();
    e.setName("one");
    e1.setName("one");
    e2.setName("three");
    e3.setName("four");
    e4.setName("five");
    map.put(e, 24);
    map.put(e1, 25);
    map.put(e2, 26);
    map.put(e3, 27);
    map.put(e4, 28);
    e2.setName("one");
    System.out.println(" is e equals e1 "+e.equals(e1));
    System.out.println(map);
    for(Employee s:map.keySet())
    {
        System.out.println("key : "+s.getName()+":value : "+map.get(s));
    }
}

  }
 class Employee{
String name;

public String getName() {
    return name;
}

public void setName(String name) {
    this.name = name;
}

@Override
public boolean equals(Object o){
    Employee e=(Employee)o;
    if(this.name.equalsIgnoreCase(e.getName()))
            {
        return true;
            }
    return false;

}

public int hashCode() {
    int sum=0;
    if(this.name!=null)
    {
    for(int i=0;i<this.name.toCharArray().length;i++)
    {
        sum=sum+(int)this.name.toCharArray()[i];
    }
    /*System.out.println("name :"+this.name+" code : "+sum);*/
    }
    return sum;

}

}

Here we are trying to add the mutable object "Employee" to a map as a key. It works well as long as all the keys added are distinct. Here I have overridden equals and hashCode for the Employee class.

First I add "e" and then "e1". For both of them equals() is true and the hash code is the same, so the map treats them as the same key and replaces the old value with e1's value. Then we add e2, e3 and e4, and we are fine so far.

But when we change the name of an already added key, e2, to "one", it becomes equal to a key added earlier. Now the map behaves weirdly. Ideally e2 would replace the existing equal key (the one holding e1's value), but the map now holds both. And you will get this output:

 is e equals e1 true
{Employee@1aa=28, Employee@1bc=27, Employee@142=25, Employee@142=26}
key : five:value : 28
key : four:value : 27
key : one:value : 25
key : one:value : 25

Notice that both keys named "one" show the same value, which is unexpected. Now run the same program again, changing e2.setName("one"); to e2.setName("diffnt");. The output will be:

 is e equals e1 true
{Employee@1aa=28, Employee@1bc=27, Employee@142=25, Employee@27b=26}
key : five:value : 28
key : four:value : 27
key : one:value : 25
key : diffnt:value : null

So changing a mutable key after adding it to a map is not encouraged.
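If a key really has to change, one way to stay consistent is to take the entry out of the map first, mutate the key, and put it back. The sketch below assumes a simplified Employee with a case-sensitive equals; rename is a hypothetical helper, not a library method.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of the remove, mutate, re-insert pattern.
final class RekeyDemo {
    static final class Employee {
        String name;

        Employee(String name) { this.name = name; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Employee && Objects.equals(name, ((Employee) o).name);
        }

        @Override
        public int hashCode() { return Objects.hashCode(name); }
    }

    // Renames a key safely: the removal must happen BEFORE the mutation.
    static <V> void rename(Map<Employee, V> map, Employee key, String newName) {
        boolean present = map.containsKey(key);
        V value = map.remove(key); // still locatable: hash code not yet changed
        key.name = newName;
        if (present) {
            map.put(key, value);   // re-inserted under the new hash code
        }
    }

    static Integer lookupAfterRename() {
        Map<Employee, Integer> map = new HashMap<>();
        Employee e2 = new Employee("three");
        map.put(e2, 26);
        rename(map, e2, "diffnt");
        return map.get(e2);
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterRename()); // prints 26
    }
}
```

If the new name equals an existing key, the put overwrites that entry's value, so the caller still has to decide what a collision should mean.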

Gunther answered 8/11, 2016 at 16:52 Comment(0)
F
0

To make the answer compact: the root cause is that HashMap calculates an internal hash from the key object's hashCode() only once, at insertion, and stores it for its own needs.

All other operations that navigate the data inside the map use this pre-calculated internal hash.

So if you change the hash code of the key object (by mutating it), the key will still be stored inside the map, now reporting its changed hash code (you can even observe this via HashMap.keySet() and see the altered hash code).

But the HashMap's internal hash will of course not be recalculated; it remains the old stored one, and the map won't be able to locate your data using the mutated key object's new hash code (e.g. via HashMap.get() or HashMap.containsKey()).

Your key-value pairs will still be inside the map, but to get them back through a lookup you need the old hash code value, the one that was in effect when you put your data into the map.

Notice that you also will be unable to get the data back using the mutated key object taken right from HashMap.keySet().
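The stale stored hash can be observed with a small sketch. Box is a made-up key class whose hash code is simply its value; the results shown are those of OpenJDK's HashMap.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the entry survives, but key-based operations cannot reach it.
final class StaleHashDemo {
    static final class Box {
        int value;

        Box(int value) { this.value = value; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Box && value == ((Box) o).value;
        }

        @Override
        public int hashCode() { return value; }
    }

    static String inspect() {
        Map<Box, String> map = new HashMap<>();
        Box key = new Box(1);
        map.put(key, "data");       // the map records the hash for value 1

        key.value = 2;              // key now reports a different hash code

        Box fromKeySet = map.keySet().iterator().next(); // the very same object

        return map.size() + ","
                + (fromKeySet == key) + ","
                + map.containsKey(fromKeySet) + ","   // searched under the new hash
                + map.containsValue("data") + ","     // full scan still sees the value
                + (map.remove(fromKeySet) == null);   // removal by key fails too
    }

    public static void main(String[] args) {
        System.out.println(inspect()); // prints 1,true,false,true,true
    }
}
```

Iterating over entrySet() or values() still reaches the data, which is usually the only way to recover it without restoring the key's old state.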

Fluoridate answered 3/3, 2021 at 20:49 Comment(0)
