Are we guaranteed that hashcode caching through data race will work correctly?
Asked Answered
G

3

7
public class TestHashRace
{
    private int cachedHash = 0;
    private readonly object value;

    public object Value
    {
        get { return value; }
    }

    public TestHashRace(object value)
    {
        this.value = value;
    }

    public override int GetHashCode()
    {
        if (cachedHash == 0) {
            cachedHash = value.GetHashCode();
        }
        return cachedHash;
    }

    //Equals isn't part of the question, but since comments request it, here we go:
    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        if (ReferenceEquals(this, obj)) return true;
        if (obj.GetType() != GetType()) return false;
        return Equals((TestHashRace) obj);
    }

    protected bool Equals(TestHashRace other)
    {
        return Equals(value, other.value);
    }
}

Here's simple test class.

Are we guaranteed that GetHashCode will always return the same value? And if we are, can someone point to some reference material which gives us this guarantee?

We aren't worried if it calculates hashcode for our value more than once, we just want to be sure that the returned value will always be the same.

Our class has to be immutable, while the cachedHash field is mutable. Field cannot be volatile for performance reasons (the whole idea of this question and optimization we are questioning here). Value is immutable. And it has to be thread-safe.

We can live with potential hashcode recalculation when it will be 0 for some specifc values. We do not want to use nullable types or add additional fields for memory reasons (less memory used if we store only 1 int), so it has to be one int field to handle hashcode problem.

Gulosity answered 24/11, 2016 at 18:27 Comment(11)
It's not clear what you're trying to do here. You have no way of knowing when value will change - anything holding an external reference of it could potentially modify it, rendering your cached hash value stale. Do you want the "same" hash code or do you want the "correct" hash? The only way this can work is if the value object can notify the containing class if has been modified in a way that will alter its hash (or you need some way to guarantee that value will not be externally changed - store a clone or deep copy, etc). Does this need to be threadsafe? Too many questions here...Unite
Overriding GetHashCode without overriding Equals will lead to errors. You need to show the equals implementation tooCoach
@Unite edited question to meet your requirementsGulosity
"Field cannot be volatile for performance reasons" what are these performance reasons? Have you verified that this would be a bottleneck actually?Goeselt
Your equals is wrong. If you get True for equals you must also get the same value with GetHashCode from both objects. Also your equals is currently infinitely recursive because you don't have a Equals overload that takes in a TestHashRaceCoach
See also https://mcmap.net/q/1604473/-what-would-i-do-with-a-net-object-hashcodeGyasi
@Goeselt I am not saying that we got a performance problem. I am asking if removing volatile in this code will work correctly in C#.Gulosity
@ValentinKuzub Well, you're the one who said "performance reasons". I'm only asking what are these reasons.Goeselt
Fyi, volatile has no performance impact in this case on x86. The instructions don't change. I don't see what JIT optimization would be inhibited by it (here). But it also has no correctness impact.Wallaroo
@Wallaroo sounds very interesting, can you elaborate more? Some links, references, proof? This kind of answer is what I am looking for.Gulosity
Under the ECMA spec synchronization is required here for correctness. But no real world JIT would break this. I'd compare the machine code that the JIT generates with and without volatile (feel free to post it). Release mode, x64, make the debugger not prevent opimizations. I don't know why there would be a difference.Wallaroo
H
6

Are we guaranteed that GetHashCode will always return the same value?

No. The guarantee holds only for immutable value objects with properly implemented GetHashCode method. Mutable objects may change their hash code when their content has been mutated (which is the reason why mutable objects should not be used as hash keys).

This is true even if TestHashRace itself is immutable, because you can do this:

var evil = new StringBuilder("hello");
var thr = new TestHashRace(evil);
RunConcurrentCode(thr);
evil.Append(", world!");

If multiple threads in RunConcurrentCode start the call to thr's GetHashCode at the same time, and then complete on different sides of Append, the number returned from value.GetHashCode may be different.

[Edit:] Value is immutable

Then the only thing required for the guarantee to hold is that value's GetHashCode is properly implemented, i.e. does not use random stuff etc.

Note: Since zero is a legitimate value for hash code, your code may repeatedly call value's GetHashCode when the actual code is zero. One approach to fix this would be using nullable cachedHash:

int? cachedHash;
...
public override int GetHashCode() {
    return cachedHash ?? (cachedHash = value.GetHashCode());
}
Hoxsie answered 24/11, 2016 at 18:35 Comment(5)
Thank you. I've added a few edits to the question. It's natural to say that we are guaranteed the same value, but where does this guarantee come from? Some proof, reference or something would be great.Gulosity
@ValentinKuzub It comes from immutability of the object, and the assumption that GetHashCode behaves like a pure function, i.e. returns the same value for the same combination of input parameters. In this case input parameters are the state of the object. From Microsoft: "The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method."Hoxsie
"In this case input parameters are the state of the object." - but the state of our object can be different in our case, hashcode changes state of the object. It tries to appear like a pure function, but it isn't pure. Hence the question.Gulosity
@ValentinKuzub Your object is fine, it has no say in what's returned. It's value object that's in question.Hoxsie
It is absolutely fine to use mutable objects as keys as long as their GetHashCode is not overridden.Including
C
3

No it won't because 0 is a valid result for value.GetHashCode(). Make cacheedHash a nullable int and check for null instead of 0.

Coach answered 24/11, 2016 at 18:39 Comment(0)
I
-1

There is no guarantee, because you may go and implement a class with a GetHashCode method doing arbitrary stupid things. The compiler will not prevent you from that.

The other question is can you expect that GetHashCode always returns the same value. The answer to that question is yes, mostly. It is a design decision. However, for most classes, the ability to use instances as a key in a dictionary is important enough to implement GetHashCode in a way that the value never changes, for example by not overriding it, or only overriding it to save the reflection costs.

Notably, this includes StringBuilder, so that the race condition noted by dasblinkenlight does not really exist: Unlike String, StringBuilder will always return the same hash code.

So why mostly? The answer to this is a bit awkward. Technically, the class string is not immutable. There are some evil (i.e., unsafe) ways to modify the contents of a string without changing the reference, which in turn will lead to different hash codes for that same reference. You also will find a number of people implementing a value-like Equals and GetHashCode for classes that suffer from the same problem (and you do not have to use unsafe code to get in trouble).

Thus, there is no guarantee, but it is a fair assumption to make. Just document that assumption so users of your code do not run into trouble and you should be fine.

Including answered 29/11, 2016 at 19:35 Comment(1)
Why downvote? Please leave a comment if you think that something is not correct.Including

© 2022 - 2024 — McMap. All rights reserved.