C# how to calculate hashcode from an object reference
Asked Answered
S

3

21

Folks, here's a thorny problem for you!

A part of the TickZoom system must collect instances of every type of object into a Dictionary<> type.

It is imperative that their equality and hash code be based on the instance of the object which means reference equality instead of value equality. The challenge is that some of the objects in the system have overridden Equals() and GetHashCode() for use as value equality and their internal values will change over time. That means that their Equals and GetHashCode are useless. How to solve this generically rather than intrusively?

So far, We created a struct to wrap each object called ObjectHandle for hashing into the Dictionary. As you see below we implemented Equals() but the problem of how to calculate a hash code remains.

public struct ObjectHandle : IEquatable<ObjectHandle>{
    public object Object;
    public bool Equals(ObjectHandle other) {
        return object.ReferenceEquals(this.Object,other.Object);
    }
}

See? There is the method object.ReferenceEquals() which will compare reference equality without regard for any overridden Equals() implementation in the object.

Now, how to calculate a matching GetHashCode() by only considering the reference without concern for any overridden GetHashCode() method?

Ahh, I hope this give you an interesting puzzle. We're stuck over here.

Sincerely, Wayne

Seemaseeming answered 31/5, 2010 at 16:28 Comment(0)
S
26

RuntimeHelpers.GetHashCode() does exactly what is needed here.

Seemaseeming answered 31/5, 2010 at 17:44 Comment(6)
is returned value constant during lifetime of object? if yes, then this is really nice thing.Jinni
Yes. It's constant. The GC keeps a handle to the object. The actual location of it can be moved around. So it hashes the handle to it.Seemaseeming
@Wayne: One might expect things to work that way, but they don't. Each object's header contains 32 bits that may either hold the value that should be returned by RuntimeHelpers.GetHashCode() or an index into a table that includes a number of things including the RuntimeHelpers.GetHashCode() value (certain ranges of integer values are used for different purposes). Object relocation is handled by tracking down and changing every reference to an object that exists anywhere in the universe.Complice
@Wayne: Note that if at some moment in time a field holds the only reference to some object anywhere in the universe (say, an instance of String), there is no observation the code could have made before that point in time which would always be sufficient to determine at some later time whether the field still refers to that same object, rather than referring to a new object containing the same data.Complice
RuntimeHelpers.GetHashCode(obj) returns same thing as GetHashCode(obj)Perdue
@Perdue I'm don't think that's trueShanitashank
J
2

You are breaking the pattern, this leads to questions that can't be solved. Method Equals should compare the contents of objects, instead of comparing references. That is what object.Equals does, why override with same behavior?

Now about GetHashCode. Again, hashcode is hash function applied to contents of object. You can't calculate it by reference only. You could use pointer to object and use it as hash, but in .net objects' addresses can be changed by GC.

Jinni answered 31/5, 2010 at 16:37 Comment(7)
You are correct. But TickZoom does some very advanced and high performance programming techniques ordinarily only seen in C++ rather than C#. In this case we're doing aspect oriented programming to spit out trace of method calls in UML sequence format for Trace2UML tool to automatically generate UML sequence diagrams. The aspect must keep a record of each object it has already seen so that when it sees a new one it can emit the text command to draw that instance on the diagram. So it's irrelevant the contents of the object. Simple an internally almost compiler like issue.Seemaseeming
@Wayne: ok. if you want to store something in Dictionary then you must override correctly GetHashcode, otherwise you can't use dictionary. well, you can, but it becomes inefficient. use linked lists then.Jinni
Correct answer is below because the GC has a unique handle to every item but it won't share that with you. .NET provides Equals() and GetHashCode() for the default object if you don't override which are based on that handle and only compares the reference equality. So if you don't override them that's all you get. My need was to use the original default Equals and GetHashCode and .NET offers access to those to get around somebody overriding them with RuntimeHelpers.Equals and RuntimeHelpers.GetHasCode.Seemaseeming
@Seemaseeming i had a feeling that you can get the handle, or hash of handle somehow. now i know that RuntimeHelpers.GetHashCode() can do it :)Jinni
@Wayne: In some frameworks the GC keeps a unique handle for every item which will never change during its lifetime, but that is not true for common implementations of .NET. Every object has a unique address, but when the GC relocates an object it will track down every reference and change it to reflect the new address. When the default GetHashCode is called the first time on an object, the system will arbitrarily select a number and store it in a word where it shares space with some other bits; there's no particular reason to expect that number not to collide with other live objects.Complice
@supercat. Thanks. Does that cause a problem? I think GetHashCode() is expected to have collissions. What is critical is that RuntimeHelpers.Equals() can always give an correct answer of TRUE if comparing an instance to itself and FALSE if comparing to another instance.Seemaseeming
@Wayne: Properly-written code should not rely upon GetHashCode values being unique, but not all code is properly written. Hash collisions using RuntimeHelpers.GetHashCode will be infrequent enough that they should not pose a performance problem in code that has any sane method of handling collisions, but they'll be frequent enough that code which can't handle hash collisions will fail at least occasionally.Complice
P
0

The hash code doesn't have to be unique. (But uniqueness will improve performance).
So - one thing you can do is set the hash code by the type name. All object from the same type will have the same hash code.

Pollitt answered 31/5, 2010 at 16:33 Comment(2)
why use hashcode if it is constant? it will make any Dictionary linked list.Jinni
Correct. MSDN says that hash code should be randomly distributed over the range of value for best performance. Keep in mind it is possible to need to use an object BOTH ways. Hashed by it's values and also, in a different context, hashed by only it's reference. That's the problem in this case. And I found the answer but posted it separately as an answer. Hope you benefit also.Seemaseeming

© 2022 - 2024 — McMap. All rights reserved.