GetHashCode() with string keys
Asked Answered
D

5

15

Hey all, I've been reading up on the best way to implement the GetHashCode() override for objects in .NET, and most answers I run across involve somehow munging numbers together from members that are numeric types to come up with a method. Problem is, I have an object that uses an alphanumeric string as its key, and I'm wondering if there's something fundamentally wrong with just using an internal ID for objects with strings as keys, something like the following?


// Override GetHashCode() to return a permanent, unique identifier for
// this object.
static private int m_next_hash_id = 1;
private int m_hash_code = 0;
public override int GetHashCode() {
  if (this.m_hash_code == 0)
    this.m_hash_code = <type>.m_next_hash_id++;
  return this.m_hash_code;
}

Is there a better way to come up with a unique hash code for an object that uses an alphanumeric string as its key? (And no, the numeric parts of the alphanumeric string isn't unique; some of these strings don't actually have numbers in them at all.) Any thoughts would be appreciated!

Duda answered 23/7, 2010 at 17:31 Comment(0)
P
23

You can call GetHashCode() on the non-numeric values that you use in your object.

private string m_foo;
public override int GetHashCode()
{
    return m_foo.GetHashCode();
}
Pushed answered 23/7, 2010 at 17:34 Comment(6)
But what if that string changes? For example, I might create a new User object with: User foo = new User(); and the constructor sets User.Id = "". Later, if I say User.Id = "A12345"; and I return this.Id.GetHashCode() as the result of foo.GetHashCode(), won't it have changed, violating the principle that an object's hash code should never change?Duda
@King - there are a couple of different ways to use hash codes. The value of the hash code needs to always be the same given the same starting value. If your value is mutable, you need to store the resulting hashcode and return it instead when GetHashCode() is called.Pushed
@Hans - that's debatable, and really depends on what the object is used for. For example, if you are using this object in a hash table, the hash code cannot change if the value changes. However, the same starting value must always produce the same hash code.Pushed
@King Skippus: The reason that hash codes should not change is that they help identify an object. But if that object's fields change, it is not the same object. It may be the same instance, but it is not longer the same data. You wouldn't expect Equals() to return true for two different instances that had different key fields? So why try to prevent the hash value from changing? You can certainly store the hash, but if the reason that you're doing so is to store a mutable type as a key in a dictionary, then the correct resolution is to avoid that.Rissa
I'm not arguing, just trying to understand. ;) I think I get it, and maybe the way to go is to not allow a default constructor on my class (i.e. <i>require</i> the Id to always be set) and make the Id property read-only so that it is immutable. There seems to be a consensus that Jon's on the money, and if I make the Id immutable per LBushkin's comment, then it's sounds like that's the way to go.Duda
@JonB That is not true, though it contains truth. If an object is a key in a hash table, then the object must not change. If it does it'll be located internally to the table according to its previous hash, so whether the hash remained the same or not it would not be found (except by a fluke chance that decreased according to the size of the table). That the object must not change means that the hash must not change, but it's the former that's important not the latter.Duffer
R
21

This is not a good pattern for generating hashes for an object.

It's important to undunderstand the purpose of GetHashCode() - it's a way to generate a numeric representation of the identifying properties of an object. Hash codes are used to allow an object to serve as a key in a dictionary and in some cases accelerate comparisons between complex types.

If you simply generate a random value and call it a hash code, you have no repeatability. Another instance with the same key fields will have a different hash code, and will violate the behavior expected by classes like HashSet, Dictionary, etc.

If you already have an identifying string member in you object, just return its hash code.

The documentation on MSDN for implementers of GetHashCode() is a must read for anyone that plans on overriding that method:

Notes to Implementers

A hash function is used to quickly generate a number (hash code) that corresponds to the value of an object. Hash functions are usually specific to each Type and, for uniqueness, must use at least one of the instance fields as input.

A hash function must have the following properties:

If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two object do not have to return different values.

The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.

For the best performance, a hash function must generate a random distribution for all input.

For example, the implementation of the GetHashCode method provided by the String class returns identical hash codes for identical string values. Therefore, two String objects return the same hash code if they represent the same string value. Also, the method uses all the characters in the string to generate reasonably randomly distributed output, even when the input is clustered in certain ranges (for example, many users might have strings that contain only the lower 128 ASCII characters, even though a string can contain any of the 65,535 Unicode characters).

Rissa answered 23/7, 2010 at 17:38 Comment(0)
A
2

Hash codes don't have to be unique. Provided your Equals implementation is correct, it's OK to return the same hash code for two instances. The m_next_hash_id logic is broken, since it allows two objects to have different hash codes even if they compare equals.

MSDN gives a good set of instructions on how to implement Equals and GetHashCode. Several of the examples here implement GetHashCode in terms of the hash codes of an object's fields

Abstergent answered 23/7, 2010 at 17:37 Comment(0)
P
0

Yes, a better way would be to use the hashcode of the string you already have. If the alpha numeric string defines the identity of the object you have, it's hashcode will do quite nicely for the hashcode of your object.

The idea of incrementing a static field and using it as the hashcode, is a bad one. The hash code should have an even distribution across the space of possible values. This ensures, amongst other things, that it will perform well when used as the key in a hashtable.

Pelayo answered 23/7, 2010 at 17:35 Comment(0)
V
0

I believe you generally want GetHashCode() to return something that identifies the object by it's value, rather than it's instance, if I'm understanding the idea here, I think your method would ensure GetHashCode() on two different objects with equivalent values would return different hashes just because they're different instances.

GetHashCode() is meant to return a value that lets you compare two objects values, not their references.

Vociferous answered 23/7, 2010 at 17:36 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.