What is the best algorithm for overriding GetHashCode?

22

1676

In .NET, the GetHashCode method is used in a lot of places throughout the .NET base class libraries. Implementing it properly is especially important for finding items quickly in a collection or when determining equality.

Is there a standard algorithm or best practice on how to implement GetHashCode for my custom classes so I don't degrade performance?

Anthropomorphosis answered 4/11, 2008 at 20:53 Comment(7)
After reading this question and the article below, I was able to implement an override of GetHashCode. I hope it will be helpful for others. Guidelines and rules for GetHashCode written by Eric LippertRance
"or to determine equality": no! Two objects with the same hashcode are not necessarily equal.Handiwork
@ThomasLevesque You are right, two objects with the same hash code are not necessarily equal. But still GetHashCode() is used in very many implementations of Equals(). That's what I meant by that statement. GetHashCode() inside Equals() is often used as a shortcut to determine inequality, because if two objects have different hash codes they have to be objects that are not equal and the rest of the equality check doesn't have to be executed.Anthropomorphosis
@Anthropomorphosis Usually, both GetHashCode() and Equals() need to look at all fields of both objects (Equals has to do this if the hashcodes are equal or not checked). Because of this, a call to GetHashCode() inside Equals() is often redundant and could reduce performance. Equals() may also be able to short circuit, making it much faster - however in some cases the hashcodes may be cached, making the GetHashCode() check faster and so worthwhile. See this question for more.Takao
UPDATE JAN 2020: Eric Lippert's blog located at: learn.microsoft.com/en-us/archive/blogs/ericlippert/…Oni
UPDATE MAR 2020: The link from @RickDavin is correct, but the article at learn.microsoft.com has bad formatting. Here's the same article on Eric's blog. ericlippert.com/2011/02/28/guidelines-and-rules-for-gethashcodeStorybook
Now you can just use HashCode.Combine(field1, field2,...)Supercilious
1809

I usually go with something like the implementation given in Josh Bloch's fabulous Effective Java. It's fast and creates a pretty good hash which is unlikely to cause collisions. Pick two different prime numbers, e.g. 17 and 23, and do:

public override int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = 17;
        // Suitable nullity checks etc, of course :)
        hash = hash * 23 + field1.GetHashCode();
        hash = hash * 23 + field2.GetHashCode();
        hash = hash * 23 + field3.GetHashCode();
        return hash;
    }
}

As noted in comments, you may find it's better to pick a large prime to multiply by instead. Apparently 486187739 is good... and although most examples I've seen with small numbers tend to use primes, there are at least similar algorithms where non-prime numbers are often used. In the not-quite-FNV example later, for example, I've used numbers which apparently work well - but the initial value isn't a prime. (The multiplication constant is prime though. I don't know quite how important that is.)

This is better than the common practice of XORing hashcodes for two main reasons. Suppose we have a type with two int fields:

XorHash(x, x) == XorHash(y, y) == 0 for all x, y
XorHash(x, y) == XorHash(y, x) for all x, y
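To make those weaknesses concrete, here is a small hypothetical sketch (helper names invented) comparing a naive XOR combiner with the multiply-add scheme above for a two-field type:

```csharp
using System;

// Hypothetical helpers (names invented) combining two int fields.
static class Hashes
{
    // The weak approach: order-insensitive, and (x, x) always yields 0.
    public static int XorHash(int x, int y) => x ^ y;

    // The multiply-add approach from the answer above.
    public static int MultiplyAddHash(int x, int y)
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + x;
            hash = hash * 23 + y;
            return hash;
        }
    }
}

class Demo
{
    static void Main()
    {
        Console.WriteLine(Hashes.XorHash(5, 5));                         // 0
        Console.WriteLine(Hashes.XorHash(3, 7) == Hashes.XorHash(7, 3)); // True
        Console.WriteLine(Hashes.MultiplyAddHash(3, 7) ==
                          Hashes.MultiplyAddHash(7, 3));                 // False
    }
}
```

The XOR version maps every (x, x) pair to zero and cannot distinguish field order; the multiply-add version preserves both distinctions.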

By the way, the earlier algorithm is the one currently used by the C# compiler for anonymous types.

This page gives quite a few options. I think for most cases the above is "good enough", and it's incredibly easy to remember and get right. The FNV alternative is similarly simple, but uses different constants and XOR instead of ADD as the combining operation. It looks something like the code below, but the normal FNV algorithm operates on individual bytes, so this would require modification to perform one iteration per byte instead of per 32-bit hash value. FNV is also designed for variable lengths of data, whereas the way we're using it here is always for the same number of field values. Comments on this answer suggest that the code here doesn't actually work as well (in the sample case tested) as the addition approach above.

// Note: Not quite FNV!
public override int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = (int) 2166136261;
        // Suitable nullity checks etc, of course :)
        hash = (hash * 16777619) ^ field1.GetHashCode();
        hash = (hash * 16777619) ^ field2.GetHashCode();
        hash = (hash * 16777619) ^ field3.GetHashCode();
        return hash;
    }
}

Note that one thing to be aware of is that ideally you should prevent your equality-sensitive (and thus hashcode-sensitive) state from changing after adding it to a collection that depends on the hash code.

As per the documentation:

You can override GetHashCode for immutable reference types. In general, for mutable reference types, you should override GetHashCode only if:

  • You can compute the hash code from fields that are not mutable; or
  • You can ensure that the hash code of a mutable object does not change while the object is contained in a collection that relies on its hash code.
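As a hypothetical illustration (type and values invented) of why the second rule matters, mutating a key's hash-sensitive state after insertion makes it unfindable:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical mutable type whose hash depends on mutable state -
// exactly what the guidance above warns about.
class Box
{
    public int Value { get; set; }

    public override int GetHashCode() => Value.GetHashCode();

    public override bool Equals(object obj) =>
        obj is Box other && other.Value == Value;
}

class Demo
{
    static void Main()
    {
        var box = new Box { Value = 1 };
        var set = new HashSet<Box> { box };

        box.Value = 2; // hash code changes while the object is in the set

        // The set probes the bucket for the new hash code and finds nothing,
        // even though the very same reference was added.
        Console.WriteLine(set.Contains(box)); // False
    }
}
```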

The link to the FNV article is broken but here is a copy in the Internet Archive: Eternally Confuzzled - The Art of Hashing

Guacharo answered 4/11, 2008 at 20:56 Comment(87)
The algorithm described in the book you mention is in fact a little more detailed; it especially describes what to do for different data types of the fields. E.g.: for fields of type long use (int)(f ^ (f >>> 32)) instead of simply calling GetHashCode. Is long.GetHashCode implemented that way?Anthropomorphosis
Yup, Int64.GetHashCode does exactly that. In Java that would require boxing, of course. That reminds me - time to add a link to the book...Guacharo
23 is no good choice, since (as of .NET 3.5 SP1) Dictionary<TKey,TValue> assumes good distribution modulo certain primes. And 23 is one of them. So if you have a dictionary with capacity 23, only the last contribution to GetHashCode influences the compound hashcode. So I'd rather use 29 instead of 23.Desiderata
@Ani: Your implementation allocates several new objects on the heap, so the performance might be lower than with a manual implementation. Whether that's acceptable depends on your type and usage. Check some of the other answers for helpers using generics which avoid this problem.Desiderata
@CodeInChaos: Only the last contribution influences the bucket - so it might, at worst, have to look through all 23 entries in the dictionary. It's still going to check the actual hash code of each entry, which will be cheap. If you've got a dictionary that small, it's unlikely to matter much.Guacharo
@Jon: I have to ask, despite already having opened my own question on the topic, but what's a good VB version of this since VB lacks the checked and unchecked keywords? I tried making tmpHash an Int64 and AND'ing the lower 8 bits (per the accepted answer to my question), but on a sufficiently large set of fields, it somehow caused the calculation to wrap to 0 for the remainder of the loop.Bracken
@Kumba: I don't know how I'd do this in VB, I'm afraid. Is arithmetic always checked in VB? Could you have a separate class library which you could delegate the arithmetic to, either written in C# or with checked arithmetic turned off project-wide?Guacharo
@Jon: VB apparently checks a lot of things. It's got a fetish for demanding that unsigned numerics get converted into signed numerics before letting you left or right shift them. Which is driving me up the wall and across the ceiling. I'm trying to implement the Jenkins hash to work around the lack of checked/unchecked (a rotating hash also does the trick, but I'm worried about hash collisions with input). Using a separate, C# library is something I'd like to avoid, because it essentially admits defeat. If I get to that point, I should just re-write the entire project in C#.Bracken
isn't the 'unchecked' unnecessary b/c CLR will happily overflow by default?Kosygin
@pomeroy: It depends on what the project settings are. Basically you give the assembly a default context of checked or unchecked.Guacharo
@pomeroy: VB isn't as granular as C# is. Because it lacks the above two keywords, your only options are to remove integer overflow checks for the entire project or not. I guess if your project is complete and generally well tested, removing the overflow checks is a safe thing to do. However, while building it and debugging, those checks are good because they'll highlight bugs to fix. I opened Connect Ticket # 636564 with Microsoft to recommend including checked/unchecked keyword support in the next .NET release. Uncertain if they'll do so, though.Bracken
I'll add that I'll have to go with the rotating hash algorithm linked to in Jon's answer above. It doesn't overflow, even on an Int32, doesn't (so far) wrap to 0 on a large number of fields in the calculation, and is simple and rather speedy. The Jenkins hash didn't work out... Even that overflows at random, depending on the input. Also, the forcing of bitshifts to happen in signed math gets in the way of a lot of things. I might open another bug on that, unless that's intended behavior somehow.Bracken
Don't you need override in your method declaration? Would also be good to put in null checks since this is such a well-used example.Innuendo
@Rory: I've added the override, thanks - I'm not going to put in the null checks, as I feel that would obscure the important points. The comment suffices IMO.Guacharo
Why start with a prime instead of just zero? does int hash = 17; have any theoretically supported benefits?Osculation
@FredOverflow: I don't know the exact details of all the reasons behind it, but starting with 0 would mean that the hash would stay as zero if the individual field hashes were zero... and that's probably not uncommon (e.g. an integer of value zero will probably hash to zero). Just a guess, but I suspect that having a constant which gets propagated with each field is useful. This is really just copied from Effective Java though :)Guacharo
@JonSkeet How collision-safe would this algorithm be for a complex object graph consisting of, let's say, 500 objects with each having 10 properties. Related question: #5308557Anthropomorphosis
@bitbonk: The chances of a collision on any single change would be pretty low... but in the question you're talking about I'd probably use a cryptographic hash instead.Guacharo
Then the question stands: how do I create a cryptographic hash of an object model?Anthropomorphosis
@bitbonk: I would strongly consider using a "normal" cryptographic hash on the result of binary serialization of form.Guacharo
This algorithm is basically the DJB2 string hashing algorithm, for which the constants 5381 and 33 are recommended (cse.yorku.ca/~oz/hash.html). To be honest I'm not sure the constant makes much difference, but the multiplier is important.Forester
@JonSkeet I realize I'm raising the dead here, but implementing hashes is a new thing for me. In your implementation, what fields am I including in the hash? Only immutable ones, or are any fields good?Addie
@KChaloux: That entirely depends on what you want equality to mean. Normally it's a bad idea to include mutable data though.Guacharo
How would you handle nullity? If simply ignoring that field, then for A = null, B = "ss" and for A = "ss", B = null we would have collisions. Isn't it better to multiply each field with a different prime number?Wivinia
@Vajda: I usually use 0 as the effective hash code for null - which isn't the same as ignoring the field.Guacharo
@jnm2: I don't follow your argument, to be honest. In particular, I've just tried this effectively hashing 10 fields - and changing the value of just the first field still changed the hash, contradicting your claim that "every bit of the first hash codes will be lost".Guacharo
You can demonstrate this gives a poor distribution quite simply. Take this variant of FNV and apply it to strings (use unsafe pointer manipulation to get integers at a time to give it a fair chance). Use it to add strings to a power-of-two based hash table. With the one I'm working on now, if I generate "1", "2", ... "999999" and add them in, it takes about 34 seconds. Now take this same hash method and re-hash the result with a well-distributed hash. With a good hash this can only make things worse (more time spent, and we could introduce new collisions but never remove them). With the...Nourish
... same hash table I'm working on, the same code to generate "1"..."999999" and add them takes 1 second. The effect is less pronounced with prime-number based hashes, so in that case the extra time spent re-hashing (and perhaps the reduction in possible outcomes, though that's unlikely) doesn't gain anything, but the poor performance on power-of-two tables demonstrates poor distribution generally.Nourish
@JonHanna: Thanks for that. Not sure what you mean by "to get integers at a time" but I'll try to have a closer look. I still like this as a first approximation for a hash, but if you have another hash which is as simple to remember and get right but with a better distribution, I'd be very happy to change my practice :)Guacharo
I meant I used fixed(char* ptr = str){int* iPtr = (int*)ptr;... but I also tried just doing foreach(char c in str) and casting each char to int, and the same applies. The relative weakness became apparent to me when I had reason to use power-of-two tables and was getting poor results (I used to use much the same as above, myself). The solution I finally hit upon is to forget about being easy to remember, and to build a hard-to-remember method once, and then make it easy to use and put the code at nuget.org/packages/SpookilySharp I'll add a full answer here at lunchtime.Nourish
@JonSkeet and now answered.Nourish
@JonHanna: Thanks for that. Will have to look in more detail when I have a bunch of time :)Guacharo
I think it is important to point out that we should be careful with the hash code changing at run-time. We had a bug in my project because the previous developer had implemented a GetHashCode algorithm based on this answer. But in his implementation he had a list of objects, and he used the hash of each item in the collection to generate the object hash code. Therefore, when the collection changed, the hash code changed also. That was causing binding problems in WPF. And if you had the object in a dictionary, for example, you would get errors also.Malpractice
@Dzyann: Yes, mutating the key in a way which affects equality - and thus the hash code - is always a bad idea. I'll add a note.Guacharo
@JonSkeet you are right, and it can lead to very hard to track bugs. Like in this case with the bindings of WPF. It took ages until one of my coworkers found the cause of it and solved it. Since it wasn't our code, it was very challenging.Malpractice
I would suggest that you change 17 and 23 to the constants here. (Thanks for the link.) It gave a simple dictionary lookup much higher performance, in my case, ~60% better.Kosel
@jnm2: That's not the same algorithm to start with - it's using XOR rather than ADD. I'll stick with these constants for this answer, but perhaps you should add your own answer?Guacharo
Actually, I was going to suggest that xoring rather than adding wouldn't diminish the simplicity as a go-to hash algorithm. What do you think?Kosel
XOR ends up making GetHashCode() faster by 12% in my case.Kosel
@jnm2: Well it wouldn't diminish that simplicity - but it's not what I've done for the past several years. I'll add FNV as an alternative though.Guacharo
int hash = 2166136261; Is there a cast missing? Compiler says that 2166136261 is a uint... I changed it to int hash = (int)2166136261;Bookcraft
I tried to implement this approach for ValueUtils, but in my testing this variant of FNV caused considerable collisions (24%) in certain symmetrical datasets. And perhaps that's because this is NOT actually the FNV hash? Traditional FNV hashes per octet (byte), not per 32-bit word. That gives this variant less opportunity to mix the bits around...Chair
@EamonNerbonne: What do you mean by "this approach"? The answer now contains two different versions...Guacharo
I mean this variant of FNV - it's not quite FNV, and I'm pretty sure that makes matters worse. Incidentally, I also tried the h=prime; repeat h=h*prime + ? recipe; that seems to vary; it does quite well for larger primes, especially if your intermediate is 64-bit wide.Chair
@Eamon: I don't know enough about the theory to comment further, I'm afraid :(Guacharo
Yeah, the theory behind it is not at all obvious to me. However - this answer suggests that this implementation is FNV, a well-known good hash. But that's not really true, since this is not FNV. Also, FNV is a string hash algorithm, which needs to satisfy much trickier requirements in that it needs to work for variable length potentially long strings. But again, the algorithm currently in the answer is not FNV - it mixes the bits much less well.Chair
@EamonNerbonne: Okay. I'll edit to indicate that it's a modification, and that it doesn't work as well in at least some cases.Guacharo
@EamonNerbonne: What are the best coefficients that you know of?Kosel
@Kosel In my experiments, the offset matters little, and the trend is that larger primes perform better, with the caveat that this is all tricky to test because it's slow (really slow) to be thorough, and it depends on the way in which your dataset is "messed up". If your fields have perfectly randomly distributed hashcodes - none of this matters, but of course, in the real world, those hash codes aren't random, and fields are correlated. There's a fairly good reason why large primes would be better too - they mix bits better, especially if your data consists largely of small numbers.Chair
@jnm2, so I'd pick a largish prime (on the order of 2^16, say) and to tune for .NET's dictionary implementation, one that's NOT used by Dictionary<,>: referencesource.microsoft.com/#mscorlib/system/collections/…Chair
@Kosel I came across these two questions further exploring this issue: #1836476 and #1145717 and both conclude: use any old large prime. The accepted answer in the first question mentions two chosen in a principled way - but it's not likely a principle that really relates to the real world, so it still recommends the basic idea: pick a large prime, NOT 23 or 31.Chair
BTW: note that the offset is (as far as I can tell) completely pointless. Distributive laws also hold under modulo, and that means it's simply an identical offset all objects will share - that certainly has no impact on any hashtable I know.Chair
@EamonNerbonne: I guess that's true if all the objects are of the same type. If you've got a dictionary where some keys are subclasses of other keys, that could make a difference... although only when the extra field values are 0 anyway. Again, this is mostly just habit for me :(Guacharo
@JonSkeet Yeah, if you have objects of differing type and you use differing offsets, you'll have some advantage. No reason to be prime, though, I think... In any case, an addition is so cheap, there's not much reason to avoid it either.Chair
I used this algorithm for a pseudo-random generator, and it behaves a little strangely: #26847762Nose
If you got the number 486187739 from https://mcmap.net/q/30342/-what-is-a-sensible-prime-for-hashcode-calculation - I actually intended to recommend 92821.Clyburn
As each instance of class 'object' has a unique hash code, this idea came to my mind that it would be good if we use base.GetHashCode() as a seed or something to produce our hash code for the object.Lye
@AhmadSiavosh: No, that's a bad idea, because you want distinct but equal objects to have the same hash code. (I don't think object.GetHashCode is guaranteed to be unique, either. It may well be "very unlikely to collide" but that's not the same thing.)Guacharo
If a field fieldL is a List<obj>, will it work by just doing hash = ... ^ fieldL.GetHashCode() or do I have to go through the items like foreach(){hash = ... ^ item.GetHashCode()}?Clinician
@Jaider: It won't do either. List<T> doesn't override Equals or GetHashCode.Guacharo
I tried this code for 3 doubles, and I get an enormous amount of collisions. I need to get hashcodes for 4194304 tuples. Is there a better way? Using some larger primes helped a little bit, but I am still getting collisions.Paleoasiatic
@user984444: Well you should expect quite a few collisions with that many entries. Just how many are you getting?Guacharo
@JonSkeet It's hard to say. I am using this to cache the output of some perlin noise, and the indicator of collision is some "interesting" output in my image; it has the appearance of... when you win solitaire. This gets mitigated (and the pattern changes) with larger primes. Very unhelpful I know. I have changed my struct (tuple of doubles as the key) into a class so that .NET takes care of the hashcode for me, and it has no collisions anymore.Paleoasiatic
@user984444: Um, that way equal keys won't be equal though, unless you've overridden GetHashCode in your class, in which case you've got the same problem. Maybe it would be worth you asking a new question with all the details...Guacharo
@JonSkeet: Not true; the default implementation of GetHashCode is working perfectly well (If it wasn't, it would be incredibly obvious in my end result). It works for a struct as well, but is WOEFULLY SLOW. I wanted to use structs, but using a class seems to work just fine for my use case.Paleoasiatic
@user984444: Unless you override GetHashCode and Equals yourself, or inherit from another class which does so, you will get reference equality. That's not what the struct will give you. It really, really sounds like we need a new post here with details.Guacharo
@JonSkeet: I consider my particular problem solved because I am getting the desired result, but if I get a chance, I'll post a question with details so you can see what is going on.Paleoasiatic
being really picky, StyleCop default settings generate a warning for this code (SA1407) as you have not used parentheses to determine the precedence of the arithmetic operators, even though it is clear to any dev reading the code, and to the compiler, as we all know the BODMAS rule.Cowes
@MikeW: I don't think BODMAS includes XOR :) I think the final piece of code would be clearer with parentheses - will add them now. I agree that the multiply-and-add version doesn't need them.Guacharo
For future readers: Consider using HashCode.Combine()Cynthiacynthie
@JonSkeet any idea how to go about this in T-SQL? I need the C# hash of a GUID series to match the T-SQL hash of a uniqueidentifier series. But AFAIK in T-SQL it's not possible to wrap results of integer arithmetic.Sapota
@BaltoStar: I don't know anything about hashing in T-SQL. If it already provides well-defined hashing for GUID values, I'd probably try to emulate that in C# rather than the other way round.Guacharo
@JonSkeet in C# why not just MD5 hash the ordered concatenation of the GUIDs ?Sapota
@JamesKo: I'll add a link to HashCode.Combine when .NET Core 2.1 has actually been released, and I can link to docs. I don't think it'll be useful to many people until that time.Guacharo
@JonSkeet Sure.Pathe
I'm unsure how to handle the nulls here. None of the answers really seem to touch that topic, just assuming we are all experts on the topic. @JonSkeet Mentions in these comments "I usually use 0 as the effective hash code for null - which isn't the same as ignoring the field." How that is actually implemented though, I have questions about. It sounds like you are saying that a null property should zero out the current hash value, but that seems like odd behavior. It might be obvious to some what to do, but I'd appreciate an example showing how to handle the nulls, or a better explanation.Lost
After reading a few other Q&A's on the topic, I realized that I didn't comprehend what @JonSkeet was saying very well. I misunderstood him to be saying that I should substitute 0 as the hashing constant when the property is null. After seeing an example here, I realize he was merely stating that I should substitute 0 as the property's hash code, which seems so obvious now...considering that is exactly what he said.Lost
Does it really need to use primes like 17 or 23 if my object's hash only depends on one int32 property? Can I just return MyProperty.GetHashCode()?Pierce
@stt106: For just a single property, I'd just return that property's hash code, yes.Guacharo
FYI, Visual Studio 2017 can generate GetHashCode() without ReSharper. learn.microsoft.com/en-us/visualstudio/ide/reference/…Johanajohanan
Why multiply hash on every line? Why: int hash = 17; hash = hash * 23 + ... ? Why not just use the product explicitly, as in, hash = 391 + field1.GetHashCode(); ? Since order of operations will do the multiplication first anyway?Polygyny
@emery.noel: That won't make any difference after the first line (you'd still need to multiply, to include the previous hash) and there's a big benefit IMO in making every line consistent.Guacharo
Not much notice has been taken of an important point. It is important that the returned hash code NOT CHANGE if the object is mutable and the object changes. This is because the hash code is used (for example) to place objects in dictionaries. If a mutable object changes after insertion in a dictionary then the object is not found when you go to look it up. The above should cache the hash the first time it is computed and always return the original value. Otherwise you will have strange bugs.Absorbing
@Tb.: Or you document it, as per the docs: "If you do choose to override GetHashCode() for a mutable reference type, your documentation should make it clear that users of your type should not modify object values while the object is stored in a hash table." Often that's a useful thing to do, as you might choose to construct an object but not mutate it afterwards. It's not "squeaky clean" but it can be perfectly practical.Guacharo
The link to the FNV article is broken but I found it in the archive: archive.vn/KJeJyStutman
@JonSkeet Which version of GetHashCode() is better, the one using 17/23 or the one using long numbers? Anyone can help out?Melodiemelodion
@Pingpong: "Better" has any number of axes, and will depend on the source data. I don't even know which one you mean by "the one using long numbers" (there are likely to be several, and I'm not going to trawl over every answer here finding candidates). I would personally just start with the simple one here - or use HashCode.Combine if you can.Guacharo
552

ValueTuple - Update for C# 7

As @cactuaroid mentions in the comments, a value tuple can be used. This saves a few keystrokes and, more importantly, executes purely on the stack (no garbage):

(PropA, PropB, PropC, PropD).GetHashCode();

(Note: The original technique using anonymous types seems to create an object on the heap, i.e. garbage, since anonymous types are implemented as classes, though this might be optimized out by the compiler. It would be interesting to benchmark these options, but the tuple option should be superior.)
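Putting the tuple technique together with Equals, a minimal sketch (type and property names invented) might look like:

```csharp
using System;

// Hypothetical type: the same tuple drives both Equals and GetHashCode,
// so the two can never drift apart.
public class Color
{
    public int R { get; }
    public int G { get; }
    public int B { get; }

    public Color(int r, int g, int b) => (R, G, B) = (r, g, b);

    public override int GetHashCode() => (R, G, B).GetHashCode();

    public override bool Equals(object obj) =>
        obj is Color other && (R, G, B).Equals((other.R, other.G, other.B));
}
```

Because both members delegate to the same tuple, equal instances are guaranteed to produce equal hash codes.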

Anonymous Type (Original Answer)

Microsoft already provides a good generic HashCode generator: Just copy your property/field values to an anonymous type and hash it:

new { PropA, PropB, PropC, PropD }.GetHashCode();

This will work for any number of properties. It does not use boxing. It just uses the algorithm already implemented in the framework for anonymous types.

Slaton answered 7/1, 2011 at 21:38 Comment(15)
Yes, the anonymous GetHashCode implementation is very effective (BTW it's the same as the one in Jon Skeet's answer), but the only problem with this solution is that you generate a new instance on every GetHashCode call. It can be a bit overhead-ish in particular in case of intensive access to big hashed collections...Prefect
This works in VB w/ .NET 4.0, but looking over the IL, it's using box calls since the type uses generics. No unboxing, but from my reading here, the mere presence of boxing suggests this can be a little inefficient. Seems like the only choice for VB, though, since there is no equivalent of checked/unchecked.Bracken
@Prefect Good point, I didn't think about the overhead of creating a new object. Jon Skeet's answer is the most efficient and won't use boxing. (@Bracken To solve the unchecked in VB, just use an Int64 (long) and truncate it after the calculations.)Slaton
In VB.Net: New With {PropA, PropB, PropC, PropD}.GetHashCode()Duff
VB.NET must use Key in anonymous type creation: New With {Key PropA}.GetHashCode() Otherwise GetHashCode will not return the same hashcode for different objects with the same 'identifying' properties.Empson
Don't forget to enumerate your IEnumerables or bad things will happen. new { PropA, PropB, C = PropC.ToList() }.GetHashCode()Principe
@Principe in that case, I would consider saving the IEnumerable as a list value somewhere instead of enumerating it each time the hashcode is calculated. Calculating ToList each time inside GetHashCode could hurt performance in many situations.Slaton
And don't forget that private property/fields are not needed in this case ;).Intercross
@Keith: A hash code does not have to be affected by all properties of an object. A hash code merely needs to give good enough distribution of your objects. And should be fast to compute. Leave out the enumerable. And if you have a list, don't include the whole list. Use Count, and perhaps first element (use zero if no elements). unless your class doesn't have much variation other than the list; in that case, caching a hash of the list, as Rick suggests, is best. Recall that, by definition, an object's hash must always be the same. If collection changes, DO NOT include it in hash calc.Tarra
For those who like this, (PropA, PropB, PropC, PropD).GetHashCode() is now available on C#7 without GC pressure @Prefect concerns. Quick and Simple Hash Code CombinationsJohanajohanan
@Johanajohanan Nice! So using a tuple (which is a struct) instead of the anonymous type (a class). Does it still use the same calculation behind the scenes for the Tuple GetHashcode()?Slaton
@RickLove I'm not sure the mathematics. Tuple.GetHashCode() and ValueTuple.GetHashCode() looks similar. ValueTuple.GetHashCode() calls HashHelper. Tuple.GetHashCode() calls Tuple.CombineHashCodes. For anonymous type, How are Equals and GetHashCode implemented on anonymous types?Johanajohanan
I apologize that @Timo has already posted about ValueTuple.GetHashCode() below.Johanajohanan
If nullable, do we need to check PropA, B, etc for null and if null pass in 0?Demeter
How would you do a nullcheck on an optional property here?Egypt
154

Using System.HashCode

If you are using .NET Standard 2.1 or above, you can use the System.HashCode struct. On earlier frameworks it is available from the Microsoft.Bcl.HashCode package. There are two methods of using it:

HashCode.Combine

The Combine method can be used to create a hash code, given up to eight objects.

public override int GetHashCode() => HashCode.Combine(this.object1, this.object2);
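For instance, a minimal sketch (type and property names invented) pairing Combine with a consistent Equals:

```csharp
using System;

// Hypothetical type (names invented) pairing HashCode.Combine
// with a matching Equals.
public class Point
{
    public int X { get; }
    public int Y { get; }

    public Point(int x, int y) => (X, Y) = (x, y);

    public override int GetHashCode() => HashCode.Combine(X, Y);

    public override bool Equals(object obj) =>
        obj is Point other && other.X == X && other.Y == Y;
}
```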

HashCode.Add

The Add method helps you to deal with collections:

public override int GetHashCode()
{
    var hashCode = new HashCode();
    hashCode.Add(this.object1);
    foreach (var item in this.collection)
    {
        hashCode.Add(item);
    }
    return hashCode.ToHashCode();
}
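To keep equality consistent with that hash, a matching Equals would compare the collection element-wise, e.g. with SequenceEqual. A hypothetical sketch (the field names follow the snippet above; the type is invented):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type (name invented) whose equality and hash both
// cover a scalar field and every element of a collection.
public class Order
{
    private readonly string object1;
    private readonly List<int> collection;

    public Order(string object1, IEnumerable<int> items)
    {
        this.object1 = object1;
        this.collection = items.ToList();
    }

    public override int GetHashCode()
    {
        var hashCode = new HashCode();
        hashCode.Add(this.object1);
        foreach (var item in this.collection)
        {
            hashCode.Add(item);
        }
        return hashCode.ToHashCode();
    }

    public override bool Equals(object obj) =>
        obj is Order other &&
        other.object1 == this.object1 &&
        other.collection.SequenceEqual(this.collection);
}
```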

GetHashCode Made Easy

An alternative to System.HashCode that is super easy to use while still being fast. You can read the full blog post 'GetHashCode Made Easy' for more details and comments.

Usage Example

public class SuperHero
{
    public int Age { get; set; }
    public string Name { get; set; }
    public List<string> Powers { get; set; }

    public override int GetHashCode() =>
        HashCode.Of(this.Name).And(this.Age).AndEach(this.Powers);
}

Implementation

public struct HashCode : IEquatable<HashCode>
{
    private const int EmptyCollectionPrimeNumber = 19;
    private readonly int value;

    private HashCode(int value) => this.value = value;

    public static implicit operator int(HashCode hashCode) => hashCode.value;

    public static bool operator ==(HashCode left, HashCode right) => left.Equals(right);

    public static bool operator !=(HashCode left, HashCode right) => !(left == right);

    public static HashCode Of<T>(T item) => new HashCode(GetHashCode(item));

    public static HashCode OfEach<T>(IEnumerable<T> items) =>
        items == null ? new HashCode(0) : new HashCode(GetHashCode(items, 0));

    public HashCode And<T>(T item) => 
        new HashCode(CombineHashCodes(this.value, GetHashCode(item)));

    public HashCode AndEach<T>(IEnumerable<T> items)
    {
        if (items == null)
        {
            return new HashCode(this.value);
        }

        return new HashCode(GetHashCode(items, this.value));
    }

    public bool Equals(HashCode other) => this.value.Equals(other.value);

    public override bool Equals(object obj)
    {
        if (obj is HashCode)
        {
            return this.Equals((HashCode)obj);
        }

        return false;
    }

    public override int GetHashCode() => this.value.GetHashCode();

    private static int CombineHashCodes(int h1, int h2)
    {
        unchecked
        {
            // Code copied from System.Tuple, which is a good way to combine hashes.
            return ((h1 << 5) + h1) ^ h2;
        }
    }

    private static int GetHashCode<T>(T item) => item?.GetHashCode() ?? 0;

    private static int GetHashCode<T>(IEnumerable<T> items, int startHashCode)
    {
        var temp = startHashCode;
        var any = false;

        foreach (var item in items)
        {
            temp = CombineHashCodes(temp, GetHashCode(item));
            any = true;
        }

        if (!any)
        {
            temp = CombineHashCodes(temp, EmptyCollectionPrimeNumber);
        }

        return temp;
    }
}

What Makes a Good Algorithm?

Performance

The algorithm that calculates a hash code needs to be fast. A simple algorithm is usually a faster one. One that does not allocate extra memory will also reduce the need for garbage collection, which in turn improves performance.

In C# hash functions specifically, you often use the unchecked keyword, which disables overflow checking to improve performance.
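
A minimal sketch (not taken from any answer above) of why hash code arithmetic is wrapped in unchecked: combining hashes relies on integer wrap-around rather than overflow exceptions. The multiplier is an arbitrary large prime chosen purely for illustration.

```csharp
using System;

// Wrap-around multiply-add: exactly what a hash combiner wants.
static int Combine(int h1, int h2)
{
    unchecked
    {
        // 486187739 is an arbitrary large prime, chosen for illustration.
        return (h1 * 486187739) + h2;
    }
}

// Wraps around instead of throwing OverflowException,
// even when the project is compiled with /checked:
Console.WriteLine(Combine(int.MaxValue, 42));
```

Without the unchecked block the same expression would throw OverflowException in a /checked build, which is never what you want inside GetHashCode.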

Deterministic

The hashing algorithm needs to be deterministic i.e. given the same input it must always produce the same output.

Reduce Collisions

The algorithm that calculates a hash code needs to keep hash collisions to a minimum. A hash collision occurs when two calls to GetHashCode on two different objects produce identical hash codes. Note that collisions are allowed (a common misconception is that they are not), but they should be kept to a minimum.

A lot of hash functions contain magic numbers like 17 or 23. These are prime numbers which, due to their mathematical properties, help to reduce hash collisions compared to using non-prime numbers.
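
As an illustrative sketch (not from any particular answer above), the classic seed-and-multiplier pattern uses one prime as the seed and another as the multiplier, folding each field into the running hash. Here for a hypothetical pair of int fields x and y:

```csharp
using System;

// Classic 17/23 pattern for a hypothetical two-field value (x, y).
static int PointHash(int x, int y)
{
    unchecked
    {
        int hash = 17;          // prime seed
        hash = hash * 23 + x;   // fold in each field with a prime multiplier
        hash = hash * 23 + y;
        return hash;
    }
}

Console.WriteLine(PointHash(3, 4)); // → 9066
Console.WriteLine(PointHash(4, 3)); // → 9088 (field order matters)
```

Note that swapping the fields changes the result, which is desirable: (3, 4) and (4, 3) should not systematically collide.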

Hash Uniformity

A good hash function should map the expected inputs as evenly as possible over its output range i.e. it should output a wide range of hashes based on its inputs that are evenly spread. It should have hash uniformity.
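
A trivial sketch of the two extremes, relying only on the documented fact that Int32.GetHashCode() returns the value itself (everything else here is illustrative):

```csharp
using System;
using System.Linq;

// Int32.GetHashCode() returns the value itself, so sequential keys
// spread perfectly: masking down to 16 buckets hits every bucket.
var bucketsUsed = Enumerable.Range(0, 256)
    .Select(i => i.GetHashCode() & 15) // reduce to 16 buckets
    .Distinct()
    .Count();
Console.WriteLine(bucketsUsed); // → 16

// A constant hash is the opposite extreme: every key collides.
var constantBuckets = Enumerable.Range(0, 256)
    .Select(_ => 42 & 15)
    .Distinct()
    .Count();
Console.WriteLine(constantBuckets); // → 1
```

Real-world inputs are rarely this friendly, which is why combiners mix the bits of every field rather than relying on one well-behaved key.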

Prevents DoS

In .NET Core each time you restart an application you will get different hash codes. This is a security feature to prevent Denial of Service attacks (DoS). For .NET Framework you should enable this feature by adding the following App.config file:

<?xml version="1.0"?>
<configuration>  
   <runtime>  
      <UseRandomizedStringHashAlgorithm enabled="1" />  
   </runtime>  
</configuration>

Because of this feature, hash codes should never be used outside of the application domain in which they were created, should never be used as key fields in a collection, and should never be persisted.

Read more about this here.
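
To illustrate the consequence (a sketch, not from the answer): within a single process a string's hash is stable and value-based, but under randomized hashing the number itself can differ between runs, which is exactly why it must never be persisted.

```csharp
using System;

string a = "hello";
string b = new string("hello".ToCharArray()); // distinct instance, same value

// Value-based and stable within this process:
Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True

// But the actual number may differ on the next run (on .NET Core),
// so never store it in a file or database:
Console.WriteLine(a.GetHashCode());
```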

Cryptographically Secure?

The algorithm does not have to be a cryptographic hash function, meaning it does not have to satisfy the following conditions:

  • It is infeasible to generate a message that yields a given hash value.
  • It is infeasible to find two different messages with the same hash value.
  • A small change to a message should change the hash value so extensively that the new hash value appears uncorrelated with the old hash value (avalanche effect).
Priesthood answered 11/6, 2019 at 8:34 Comment(6)
This is very good answer. As an addition, you could consider changing "speed" to "performance" and adding the property of being allocation-free. The built-in HashCode type satisfies that too.Teetotaler
How does this compare to the ValueTuple.GetHashCode() answer recently updated by @ricklove above?Protecting
The HashCode.Combine is a static method which will not allocate anything, while ValueTuple will start with allocating on the stack.Priesthood
HashCode.Of(this.Name).And(this.Age).AndEach(this.Powers) - that is nice syntax :)Doenitz
"they should never be used as key fields in a collection": isn't that the whole point of hash codes though? And the existence of hash tables, hash sets, dictionaries?Cockleshell
Very well done, but two points currently not mentioned: First, the collection comparer respects order (so [1,2,3] has a different hash code than [3,2,1]), this can be desired, but not always, so it should be made clear at least in the documentation. Second, the enumerable will be completely iterated and the elements are materialized. Can lead to problems if the source is an endless enumeration or elements are created on the fly. Same here, should be mentioned in the docs.Moderation
E
111

Here is my hashcode helper.
Its advantage is that it uses generic type arguments and therefore will not cause boxing:

public static class HashHelper
{
    public static int GetHashCode<T1, T2>(T1 arg1, T2 arg2)
    {
         unchecked
         {
             return 31 * arg1.GetHashCode() + arg2.GetHashCode();
         }
    }

    public static int GetHashCode<T1, T2, T3>(T1 arg1, T2 arg2, T3 arg3)
    {
        unchecked
        {
            int hash = arg1.GetHashCode();
            hash = 31 * hash + arg2.GetHashCode();
            return 31 * hash + arg3.GetHashCode();
        }
    }

    public static int GetHashCode<T1, T2, T3, T4>(T1 arg1, T2 arg2, T3 arg3, 
        T4 arg4)
    {
        unchecked
        {
            int hash = arg1.GetHashCode();
            hash = 31 * hash + arg2.GetHashCode();
            hash = 31 * hash + arg3.GetHashCode();
            return 31 * hash + arg4.GetHashCode();
        }
    }

    public static int GetHashCode<T>(T[] list)
    {
        unchecked
        {
            int hash = 0;
            foreach (var item in list)
            {
                hash = 31 * hash + item.GetHashCode();
            }
            return hash;
        }
    }

    public static int GetHashCode<T>(IEnumerable<T> list)
    {
        unchecked
        {
            int hash = 0;
            foreach (var item in list)
            {
                hash = 31 * hash + item.GetHashCode();
            }
            return hash;
        }
    }

    /// <summary>
    /// Gets a hash code for a collection for which the order of items
    /// does not matter.
    /// So {1, 2, 3} and {3, 2, 1} will get the same hash code.
    /// </summary>
    public static int GetHashCodeForOrderNoMatterCollection<T>(
        IEnumerable<T> list)
    {
        unchecked
        {
            int hash = 0;
            int count = 0;
            foreach (var item in list)
            {
                hash += item.GetHashCode();
                count++;
            }
            return 31 * hash + count.GetHashCode();
        }
    }

    /// <summary>
    /// Alternative way to get a hashcode is to use a fluent 
    /// interface like this:<br />
    /// return 0.CombineHashCode(field1).CombineHashCode(field2).
    ///     CombineHashCode(field3);
    /// </summary>
    public static int CombineHashCode<T>(this int hashCode, T arg)
    {
        unchecked
        {
            return 31 * hashCode + arg.GetHashCode();   
        }
    }
}

Also it has extension method to provide a fluent interface, so you can use it like this:

public override int GetHashCode()
{
    return HashHelper.GetHashCode(Manufacturer, PartN, Quantity);
}

or like this:

public override int GetHashCode()
{
    return 0.CombineHashCode(Manufacturer)
        .CombineHashCode(PartN)
        .CombineHashCode(Quantity);
}
Edlun answered 4/4, 2010 at 18:26 Comment(9)
No need for T[] separately as it is already IEnumerable<T>Confectionary
You could refactor those methods and restrict the core logic to one functionConfectionary
Incidentally, 31 is a shift and subtract on the CPU, which is exceedingly fast.Falcongentle
An extension method in int is nasty namespace pollution - @safak-gur 's answer below nicely side-steps that problem.Chair
@Edlun you could use params.Buzzer
@ChuiTey This is something all Mersenne Primes have in common.Creative
shouldn't the hash variable start with a non-zero? https://mcmap.net/q/30345/-best-implementation-for-hashcode-method-for-a-collectionFingerling
Just because it's cool, you can also do this with an one-liner: source?.Aggregate(0, (current, item) => unchecked(current * 31 + (item?.GetHashCode() ?? 0))) ?? 0;Brisco
@Buzzer I suggest you dont use params if it is meant for wider use (like a public library). params involve an array allocation (plus the O(n) cost of populating the array) which is bad for perf sensitive situations. params object[] is doubly bad now that you introduce boxing cost as well for value types.Confectionary
P
67

I have a Hashing class in my helper library that I use for this purpose.

/// <summary> 
/// This is a simple hashing function from Robert Sedgwick's Hashing in C book.
/// Also, some simple optimizations to the algorithm have been added
/// to speed up its hashing process. From: www.partow.net
/// </summary>
/// <param name="input">array of objects, parameters combination that you need
/// to get a unique hash code for them</param>
/// <returns>Hash code</returns>
public static int RSHash(params object[] input)
{
    const int b = 378551;
    int a = 63689;
    int hash = 0;

    // If it overflows then just wrap around
    unchecked
    {
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] != null)
            {
                hash = hash * a + input[i].GetHashCode();
                a = a * b;
            }
        }
    }

    return hash;
}

Then, simply you can use it as:

public override int GetHashCode()
{
    return Hashing.RSHash(_field1, _field2, _field3);
}

I didn't assess its performance, so any feedback is welcome.

Pernas answered 23/2, 2009 at 11:46 Comment(5)
Well, it will cause boxing, if fields are value types.Edlun
"can be enhanced later by catching the OverflowException" The whole point of the unchecked is to avoid exceptions on overflow which is desired on GetHashCode. So it's not incorrect if the value overflows int and it does not hurt at all.Zorazorah
One issue with this algorithm is that any array full of nulls will always return 0, regardless of its lengthResection
This helper method also allocates a new object[]Phaedra
As @NathanAdams mentions, the fact that null is skipped entirely could give you unexpected results. Instead of skipping them, you should just use some constant value instead of input[i].GetHashCode() when input[i] is null.Surcease
R
59

Here's my helper class using Jon Skeet's implementation.

public static class HashCode
{
    public const int Start = 17;

    public static int Hash<T>(this int hash, T obj)
    {
        var h = EqualityComparer<T>.Default.GetHashCode(obj);
        return unchecked((hash * 31) + h);
    }
}

Usage:

public override int GetHashCode()
{
    return HashCode.Start
        .Hash(_field1)
        .Hash(_field2)
        .Hash(_field3);
}

If you want to avoid writing an extension method for System.Int32:

public readonly struct HashCode
{
    private readonly int _value;

    public HashCode(int value) => _value = value;

    public static HashCode Start { get; } = new HashCode(17);

    public static implicit operator int(HashCode hash) => hash._value;

    public HashCode Hash<T>(T obj)
    {
        var h = EqualityComparer<T>.Default.GetHashCode(obj);
        return unchecked(new HashCode((_value * 31) + h));
    }

    public override int GetHashCode() => _value;
}

It still avoids any heap allocation and is used exactly the same way:

public override int GetHashCode()
{
    // This time `HashCode.Start` is not an `Int32`, it's a `HashCode` instance.
    // And the result is implicitly converted to `Int32`.
    return HashCode.Start
        .Hash(_field1)
        .Hash(_field2)     
        .Hash(_field3);
}

Edit (May 2018): EqualityComparer<T>.Default getter is now a JIT intrinsic - the pull request is mentioned by Stephen Toub in this blog post.

Relativistic answered 4/9, 2013 at 12:32 Comment(6)
I would change the line with the ternary operator to be: var h = Equals(obj, default(T)) ? 0 : obj.GetHashCode();Maliamalice
I believe that the ternary operator with obj != null will compile to a box instruction which will allocate memory if T is a value type. Instead you can use obj.Equals(null) which will compile to a virtual call of the Equals method.Macruran
Because this.hashCode != h. It wouldn't return the same value.Remediosremedy
Sorry, manage to remove my comment instead of editing it. Is it more beneficial to create a new struct then change the hashCode to non-readonly and do: "unchecked { this.hashCode ^= h * 397; } return this;" for example?Mixtec
Immutability has its benefits (Why are mutable structs evil?). About performance, what I do is pretty cheap since it does not allocate any space in the heap.Remediosremedy
There is no boxing if you call it like Hash(1) and not like Hash<MyInterface>(myStruct). stackoverflow.com/questions/8823239Breban
H
31

In most cases where Equals() compares multiple fields it doesn't really matter if your GetHash() hashes on one field or on many. You just have to make sure that calculating the hash is really cheap (No allocations, please) and fast (No heavy computations and certainly no database connections) and provides a good distribution.

The heavy lifting should be part of the Equals() method; the hash should be a very cheap operation to enable calling Equals() on as few items as possible.
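
A sketch of that division of labour, with hypothetical fields (a cheap, well-distributed id and an expensive body): the hash looks only at the id, while the full comparison also checks the body.

```csharp
using System;

// Hash only the cheap, well-distributed field:
static int CheapHash(int id, string body) => id;

// Equals does the heavy lifting over all fields:
static bool FullEquals(int idA, string bodyA, int idB, string bodyB) =>
    idA == idB && string.Equals(bodyA, bodyB, StringComparison.Ordinal);

// Same id, different body: the hashes collide (which is allowed)...
Console.WriteLine(CheapHash(1, "draft") == CheapHash(1, "final")); // True
// ...and Equals resolves the tie:
Console.WriteLine(FullEquals(1, "draft", 1, "final")); // False
```

The contract only requires that equal objects produce equal hashes; it never requires the reverse, so a single-field hash is valid as long as that field distributes well.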

And one final tip: Don't rely on GetHashCode() being stable over multiple application runs. Many .NET types don't guarantee their hash codes to stay the same after a restart, so you should only use the value of GetHashCode() for in-memory data structures.

Houphouetboigny answered 23/2, 2009 at 11:55 Comment(5)
"In most cases where Equals() compares multiple fields it doesn't really matter if your GetHash() hashes on one field or on many." This is dangerous advice, because for objects which only differ in the un-hashed fields, you will get hash collisions. If this happens frequently, the performance of hash-based collections (HashMap, HashSet etc.) will degrade (up to O(n) in the worst case).Letrice
This actually happened in Java: In early versions of the JDK String.hashCode() only considered the beginning of the string; this lead to performance problems if you used Strings as keys in HashMaps which only differed at the end (which is common e.g. for URLs). The algorithm was therefore changed ( in JDK 1.2 or 1.3 I believe).Letrice
If that one field 'provides a good distribution' (last part of my answer), then one field is enough.. If it doesn't provide a good distribution, then (and just then) you need another calculation. (E.g. just use another field that does provide a good distribution, or use multiple fields)Houphouetboigny
I don't think there's a problem with having GetHashCode perform memory allocations, provided that it only does so the first time it's used (with subsequent invocations simply returning a cached result). The important thing is not that one should go to great lengths to avoid collisions, but rather that one should avoid "systemic" collisions. If a type has two int fields oldX and newX which frequently differ by one, a hash value of oldX^newX would assign 90% of such records hash values of 1, 2, 4, or 8. Using oldX+newX [unchecked arithmetic] might generate more collisions...Roee
...than would more sophisticated function, but a collection of 1,000,000 things that have 500,000 different hash values will very well if each hash value has two associated things, and very badly if one hash value has 500,001 things and the others have one each.Roee
N
27

Up until recently my answer would have been very close to Jon Skeet's here. However, I recently started a project which used power-of-two hash tables, that is hash tables where the size of the internal table is 8, 16, 32, etc. There's a good reason for favouring prime-number sizes, but there are some advantages to power-of-two sizes too.

And it pretty much sucked. So after a bit of experimentation and research I started re-hashing my hashes with the following:

public static int ReHash(int source)
{
  unchecked
  {
    ulong c = 0xDEADBEEFDEADBEEF + (ulong)source;
    ulong d = 0xE2ADBEEFDEADBEEF ^ c;
    ulong a = d += c = c << 15 | c >> -15;
    ulong b = a += d = d << 52 | d >> -52;
    c ^= b += a = a << 26 | a >> -26;
    d ^= c += b = b << 51 | b >> -51;
    a ^= d += c = c << 28 | c >> -28;
    b ^= a += d = d << 9 | d >> -9;
    c ^= b += a = a << 47 | a >> -47;
    d ^= c += b << 54 | b >> -54;
    a ^= d += c << 32 | c >> 32;
    a += d << 25 | d >> -25;
    return (int)(a >> 1);
  }
}

And then my power-of-two hash table didn't suck any more.

This disturbed me though, because the above shouldn't work. Or more precisely, it shouldn't work unless the original GetHashCode() was poor in a very particular way.

Re-mixing a hashcode can't improve a great hashcode, because the only possible effect is that we introduce a few more collisions.

Re-mixing a hash code can't improve a terrible hash code, because the only possible effect is that we change e.g. a large number of collisions on value 53 to a large number of collisions on value 183,487,291.

Re-mixing a hash code can only improve a hash code that did at least fairly well in avoiding absolute collisions throughout its range (2³² possible values) but badly at avoiding collisions when modulo'd down for actual use in a hash table. While the simpler modulo of a power-of-two table made this more apparent, it was also having a negative effect with the more common prime-number tables, which just wasn't as obvious (the extra work in rehashing would outweigh the benefit, but the benefit would still be there).
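
To make the power-of-two point concrete (an illustrative sketch, not this project's code): reducing a hash to a bucket index with hash & (size - 1) keeps only the low bits, so hashes that differ only in their high bits collide every time, whereas a prime-sized table's modulo lets the high bits influence the index.

```csharp
using System;

// Power-of-two table (16 buckets): the index uses only the low 4 bits.
static int Pow2Bucket(int hash) => hash & (16 - 1);

// Prime-sized table (17 buckets): modulo mixes in the high bits too.
static int PrimeBucket(int hash) => hash % 17;

// 0x10 and 0x20 differ only above the low 4 bits:
Console.WriteLine(Pow2Bucket(0x10));  // → 0
Console.WriteLine(Pow2Bucket(0x20));  // → 0  (guaranteed collision)
Console.WriteLine(PrimeBucket(0x10)); // → 16
Console.WriteLine(PrimeBucket(0x20)); // → 15
```

This is why a hash that only spreads its high bits works fine in a prime-sized table yet degrades badly in a power-of-two one, and why re-mixing (pushing entropy into the low bits) helps.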

Edit: I was also using open-addressing, which would also have increased the sensitivity to collision, perhaps more so than the fact it was power-of-two.

And well, it was disturbing how much the string.GetHashCode() implementations in .NET (or study here) could be improved this way (on the order of tests running about 20-30 times faster due to fewer collisions) and more disturbing how much my own hash codes could be improved (much more than that).

All the GetHashCode() implementations I'd coded in the past, and indeed used as the basis of answers on this site, were much worse than I'd thought. Much of the time they were "good enough" for many of the uses, but I wanted something better.

So I put that project to one side (it was a pet project anyway) and started looking at how to produce a good, well-distributed hash code in .NET quickly.

In the end I settled on porting SpookyHash to .NET. Indeed the code above is a fast-path version of using SpookyHash to produce a 32-bit output from a 32-bit input.

Now, SpookyHash is not a nice, quick-to-remember piece of code. My port of it is even less so because I hand-inlined a lot of it for better speed*. But that's what code reuse is for.

Then I put that project to one side, because just as the original project had produced the question of how to produce a better hash code, so that project produced the question of how to produce a better .NET memcpy.

Then I came back, and produced a lot of overloads to easily feed just about all of the native types (except decimal†) into a hash code.

It's fast, for which Bob Jenkins deserves most of the credit because his original code I ported from is faster still, especially on 64-bit machines which the algorithm is optimised for‡.

The full code can be seen at https://bitbucket.org/JonHanna/spookilysharp/src but consider that the code above is a simplified version of it.

However, since it's now already written, one can make use of it more easily:

public override int GetHashCode()
{
  var hash = new SpookyHash();
  hash.Update(field1);
  hash.Update(field2);
  hash.Update(field3);
  return hash.Final().GetHashCode();
}

It also takes seed values, so if you need to deal with untrusted input and want to protect against Hash DoS attacks you can set a seed based on uptime or similar, and make the results unpredictable by attackers:

private static long hashSeed0 = Environment.TickCount;
private static long hashSeed1 = DateTime.Now.Ticks;
public override int GetHashCode()
{
  //Produce different hashes every time this application is restarted,
  //but remain consistent within each run, so attackers have a harder time
  //DoSing the hash tables.
  var hash = new SpookyHash(hashSeed0, hashSeed1);
  hash.Update(field1);
  hash.Update(field2);
  hash.Update(field3);
  return hash.Final().GetHashCode();
}

*A big surprise in this is that hand-inlining a rotation method that returned (x << n) | (x >> -n) improved things. I would have been sure that the jitter would have inlined that for me, but profiling showed otherwise.

†decimal isn't native from the .NET perspective, though it is from C#'s. The problem with it is that its own GetHashCode() treats precision as significant, while its own Equals() does not. Both are valid choices, but not mixed like that. In implementing your own version, you need to choose one or the other, but I can't know which you'd want.

‡By way of comparison. If used on a string, the SpookyHash on 64 bits is considerably faster than string.GetHashCode() on 32 bits which is slightly faster than string.GetHashCode() on 64 bits, which is considerably faster than SpookyHash on 32 bits, though still fast enough to be a reasonable choice.

Nourish answered 14/1, 2014 at 14:15 Comment(7)
When combining multiple hash values into one, I tend to use long values for the intermediate results, and then munge the final result down to an int. Does that seem like a good idea? My concern is that one uses e.g. hash=(hash*31)+nextField, then pairs of matching values will only affect the upper 27 bits of the hash. Letting the calculation extend to a long and wrapping stuff in would minimize that danger.Roee
@Roee it depends on the distribution of your final munging. The SpookilySharp library would ensure that the distribution was good, ideally (because it won't need object creation) by passing a pointer to a blittable type, or passing one of the enumerables it handles directly, but if you don't already have blittable data or a suitable enumeration, then calling .Update() with the multiple values as per the answer above will do the trick.Nourish
@JonHanna would you be willing to be more precise with the problematic behavior you encountered? I'm trying to implement a library that makes implementing value objects trivial (ValueUtils) and I'd love a testset demonstrating poor hash miscibility in power-of-two hashtables.Chair
@EamonNerbonne I don't really have anything more precise than "overall time was slower that way". As I added in an edit, the fact that I was using open-addressing may have been more important than the power-of-two factor. I do plan to do some test cases on a particular project where I'll be comparing a few different approaches, so I may have a better answer for you after that, though that's not a high-priority (a personal project with no pressing need, so I'll get to it when I get to it...)Nourish
@JonHanna: yeah I know how the personal project schedule goes - good luck! In any case, I see I didn't phrase that last comment well: I meant to ask for the problematic input, and not necessarily the details of the problems that resulted. I'd love to use that as a test set (or inspiration for a test set). In any case - good luck with your pet project :-).Chair
I'd bet that your ReHash is a big overkill. I guess, it works well, but it may be even slower than cryptographic hashed which are (sort of) proven to work perfectly. Java also uses power-of-two sized tables and there used to be a rather complicated re-hashing. It's got simplified since tree nodes for collisions were introduced.Decide
@Decide in terms of speed and distribution, it's well demonstrated. I'm now of the opinion that it's more trouble than it's worth for small values. I'd still use the fuller implementation of SpookyHash when it comes to hashing very large values like file contents.Nourish
P
26

As of https://github.com/dotnet/coreclr/pull/14863, there is a new way to generate hash codes that is super simple! Just write

public override int GetHashCode()
    => HashCode.Combine(field1, field2, field3);

This will generate a quality hash code without you having to worry about the implementation details.

Pathe answered 23/11, 2017 at 15:6 Comment(6)
That looks like a sweet addition... any way to know what version of .NET Core that'll ship in?Oakleil
@DanJ What a happy coincidence, the HashCode changes for corefx were merged just a couple of hours before your comment :) The type is slated to ship in .NET Core 2.1.Pathe
That is awesome - and quite the turnaround time. Upvoted. :)Oakleil
@DanJ Even better news-- it should be available right now on the nightly builds of CoreFX hosted on the dotnet-core MyGet feed.Pathe
Sweet - that doesn't help me at work, since we're not quite that bleeding-edge, but good to know. Cheers!Oakleil
Here is a drop-in polyfill package that you can use for everything .NET 4.0+ (System.HashCode included): nuget.org/packages/Gapotchenko.FXNeogothic
B
13

This is a good one:

/// <summary>
/// Helper class for generating hash codes suitable 
/// for use in hashing algorithms and data structures like a hash table. 
/// </summary>
public static class HashCodeHelper
{
    private static int GetHashCodeInternal(int key1, int key2)
    {
        unchecked
        {
            var num = 0x7e53a269;
            num = (-1521134295 * num) + key1;
            num += (num << 10);
            num ^= (num >> 6);

            num = (-1521134295 * num) + key2;
            num += (num << 10);
            num ^= (num >> 6);

            return num;
        }
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="arr">An array of objects used for generating the 
    /// hash code.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data 
    /// structures like a hash table. 
    /// </returns>
    public static int GetHashCode(params object[] arr)
    {
        int hash = 0;
        foreach (var item in arr)
            hash = GetHashCodeInternal(hash, item.GetHashCode());
        return hash;
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <param name="obj3">The third object.</param>
    /// <param name="obj4">The fourth object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and
    /// data structures like a hash table.
    /// </returns>
    public static int GetHashCode<T1, T2, T3, T4>(T1 obj1, T2 obj2, T3 obj3,
        T4 obj4)
    {
        return GetHashCode(obj1, GetHashCode(obj2, obj3, obj4));
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <param name="obj3">The third object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data 
    /// structures like a hash table. 
    /// </returns>
    public static int GetHashCode<T1, T2, T3>(T1 obj1, T2 obj2, T3 obj3)
    {
        return GetHashCode(obj1, GetHashCode(obj2, obj3));
    }

    /// <summary>
    /// Returns a hash code for the specified objects
    /// </summary>
    /// <param name="obj1">The first object.</param>
    /// <param name="obj2">The second object.</param>
    /// <returns>
    /// A hash code, suitable for use in hashing algorithms and data 
    /// structures like a hash table. 
    /// </returns>
    public static int GetHashCode<T1, T2>(T1 obj1, T2 obj2)
    {
        return GetHashCodeInternal(obj1.GetHashCode(), obj2.GetHashCode());
    }
}

And here is how to use it:

private struct Key
{
    private Type _type;
    private string _field;

    public Type Type { get { return _type; } }
    public string Field { get { return _field; } }

    public Key(Type type, string field)
    {
        _type = type;
        _field = field;
    }

    public override int GetHashCode()
    {
        return HashCodeHelper.GetHashCode(_field, _type);
    }

    public override bool Equals(object obj)
    {
        if (!(obj is Key))
            return false;
        var tf = (Key)obj;
        return tf._field.Equals(_field) && tf._type.Equals(_type);
    }
}
Bearded answered 7/10, 2010 at 10:51 Comment(10)
How are the Keys determined? GetHashCode() doesn't take any parameters, so it needs to call this one with two Keys that need to be determined somehow. Sorry, without further explanation this only looks clever, but not that good.Wershba
And why do you need the generic overloads? The type is not important (and not used in your code) since all objects have a GetHashCode() method, so you can always use the method with the params array parameter. Or am I missing something here?Hijoung
It's about performance, avoid the loop for smaller <= 4 fields. But I guess the generics could be skipped and just use object instead.Bearded
When you'd use object instead of generics you'd get boxing and memory allocations, which you don't want in GetHashCode. So generics are the way to go.Desiderata
The trailing shift/xor steps (h += (h << 10); h ^= (h >> 6); h += (h << 3); h ^= (h >> 11); h += (h << 15); have a codesmell: they do not depend on any of the input and look awfully redundant to me.Cornstarch
@Confectionary what speed considerations do you have?Bearded
@Bearded nothing in particular other than the general rule that hashing must be fast. This can't be as fast as I would love it to be. But as I said this gives better distribution of values which may be suitable for some cases.Confectionary
@Confectionary Running this 100 million times takes about 390 ms. Running the solution suggested by Jon Skeet 100 million times takes about 320 ms so it is not a huge difference.Bearded
@Bearded yes right, I'll delete my original comment. Just a little note that this may not be as fast as some other solutions here, but as you say shouldn't matter. The distribution is great, better than most solutions here, so +1 from me! :)Confectionary
How does this compare in quality (distribution) and performance to simply using a long intermediate, with multiplication of each input by a large prime? E.g. for two values, something like this one liner: return ((long)v1 * 805306457 + (long)v2 * 189783887).GetHashCode(); [The primes are chosen to avoid numeric overflow of long, in a checked environment, and to tend to set different bits.]Tarra
M
11

Here is another fluent implementation of the algorithm posted above by Jon Skeet, but which includes no allocations or boxing operations:

public static class Hash
{
    public const int Base = 17;

    public static int HashObject(this int hash, object obj)
    {
        unchecked { return hash * 23 + (obj == null ? 0 : obj.GetHashCode()); }
    }

    public static int HashValue<T>(this int hash, T value)
        where T : struct
    {
        unchecked { return hash * 23 + value.GetHashCode(); }
    }
}

Usage:

public class MyType<T>
{
    public string Name { get; set; }

    public string Description { get; set; }

    public int Value { get; set; }

    public IEnumerable<T> Children { get; set; }

    public override int GetHashCode()
    {
        return Hash.Base
            .HashObject(this.Name)
            .HashObject(this.Description)
            .HashValue(this.Value)
            .HashObject(this.Children);
    }
}

The compiler will ensure HashValue is not called with a class due to the generic type constraint. But there is no compiler support for HashObject since adding a generic argument also adds a boxing operation.

Masquer answered 20/1, 2014 at 23:41 Comment(0)
A
8

Here is my simplistic approach. I am using the classic builder pattern for this. It is type-safe (no boxing/unboxing) and also compatible with .NET 2.0 (no extension methods, etc.).

It is used like this:

public override int GetHashCode()
{
    HashBuilder b = new HashBuilder();
    b.AddItems(this.member1, this.member2, this.member3);
    return b.Result;
} 

And here is the actual builder class:

internal class HashBuilder
{
    private const int Prime1 = 17;
    private const int Prime2 = 23;
    private int result = Prime1;

    public HashBuilder()
    {
    }

    public HashBuilder(int startHash)
    {
        this.result = startHash;
    }

    public int Result
    {
        get
        {
            return this.result;
        }
    }

    public void AddItem<T>(T item)
    {
        unchecked
        {
            // Null-safe: reference-type items may be null (value types never are).
            this.result = this.result * Prime2 + (item == null ? 0 : item.GetHashCode());
        }
    }

    public void AddItems<T1, T2>(T1 item1, T2 item2)
    {
        this.AddItem(item1);
        this.AddItem(item2);
    }

    public void AddItems<T1, T2, T3>(T1 item1, T2 item2, T3 item3)
    {
        this.AddItem(item1);
        this.AddItem(item2);
        this.AddItem(item3);
    }

    public void AddItems<T1, T2, T3, T4>(T1 item1, T2 item2, T3 item3, 
        T4 item4)
    {
        this.AddItem(item1);
        this.AddItem(item2);
        this.AddItem(item3);
        this.AddItem(item4);
    }

    public void AddItems<T1, T2, T3, T4, T5>(T1 item1, T2 item2, T3 item3, 
        T4 item4, T5 item5)
    {
        this.AddItem(item1);
        this.AddItem(item2);
        this.AddItem(item3);
        this.AddItem(item4);
        this.AddItem(item5);
    }        

    public void AddItems<T>(params T[] items)
    {
        foreach (T item in items)
        {
            this.AddItem(item);
        }
    }
}
Anthropomorphosis answered 22/3, 2011 at 12:15 Comment(3)
you can avoid the object creation inside gethashcode function as in Mangus's answer. Just call the damn static hash functions (who cares about starter hash). Also, you could use AddItems<T>(params T[] items) method more often in the helper class (than calling AddItem(T) each time).Confectionary
And what benefit do you find doing this.result * Prime2 * item.GetHashCode() when often used is this.result * Prime2 + item.GetHashCode()?Confectionary
I can't use AddItems<T>(params T[] items) more often because typeof(T1) != typeof(T2) etc.Anthropomorphosis
T
7

If we have no more than 8 properties (hopefully), here is another alternative.

ValueTuple is a struct and appears to have a solid GetHashCode implementation.

That means we could simply do this:

// Yay, no allocations and no custom implementations!
public override int GetHashCode() => (this.PropA, this.PropB).GetHashCode();
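For example, pairing this with an Equals override in a small class (the class and property names here are mine, for illustration only):

```csharp
// Hypothetical type whose equality is defined by two int properties;
// ValueTuple combines the members' hash codes for us.
public sealed class Size
{
    public int Width { get; set; }
    public int Height { get; set; }

    public override bool Equals(object obj)
        => obj is Size s && (s.Width, s.Height) == (Width, Height);

    public override int GetHashCode() => (Width, Height).GetHashCode();
}
```

Tuple equality and tuple GetHashCode look at the same members, so the Equals/GetHashCode contract holds by construction.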

Let's take a look at .NET Core's current implementation for ValueTuple's GetHashCode.

This is from ValueTuple:

    internal static int CombineHashCodes(int h1, int h2)
    {
        return HashHelpers.Combine(HashHelpers.Combine(HashHelpers.RandomSeed, h1), h2);
    }

    internal static int CombineHashCodes(int h1, int h2, int h3)
    {
        return HashHelpers.Combine(CombineHashCodes(h1, h2), h3);
    }

And this is from HashHelper:

    public static readonly int RandomSeed = Guid.NewGuid().GetHashCode();

    public static int Combine(int h1, int h2)
    {
        unchecked
        {
            // RyuJIT optimizes this to use the ROL instruction
            // Related GitHub pull request: dotnet/coreclr#1830
            uint rol5 = ((uint)h1 << 5) | ((uint)h1 >> 27);
            return ((int)rol5 + h1) ^ h2;
        }
    }

In English:

  • Left rotate (circular shift) h1 by 5 positions.
  • Add the result and h1 together.
  • XOR the result with h2.
  • Start by performing the above operation on { static random seed, h1 }.
  • For each further item, perform the operation on the previous result and the next item (e.g. h2).

It would be nice to know more about the properties of this ROL-5 hash code algorithm.

Regrettably, deferring to ValueTuple for our own GetHashCode may not be as fast as we would like and expect. This comment in a related discussion illustrates that directly calling HashHelpers.Combine is more performant. On the flip side, that one is internal, so we'd have to copy the code, sacrificing much of what we had gained here. Also, we'd be responsible for remembering to first Combine with the random seed. I don't know what the consequences are if we skip that step.
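If you do copy the combiner out, a minimal standalone sketch could look like the following. The type and member names, and the fixed seed of 17, are my assumptions for illustration; the real HashHelpers uses a random per-process seed instead.

```csharp
// A self-contained copy of the ROL-5 combiner described above.
static class MyHashHelpers
{
    public static int Combine(int h1, int h2)
    {
        unchecked
        {
            // Rotate h1 left by 5 bits, add h1, then XOR with h2.
            uint rol5 = ((uint)h1 << 5) | ((uint)h1 >> 27);
            return ((int)rol5 + h1) ^ h2;
        }
    }
}

// Hypothetical type showing the chained usage.
class Point
{
    public int X, Y;

    public override bool Equals(object obj)
        => obj is Point p && p.X == X && p.Y == Y;

    public override int GetHashCode()
        => MyHashHelpers.Combine(MyHashHelpers.Combine(17, X), Y);
}
```

Note that dropping the random seed makes hash codes predictable and stable across processes, which the BCL deliberately avoids; whether that matters depends on your use case.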

Teetotaler answered 15/5, 2018 at 12:0 Comment(1)
Assuming h1 >> 27 is 0 so it can be ignored, h1 << 5 equals h1 * 32, therefore it is the same as h1 * 33 ^ h2. According to this page, it is called "Modified Bernstein".Johanajohanan
S
6

ReSharper users can generate GetHashCode, Equals, and others with ReSharper -> Edit -> Generate Code -> Equality Members.

// ReSharper's GetHashCode looks like this
public override int GetHashCode() {
    unchecked {
        int hashCode = Id;
        hashCode = (hashCode * 397) ^ IntMember;
        hashCode = (hashCode * 397) ^ OtherIntMember;
        hashCode = (hashCode * 397) ^ (RefMember != null ? RefMember.GetHashCode() : 0);
        // ...
        return hashCode;
    }
}
Stadiometer answered 1/9, 2016 at 19:19 Comment(0)
A
4

Most of my work is done with database connectivity which means that my classes all have a unique identifier from the database. I always use the ID from the database to generate the hashcode.

// Unique ID from database
private int _id;

// ... other members ...

public override int GetHashCode()
{
  return _id.GetHashCode();
}
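A fuller sketch of this ID-based pattern (the class name is mine, for illustration) keeps Equals and GetHashCode consistent:

```csharp
// Hypothetical entity whose identity is its database ID.
public sealed class Customer
{
    private readonly int _id; // unique ID from the database

    public Customer(int id) { _id = id; }

    public override bool Equals(object obj)
        => obj is Customer other && other._id == _id;

    // Same ID => same hash, matching Equals above.
    // For an int, GetHashCode() is simply the value itself.
    public override int GetHashCode() => _id.GetHashCode();
}
```

As the comments note, two entities of different types that share an ID will share a hash code; that is harmless in a dictionary keyed by a single type, but costs extra collisions when types are mixed.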
Addition answered 5/11, 2008 at 5:3 Comment(6)
That means that if you have objects Person and Account and they both have an ID = 1, they will have the same hash code. And that is not ok.Bookrack
Actually the comment above is incorrect. There will always be the possibility of hash-code collisions (a hash code only locates the bucket, not the individual object). So such an implementation - for a hashcode containing mixed objects - would lead to a lot of collisions, which is undesirable, but it would be absolutely fine if you only ever had objects of a single type in your hashtables. Also it doesn't distribute evenly, however neither does the base implementation on system.object, so I wouldn't worry about it too much...Garrik
The hash code can just be the id, since the id is an integer. There's no need to call GetHashCode on an integer (it's an identity function)Norm
@DarrelLee but tomo his _id could be a Guid. It's a good coding practice to do _id.GetHashCode as the intent is clear.Confectionary
@DarrelLee, it is a not a good option because sequential IDs from the database don't provide a good distributionHighlight
@1224 depending on use patterns it can be horrible for the reason you give, but it can also be great; if you've a sequence of such numbers with no holes, then you've a perfect hash, better than any algorithm can produce. If you know that's the case you can even count on it and skip the equality check.Nourish
F
3

Pretty much similar to nightcoder's solution, except it's easier to change the primes if you want to.

PS: This is one of those times where you puke a little in your mouth, knowing that this could be refactored into one method with nine defaults, but it would be slower, so you just close your eyes and try to forget about it.

/// <summary>
/// Try not to look at the source code. It works. Just rely on it.
/// </summary>
public static class HashHelper
{
    private const int PrimeOne = 17;
    private const int PrimeTwo = 23;

    public static int GetHashCode<T1, T2, T3, T4, T5, T6, T7, T8, T9, T10>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9, T10 arg10)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();
            hash = hash * PrimeTwo + arg7.GetHashCode();
            hash = hash * PrimeTwo + arg8.GetHashCode();
            hash = hash * PrimeTwo + arg9.GetHashCode();
            hash = hash * PrimeTwo + arg10.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5, T6, T7, T8, T9>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8, T9 arg9)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();
            hash = hash * PrimeTwo + arg7.GetHashCode();
            hash = hash * PrimeTwo + arg8.GetHashCode();
            hash = hash * PrimeTwo + arg9.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5, T6, T7, T8>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7, T8 arg8)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();
            hash = hash * PrimeTwo + arg7.GetHashCode();
            hash = hash * PrimeTwo + arg8.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5, T6, T7>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6, T7 arg7)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();
            hash = hash * PrimeTwo + arg7.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5, T6>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();
            hash = hash * PrimeTwo + arg6.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4, T5>(T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();
            hash = hash * PrimeTwo + arg5.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3, T4>(T1 arg1, T2 arg2, T3 arg3, T4 arg4)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();
            hash = hash * PrimeTwo + arg4.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2, T3>(T1 arg1, T2 arg2, T3 arg3)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();
            hash = hash * PrimeTwo + arg3.GetHashCode();

            return hash;
        }
    }

    public static int GetHashCode<T1, T2>(T1 arg1, T2 arg2)
    {
        unchecked
        {
            int hash = PrimeOne;
            hash = hash * PrimeTwo + arg1.GetHashCode();
            hash = hash * PrimeTwo + arg2.GetHashCode();

            return hash;
        }
    }
}
Formant answered 21/10, 2014 at 17:49 Comment(1)
Doesn't handle nulls.Precast
A
2

Microsoft suggests several ways of hashing:

// For classes that contain a single int value
return this.value;

// For classes that contain multiple int values
return x ^ y;

// For classes that contain a single number bigger than int
return ((int)value ^ (int)(value >> 32)); 

// For classes that contain a class instance field which inherits from object
return obj1.GetHashCode();

// For classes that contain multiple class instance fields which inherit from object
return obj1.GetHashCode() ^ obj2.GetHashCode() ^ obj3.GetHashCode(); 

I can guess that for multiple big ints you can use this:

int a=((int)value1 ^ (int)(value1 >> 32));
int b=((int)value2 ^ (int)(value2 >> 32));
int c=((int)value3 ^ (int)(value3 >> 32));
return a ^ b ^ c;

And the same goes for mixed types: everything is first converted to int using GetHashCode(), then the int values are XORed, and the result is your hash.

For those who use a hash as an ID (I mean, as a unique value): a hash is naturally limited to a fixed number of bits (GetHashCode returns 32 bits; even MD5 is only 128 bits), so collisions are unavoidable.

You may turn multiple values into a hashed value and some of them will be the same, so don't use it as an identifier. (Maybe some day I am going to use your component.)
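A tiny demonstration of the XOR recipes' main weakness (the helper name and values are made up for illustration):

```csharp
// The "multiple int values" recipe above (return x ^ y), isolated so
// its collision behavior is visible.
static class XorHashDemo
{
    public static int XorHash(int x, int y) => x ^ y;

    // XOR is commutative, so swapped fields collide:
    //   XorHash(100, 200) == XorHash(200, 100)
    // and equal fields always collapse to zero:
    //   XorHash(7, 7) == 0
}
```

Multiplying by a prime between contributions, as other answers here do, breaks this symmetry.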

Aceto answered 30/11, 2012 at 19:35 Comment(7)
Xoring integers to make a hashcode is a well-known antipattern that tends to result in a particularly high number of collisions with real-world values.Nourish
Everyone here uses integers, and there has never been any guarantee that hashes are unique; a hash just tries to vary enough that few collisions happen.Aceto
Yes, but your second and fifth don't try to avoid collisions.Nourish
Not sure which thread it was... but it did the same, msdn.microsoft.com/en-us/library/…Aceto
Yes, that antipattern is quite common.Nourish
That's why I rely on that, but thanks for clearing things up... The other thing is: does the other pattern have less calculation time? You know, the collision vs. calculation time trade-off matters too.Aceto
There's a balance to reach. Use a really good hash code like Spookyhash and you'll get much, much better collision avoidance but it will have much more calculation time than any of these (but when it comes to hashing very large amounts of data, Spookyhash is extremely quick). A simple shift on one of the values before xoring is only marginal extra cost for a good reduction in collision. Prime-number multiplication increasing both time and quality again. Which is better between shift or mult is hence debatable. Plain xor though very often has a lot of collisions on real data and is best avoidedNourish
B
1

This is a static helper class that implements Josh Bloch's approach; it provides explicit overloads to "prevent" boxing, and also implements the hash specifically for the long primitives.

You can pass a string comparison that matches your equals implementation.

Because the Hash output is always an int, you can just chain Hash calls.

using System;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;


namespace Sc.Util.System
{
    /// <summary>
    /// Static methods that allow easy implementation of hashCode. Example usage:
    /// <code>
    /// public override int GetHashCode()
    ///     => HashCodeHelper.Seed
    ///         .Hash(primitiveField)
    ///         .Hash(objectField)
    ///         .Hash(iEnumerableField);
    /// </code>
    /// </summary>
    public static class HashCodeHelper
    {
        /// <summary>
        /// An initial value for a hashCode, to which is added contributions from fields.
        /// Using a non-zero value decreases collisions of hashCode values.
        /// </summary>
        public const int Seed = 23;

        private const int oddPrimeNumber = 37;


        /// <summary>
        /// Rotates the seed against a prime number.
        /// </summary>
        /// <param name="aSeed">The hash's first term.</param>
        /// <returns>The new hash code.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        private static int rotateFirstTerm(int aSeed)
        {
            unchecked {
                return HashCodeHelper.oddPrimeNumber * aSeed;
            }
        }


        /// <summary>
        /// Contributes a boolean to the developing HashCode seed.
        /// </summary>
        /// <param name="aSeed">The developing HashCode value or seed.</param>
        /// <param name="aBoolean">The value to contribute.</param>
        /// <returns>The new hash code.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int Hash(this int aSeed, bool aBoolean)
        {
            unchecked {
                return HashCodeHelper.rotateFirstTerm(aSeed)
                        + (aBoolean
                                ? 1
                                : 0);
            }
        }

        /// <summary>
        /// Contributes a char to the developing HashCode seed.
        /// </summary>
        /// <param name="aSeed">The developing HashCode value or seed.</param>
        /// <param name="aChar">The value to contribute.</param>
        /// <returns>The new hash code.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int Hash(this int aSeed, char aChar)
        {
            unchecked {
                return HashCodeHelper.rotateFirstTerm(aSeed)
                        + aChar;
            }
        }

        /// <summary>
        /// Contributes an int to the developing HashCode seed.
        /// Note that byte and short are handled by this method, through implicit conversion.
        /// </summary>
        /// <param name="aSeed">The developing HashCode value or seed.</param>
        /// <param name="aInt">The value to contribute.</param>
        /// <returns>The new hash code.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int Hash(this int aSeed, int aInt)
        {
            unchecked {
                return HashCodeHelper.rotateFirstTerm(aSeed)
                        + aInt;
            }
        }

        /// <summary>
        /// Contributes a long to the developing HashCode seed.
        /// </summary>
        /// <param name="aSeed">The developing HashCode value or seed.</param>
        /// <param name="aLong">The value to contribute.</param>
        /// <returns>The new hash code.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int Hash(this int aSeed, long aLong)
        {
            unchecked {
                return HashCodeHelper.rotateFirstTerm(aSeed)
                        + (int)(aLong ^ (aLong >> 32));
            }
        }

        /// <summary>
        /// Contributes a float to the developing HashCode seed.
        /// </summary>
        /// <param name="aSeed">The developing HashCode value or seed.</param>
        /// <param name="aFloat">The value to contribute.</param>
        /// <returns>The new hash code.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int Hash(this int aSeed, float aFloat)
        {
            unchecked {
                return HashCodeHelper.rotateFirstTerm(aSeed)
                        + Convert.ToInt32(aFloat);
            }
        }

        /// <summary>
        /// Contributes a double to the developing HashCode seed.
        /// </summary>
        /// <param name="aSeed">The developing HashCode value or seed.</param>
        /// <param name="aDouble">The value to contribute.</param>
        /// <returns>The new hash code.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int Hash(this int aSeed, double aDouble)
            => aSeed.Hash(Convert.ToInt64(aDouble));

        /// <summary>
        /// Contributes a string to the developing HashCode seed.
        /// </summary>
        /// <param name="aSeed">The developing HashCode value or seed.</param>
        /// <param name="aString">The value to contribute.</param>
        /// <param name="stringComparison">Optional comparison that creates the hash.</param>
        /// <returns>The new hash code.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int Hash(
                this int aSeed,
                string aString,
                StringComparison stringComparison = StringComparison.Ordinal)
        {
            if (aString == null)
                return aSeed.Hash(0);
            // Combine the comparer's hash with the seed so the string actually
            // contributes to the developing hash instead of replacing it.
            switch (stringComparison) {
                case StringComparison.CurrentCulture :
                    return aSeed.Hash(StringComparer.CurrentCulture.GetHashCode(aString));
                case StringComparison.CurrentCultureIgnoreCase :
                    return aSeed.Hash(StringComparer.CurrentCultureIgnoreCase.GetHashCode(aString));
                case StringComparison.InvariantCulture :
                    return aSeed.Hash(StringComparer.InvariantCulture.GetHashCode(aString));
                case StringComparison.InvariantCultureIgnoreCase :
                    return aSeed.Hash(StringComparer.InvariantCultureIgnoreCase.GetHashCode(aString));
                case StringComparison.OrdinalIgnoreCase :
                    return aSeed.Hash(StringComparer.OrdinalIgnoreCase.GetHashCode(aString));
                default :
                    return aSeed.Hash(StringComparer.Ordinal.GetHashCode(aString));
            }
        }

        /// <summary>
        /// Contributes a possibly-null array to the developing HashCode seed.
        /// Each element may be a primitive, a reference, or a possibly-null array.
        /// </summary>
        /// <param name="aSeed">The developing HashCode value or seed.</param>
        /// <param name="aArray">CAN be null.</param>
        /// <returns>The new hash code.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int Hash(this int aSeed, IEnumerable aArray)
        {
            if (aArray == null)
                return aSeed.Hash(0);
            int countPlusOne = 1; // So it differs from null
            foreach (object item in aArray) {
                ++countPlusOne;
                if (item is IEnumerable arrayItem) {
                    if (!object.ReferenceEquals(aArray, arrayItem))
                        aSeed = aSeed.Hash(arrayItem); // recursive call!
                } else
                    aSeed = aSeed.Hash(item);
            }
            return aSeed.Hash(countPlusOne);
        }

        /// <summary>
        /// Contributes a possibly-null array to the developing HashCode seed.
        /// You must provide the hash function for each element.
        /// </summary>
        /// <param name="aSeed">The developing HashCode value or seed.</param>
        /// <param name="aArray">CAN be null.</param>
        /// <param name="hashElement">Required: yields the hash for each element
        /// in <paramref name="aArray"/>.</param>
        /// <returns>The new hash code.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int Hash<T>(this int aSeed, IEnumerable<T> aArray, Func<T, int> hashElement)
        {
            if (aArray == null)
                return aSeed.Hash(0);
            int countPlusOne = 1; // So it differs from null
            foreach (T item in aArray) {
                ++countPlusOne;
                aSeed = aSeed.Hash(hashElement(item));
            }
            return aSeed.Hash(countPlusOne);
        }

        /// <summary>
        /// Contributes a possibly-null object to the developing HashCode seed.
        /// </summary>
        /// <param name="aSeed">The developing HashCode value or seed.</param>
        /// <param name="aObject">CAN be null.</param>
        /// <returns>The new hash code.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int Hash(this int aSeed, object aObject)
        {
            switch (aObject) {
                case null :
                    return aSeed.Hash(0);
                case bool b :
                    return aSeed.Hash(b);
                case char c :
                    return aSeed.Hash(c);
                case int i :
                    return aSeed.Hash(i);
                case long l :
                    return aSeed.Hash(l);
                case float f :
                    return aSeed.Hash(f);
                case double d :
                    return aSeed.Hash(d);
                case string s :
                    return aSeed.Hash(s);
                case IEnumerable iEnumerable :
                    return aSeed.Hash(iEnumerable);
            }
            return aSeed.Hash(aObject.GetHashCode());
        }


        /// <summary>
        /// This utility method uses reflection to iterate all specified properties that are readable
        /// on the given object, excluding any property names given in the params arguments, and
        /// generates a hashcode.
        /// </summary>
        /// <param name="aSeed">The developing hash code, or the seed: if you have no seed, use
        /// the <see cref="Seed"/>.</param>
        /// <param name="aObject">CAN be null.</param>
        /// <param name="propertySelector"><see cref="BindingFlags"/> to select the properties to hash.</param>
        /// <param name="ignorePropertyNames">Optional.</param>
        /// <returns>A hash from the properties contributed to <c>aSeed</c>.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int HashAllProperties(
                this int aSeed,
                object aObject,
                BindingFlags propertySelector
                        = BindingFlags.Instance
                        | BindingFlags.Public
                        | BindingFlags.GetProperty,
                params string[] ignorePropertyNames)
        {
            if (aObject == null)
                return aSeed.Hash(0);
            if ((ignorePropertyNames != null)
                    && (ignorePropertyNames.Length != 0)) {
                foreach (PropertyInfo propertyInfo in aObject.GetType()
                        .GetProperties(propertySelector)) {
                    if (!propertyInfo.CanRead
                            || (Array.IndexOf(ignorePropertyNames, propertyInfo.Name) >= 0))
                        continue;
                    aSeed = aSeed.Hash(propertyInfo.GetValue(aObject));
                }
            } else {
                foreach (PropertyInfo propertyInfo in aObject.GetType()
                        .GetProperties(propertySelector)) {
                    if (propertyInfo.CanRead)
                        aSeed = aSeed.Hash(propertyInfo.GetValue(aObject));
                }
            }
            return aSeed;
        }


        /// <summary>
        /// NOTICE: this method is provided to contribute a <see cref="KeyValuePair{TKey,TValue}"/> to
        /// the developing HashCode seed; by hashing the key and the value independently. HOWEVER,
        /// this method has a different name since it will not be automatically invoked by
        /// <see cref="Hash(int,object)"/>, <see cref="Hash(int,IEnumerable)"/>,
        /// or <see cref="HashAllProperties"/> --- you MUST NOT mix this method with those unless
        /// you are sure that no KeyValuePair instances will be passed to those methods; or otherwise
        /// the generated hash code will not be consistent. This method itself ALSO will not invoke
        /// this method on the Key or Value here if that itself is a KeyValuePair.
        /// </summary>
        /// <param name="aSeed">The developing HashCode value or seed.</param>
        /// <param name="keyValuePair">The value to contribute.</param>
        /// <returns>The new hash code.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int HashKeyAndValue<TKey, TValue>(this int aSeed, KeyValuePair<TKey, TValue> keyValuePair)
            => aSeed.Hash(keyValuePair.Key)
                    .Hash(keyValuePair.Value);

        /// <summary>
        /// NOTICE: this method is provided to contribute a collection of <see cref="KeyValuePair{TKey,TValue}"/>
        /// to the developing HashCode seed; by hashing the key and the value independently. HOWEVER,
        /// this method has a different name since it will not be automatically invoked by
        /// <see cref="Hash(int,object)"/>, <see cref="Hash(int,IEnumerable)"/>,
        /// or <see cref="HashAllProperties"/> --- you MUST NOT mix this method with those unless
        /// you are sure that no KeyValuePair instances will be passed to those methods; or otherwise
        /// the generated hash code will not be consistent. This method itself ALSO will not invoke
        /// this method on a Key or Value here if that itself is a KeyValuePair or an Enumerable of
        /// KeyValuePair.
        /// </summary>
        /// <param name="aSeed">The developing HashCode value or seed.</param>
        /// <param name="keyValuePairs">The values to contribute.</param>
        /// <returns>The new hash code.</returns>
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static int HashKeysAndValues<TKey, TValue>(
                this int aSeed,
                IEnumerable<KeyValuePair<TKey, TValue>> keyValuePairs)
        {
            if (keyValuePairs == null)
                return aSeed.Hash(null);
            foreach (KeyValuePair<TKey, TValue> keyValuePair in keyValuePairs) {
                aSeed = aSeed.HashKeyAndValue(keyValuePair);
            }
            return aSeed;
        }
    }
}
Buddhism answered 28/4, 2019 at 5:10 Comment(1)
Yipes: I found a bug! The HashKeysAndValues method has been fixed: it invokes HashKeyAndValue.Buddhism
H
0

I ran into an issue with floats and decimals using the implementation selected as the answer above.

This test fails (decimals; the hash is the same even though I switched two values to be negative):

        var obj1 = new { A = 100m, B = 100m, C = 100m, D = 100m};
        var obj2 = new { A = 100m, B = 100m, C = -100m, D = -100m};
        var hash1 = ComputeHash(obj1.A, obj1.B, obj1.C, obj1.D);
        var hash2 = ComputeHash(obj2.A, obj2.B, obj2.C, obj2.D);
        Assert.IsFalse(hash1 == hash2, string.Format("Hashcode values should be different   hash1:{0}  hash2:{1}",hash1,hash2));

But this test passes (with ints):

        var obj1 = new { A = 100m, B = 100m, C = 100, D = 100};
        var obj2 = new { A = 100m, B = 100m, C = -100, D = -100};
        var hash1 = ComputeHash(obj1.A, obj1.B, obj1.C, obj1.D);
        var hash2 = ComputeHash(obj2.A, obj2.B, obj2.C, obj2.D);
        Assert.IsFalse(hash1 == hash2, string.Format("Hashcode values should be different   hash1:{0}  hash2:{1}",hash1,hash2));

I changed my implementation not to use GetHashCode for the primitive types, and it seems to work better:

    private static int InternalComputeHash(params object[] obj)
    {
        unchecked
        {
            var result = (int)SEED_VALUE_PRIME;
            for (int i = 0; i < obj.Length; i++)
            {
                result = (result * MULTIPLIER_VALUE_PRIME) + DetermineNextValue(obj[i]);
            }
            return result;
        }
    }



    private static int DetermineNextValue(object value)
    {
        unchecked
        {
            if (value is short
                || value is int
                || value is byte
                || value is sbyte
                || value is uint
                || value is ushort
                || value is ulong
                || value is long
                || value is float
                || value is double
                || value is decimal)
            {
                return Convert.ToInt32(value);
            }

            return value != null ? value.GetHashCode() : 0;
        }
    }
Hb answered 28/9, 2014 at 16:44 Comment(1)
In case you intended otherwise unchecked does NOT affect Convert.ToInt32: uint, long, float, double and decimal can all overflow here.Durman
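To expand on this comment: the `Convert.ToInt32` calls above are always range-checked (the `unchecked` block does not change that), so any value outside the `int` range throws `OverflowException`. A hedged sketch of an overflow-free alternative, shown here in Java (`Double.doubleToLongBits` corresponds to .NET's `BitConverter.DoubleToInt64Bits`): fold the value's raw bit pattern instead of converting its magnitude, which works over the whole range and still distinguishes positive from negative.

```java
public class NumericHashSketch {
    // Fold a 64-bit value into 32 bits; never overflows or throws.
    static int fold(long bits) {
        return (int) (bits ^ (bits >>> 32));
    }

    static int hashDouble(double d) {
        return fold(Double.doubleToLongBits(d));
    }

    public static void main(String[] args) {
        // The sign bit survives in the bit pattern, so 100.0 and -100.0 differ...
        System.out.println(hashDouble(100.0) != hashDouble(-100.0)); // true
        // ...and values far outside the int range are handled without an exception.
        System.out.println(hashDouble(1e300));
    }
}
```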
E
0

In case you want to polyfill the HashCode type from netstandard2.1:

public static class HashCode
{
    public static int Combine(params object[] instances)
    {
        int hash = 17;

        foreach (var i in instances)
        {
            hash = unchecked((hash * 31) + (i?.GetHashCode() ?? 0));
        }

        return hash;
    }
}

Note: if used with structs, it will allocate memory, because each struct is boxed into the params object[] array.
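One property of this 17/31 multiply-add scheme worth noting (it is essentially the shape `java.util.Arrays.hashCode` uses as well): it is order-sensitive, so reordering the inputs changes the result, which a plain XOR of the hash codes would not. A small illustration in Java:

```java
public class CombineOrderSketch {
    // Same shape as the Combine polyfill above: seed 17, multiplier 31.
    static int combine(Object... instances) {
        int hash = 17;
        for (Object i : instances) {
            hash = hash * 31 + (i == null ? 0 : i.hashCode());
        }
        return hash;
    }

    public static void main(String[] args) {
        // Order matters with multiply-add: (1, 2) and (2, 1) hash differently...
        System.out.println(combine(1, 2)); // 16370
        System.out.println(combine(2, 1)); // 16400
        // ...whereas a plain XOR of the two hash codes cannot tell them apart.
        System.out.println((1 ^ 2) == (2 ^ 1)); // true
    }
}
```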

Empirin answered 20/4, 2020 at 4:54 Comment(0)
U
0

You can adopt the approach used by the C++ Boost libraries (boost::hash_combine). Something like this:

class HashUtil
{
  public static int HashCombine(int seed, int other)
  {
    unchecked
    {
      // NOTE: 0x9e3779b9 is a uint literal; cast it so the whole expression stays int
      // (the cast is legal here because of the unchecked context).
      return other + (int)0x9e3779b9 + (seed << 6) + (seed >> 2);
    }
  }
}

and then:

class MyClass
{
  private string _field1;
  private int _field2;
  private AnotherClass _field3;
  private YetAnotherClass _field4;

  public override int GetHashCode()
  {
    int result = HashUtil.HashCombine(_field1.GetHashCode(), _field2);
    result = HashUtil.HashCombine(result, _field3.GetHashCode());
    return HashUtil.HashCombine(result, _field4.GetHashCode());
  }
}
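For context on the magic number: `0x9e3779b9` is `floor(2^32 / φ)`, where φ is the golden ratio — the constant Boost's `hash_combine` uses so that successive seeds are spread widely over the 32-bit range. This is easy to check (shown here in Java):

```java
public class GoldenRatioConstant {
    public static void main(String[] args) {
        double phi = (1 + Math.sqrt(5)) / 2;          // golden ratio, ~1.6180339887
        long magic = (long) (Math.pow(2, 32) / phi);  // floor(2^32 / phi)
        System.out.println(magic == 0x9e3779b9L);     // true
        System.out.println(Long.toHexString(magic));  // 9e3779b9
    }
}
```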
Upturn answered 25/1, 2021 at 19:40 Comment(0)
B
-1

I want to add my latest findings to this thread, which I have come back to so often.

My current Visual Studio / project setup provides the functionality to automatically refactor tuples into structs. This generates a GetHashCode function like so:

        public override int GetHashCode()
        {
            int hashCode = -2088324004;
            hashCode = hashCode * -1521134295 + AuftragGesperrt.GetHashCode();
            hashCode = hashCode * -1521134295 + Auftrag_gesperrt_von.GetHashCode();
            hashCode = hashCode * -1521134295 + Auftrag_gesperrt_am.GetHashCode();
            return hashCode;
        }

EDIT: to clarify: AuftragGesperrt, Auftrag_gesperrt_von and Auftrag_gesperrt_am are properties. If the Microsoft devs use this function, it's probably not too bad of a solution.
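One caveat about the generated code: calling `GetHashCode()` directly on a property throws a `NullReferenceException` if that property is a reference type and happens to be null (Roslyn normally guards reference-type members with `EqualityComparer<T>.Default.GetHashCode(...)`, which returns 0 for null). A null-safe sketch of the same multiply-add pattern, shown here in Java with hypothetical stand-in properties (`Objects.hashCode` is Java's null-safe form):

```java
import java.util.Objects;

public class NullSafeHashSketch {
    // Hypothetical stand-ins for the properties in the generated example above.
    private final Boolean auftragGesperrt;
    private final String auftragGesperrtVon; // may be null

    NullSafeHashSketch(Boolean gesperrt, String von) {
        this.auftragGesperrt = gesperrt;
        this.auftragGesperrtVon = von;
    }

    @Override
    public int hashCode() {
        int hashCode = -2088324004;
        // Objects.hashCode(x) returns 0 for null instead of throwing.
        hashCode = hashCode * -1521134295 + Objects.hashCode(auftragGesperrt);
        hashCode = hashCode * -1521134295 + Objects.hashCode(auftragGesperrtVon);
        return hashCode;
    }

    public static void main(String[] args) {
        // No exception even though the string property is null.
        System.out.println(new NullSafeHashSketch(true, null).hashCode());
    }
}
```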

Buettner answered 18/2, 2021 at 14:4 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.