Quick and Simple Hash Code Combinations
Asked Answered
H

10

112

Can people recommend quick and simple ways to combine the hash codes of two objects? I am not too worried about collisions, since I have a hash table which will handle them efficiently; I just want something that generates a code as quickly as possible.

Reading around SO and the web there seem to be a few main candidates:

  1. XORing
  2. XORing with Prime Multiplication
  3. Simple numeric operations like multiplication/division (with overflow checking or wrapping around)
  4. Building a String and then using the String class's hash code method

What would people recommend and why?

Hanson answered 29/10, 2009 at 21:53 Comment(0)
K
162

I would personally avoid XOR - it means that any two equal values will result in 0 - so hash(1, 1) == hash(2, 2) == hash(3, 3) etc. Also hash(5, 0) == hash(0, 5) etc which may come up occasionally. I have deliberately used it for set hashing - if you want to hash a sequence of items and you don't care about the ordering, it's nice.
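A quick illustration of those collisions (just a demonstration, not a recommended hash):

```csharp
using System;

class XorDemo
{
    static void Main()
    {
        // XOR of equal values is always zero...
        int h11 = 1 ^ 1;  // 0
        int h22 = 2 ^ 2;  // 0 - collides with h11
        // ...and XOR is symmetric, so swapped pairs collide too.
        int h50 = 5 ^ 0;  // 5
        int h05 = 0 ^ 5;  // 5 - collides with h50
        Console.WriteLine(h11 == h22 && h50 == h05); // True
    }
}
```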

I usually use:

unchecked
{
    int hash = 17;
    hash = hash * 31 + firstField.GetHashCode();
    hash = hash * 31 + secondField.GetHashCode();
    return hash;
}

That's the form that Josh Bloch suggests in Effective Java. Last time I answered a similar question I managed to find an article where this was discussed in detail - IIRC, no-one really knows why it works well, but it does. It's also easy to remember, easy to implement, and easy to extend to any number of fields.

Kimmi answered 29/10, 2009 at 22:11 Comment(24)
Yeah, that was my concern about XORing; in the type of data I'm pairing it's fairly unlikely to be pairing two equal items, but not impossibleHanson
Looks like Dan Bernstein's (or Chris Torek's) hash, just with different constants. Nobody knows why that works well either.Flunky
@RobV: I prefer not to have to think if I don't have to. I use this hash even when I could get away with XOR, just to avoid having to wonder whether it's safe or not :)Kimmi
what are the firstField and secondField that you talk about?Causal
@SuperString: the first and second fields you want to hash. You then keep going for the other fields.Kimmi
A word of warning: this is (a variation of) the Bernstein hash, and because nobody knows why it does well in tests, it is not advisable when hashing is critical. See eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx. Also, you should wrap this code in an unchecked { } block, since GetHashCode() should not throw any exceptions.Dissimilate
This is so mysterious - why 17? why 31? is it important that they are primes?Ear
@tofutim: The 31 is a good choice as multiplication by 31 can be optimized to a shift and subtract. Whether it is optimized that way depends on the platform. As for why those numbers work well for hashing - as Henk says, it's a bit of a mystery.Kimmi
Why do you prime (no pun intended) the pump with 17 and 31? Why isn't it just return firstField.GetHashCode() * 31 + secondField.GetHashCode();? Or is that part of what no one knows?Capitate
@PatrickM: I follow Josh Bloch (or some variant), basically. It does mean that it's harder to end up with a hash code of 0...Kimmi
But is a hash code of 0 any worse than a hash code of 16337? (17 * 31 * 31, which is amusingly close to 3L337...) I suppose having more multiplications means you have bigger hash codes, on the whole. Is there a similar improvement in distribution?Capitate
@PatrickM: I think hash codes of 0 are more likely to occur between different types, raising the chance of a collision. For example, two objects with different numbers of fields, but where all the fields give a hash code of 0, would have different hashes under this scheme. But this is a pretty tenuous argument, I'm willing to admit :)Kimmi
The Bernstein hash (DJB, DJB2) uses 33, not 31. And 5381, not 17. But yeah, otherwise very similar. I've used DJB2 for both string hashing and hashcode combination and it's worked fine for me. DJB2 and other algorithms can be found at cse.yorku.ca/~oz/hash.htmlPrintery
@Jon can you please comment on Special Sauce's answer? I'd like to know what your thoughts are on it.Devastating
@rory.ap: I think it's an excellent piece of work, and I'd be perfectly happy to use those numbers. While I hate to admit to using constants "because someone else said to" that's basically what the 17/31 pair is about.Kimmi
"nobody knows" feels like higher beings know it and we can't comprehend it!Juvenescent
Starting with .NET Core 2.1 you can use the System.HashCode type's Combine method to do this learn.microsoft.com/en-us/dotnet/api/system.hashcode.combineBring
Hi, that seems to be a great way to get a good distribution of values. The only thing that bothers me: the process is order-dependent, which means you will get a different hash depending on the order in which you feed the values in. That is prone to error. You can add a comparison of the two inner hashes at the start to solve the problem (to ensure the hash is calculated based on the smaller or bigger inner hash first), but keep in mind that this only works for a pair, not more.Forsake
@EricOuellet: I would say that being order-dependent is a good thing. You don't want (1, 0) and (0, 1) to give the same hash code, for example, as that gives more hash collisions.Kimmi
@JonSkeet, I partly agree. I would say that for some cases it is a good thing, but it really depends on the situation. I think it is important to draw a clear distinction between the two very different behaviours a combining scheme can have (order-dependent versus order-independent); many people seem to underestimate this. It is very case-specific.Forsake
@EricOuellet: It would only be a good thing for it to be order-independent if equality was also defined in an order-independent way, which is rare in my experience.Kimmi
I'm still unable to find System.HashCode - I get a compiler error when I try to use it.Jota
@mushi: Which platform are you targeting?Kimmi
Android/Unity. I later realised github.com/GlitchEnzo/NuGetForUnity exists, so I now have what I need! Except I don't know if it compiles to Android or not.Jota
C
103

If you are using .NET Core 2.1 or later or .NET Framework 4.6.1 or later, consider using the System.HashCode struct to help with producing composite hash codes. It has two modes of operation: Add and Combine.

An example using Combine, which is usually simpler and works for up to eight items:

public override int GetHashCode()
{
    return HashCode.Combine(object1, object2);
}

An example of using Add:

public override int GetHashCode()
{
    var hash = new HashCode();
    hash.Add(this.object1);
    hash.Add(this.object2);
    return hash.ToHashCode();
}

Pros:

  • Part of .NET itself, as of .NET Core 2.1/.NET Standard 2.1 (though, see con below)
    • For .NET Framework 4.6.1 and later, the Microsoft.Bcl.HashCode NuGet package can be used to backport this type.
  • Looks to have good performance and mixing characteristics, based on the work the author and the reviewers did before merging this into the corefx repo
  • Handles nulls automatically
  • Overloads that take IEqualityComparer instances

Cons:

  • Not available out of the box on .NET Framework or .NET Standard 2.0; those targets need the Microsoft.Bcl.HashCode NuGet package
Cooe answered 6/8, 2018 at 22:35 Comment(9)
You can reference nuget.org/packages/Microsoft.Bcl.HashCode to make it work on .NET Framework 4.6.1 or .NET Standard 2.0, officially.Prefiguration
System.HashCode uses xxHash32 (github.com/Cyan4973/xxHash)Fat
Does Bcl.HashCode generate stable hashes for strings?Snowstorm
@RedRidingHood, HashCode is not the right type to use for a hash code that gets persisted, which is what I assume you mean by "stable". HashCode values are only stable within the same instance of a process. This has been the documented behavior of .NET GetHashCode and related values for a very, very long time. Though, .NET Core started using a per-process seed to enforce this. If you have more questions, I suggest you post them as a stand-alone question.Cooe
Thanks, I was aware of the .NET Core behavior, I just wasn't sure if the Bcl package was different from what .NET Core is using.Snowstorm
So i've used this before fine for 2 fields, 3 fields... What about 10? 30?Inexpressible
@James: follow the example using Add for lots of fields. You "accumulate" values in each call to Add and produce a final hashcode by calling ToHashCode.Cooe
The question is more about effectiveness, and if this simple method breaks down or could be improved when combining many hashesInexpressible
@James, the current implementation uses xxHash for mixing the input hashes together. xxHash has good mixing characteristics. I doubt you'll notice anything untoward as the number of hashes you combine increases. This is in contrast to something like a naive XOR hash, which will "lose" an input value if the same value is added later. I also doubt that any future implementations would regress this.Cooe
S
67

While the template outlined in Jon Skeet's answer works well in general as a hash function family, the choice of the constants is important and the seed of 17 and factor of 31 as noted in the answer do not work well at all for common use cases. In most use cases, the hashed values are much closer to zero than int.MaxValue, and the number of items being jointly hashed are a few dozen or less.

For hashing an integer tuple {x, y} where -1000 <= x <= 1000 and -1000 <= y <= 1000, it has an abysmal collision rate of almost 98.5%. For example, {1, 0} collides with {0, 31}, {1, 1} with {0, 32}, etc. If we expand the coverage to also include n-tuples where 3 <= n <= 25, it does less terribly, with a collision rate of about 38%. But we can do much better.

public static int CustomHash(int seed, int factor, params int[] vals)
{
    int hash = seed;
    unchecked // let the multiplication wrap instead of throwing in checked builds
    {
        foreach (int i in vals)
        {
            hash = (hash * factor) + i;
        }
    }
    return hash;
}

I wrote a Monte Carlo sampling search loop that tested the method above with various values for seed and factor over various random n-tuples of random integers i. Allowed ranges were 2 <= n <= 25 (where n was random but biased toward the lower end of the range) and -1000 <= i <= 1000. At least 12 million unique collision tests were performed for each seed and factor pair.

After about 7 hours running, the best pair found (where the seed and factor were both limited to 4 digits or less) was: seed = 1009, factor = 9176, with a collision rate of 0.1131%. In the 5- and 6-digit areas, even better options exist. But I selected the top 4-digit performer for brevity, and it performs quite well in all common int and char hashing scenarios. It also seems to work fine with integers of much greater magnitudes.

It is worth noting that "being prime" did not seem to be a general prerequisite for good performance as a seed and/or factor, although it likely helps. 1009 noted above is in fact prime, but 9176 is not. I explicitly tested variations on this where I changed factor to various primes near 9176 (while leaving seed = 1009) and they all performed worse than the above solution.

Lastly, I also compared against the generic ReSharper-recommended function family of hash = (hash * factor) ^ i;, and the original CustomHash() above seriously outperforms it. The ReSharper XOR style seems to have collision rates in the 20-30% range for common use case assumptions and should not be used in my opinion.
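To make the methodology concrete, here is a small, deterministic sketch (not the original Monte Carlo search loop) that counts distinct hashes over all 2-tuples with coordinates in [-100, 100]. With seed 17 / factor 31 many pairs collide, while 1009 / 9176 keeps every pair distinct on this grid:

```csharp
using System;
using System.Collections.Generic;

class CollisionCount
{
    static int CustomHash(int seed, int factor, params int[] vals)
    {
        int hash = seed;
        unchecked
        {
            foreach (int i in vals)
                hash = (hash * factor) + i;
        }
        return hash;
    }

    // Count distinct hash values over all 201 * 201 = 40401 pairs.
    static int DistinctHashes(int seed, int factor)
    {
        var seen = new HashSet<int>();
        for (int x = -100; x <= 100; x++)
            for (int y = -100; y <= 100; y++)
                seen.Add(CustomHash(seed, factor, x, y));
        return seen.Count;
    }

    static void Main()
    {
        Console.WriteLine(DistinctHashes(17, 31));     // far fewer than 40401
        Console.WriteLine(DistinctHashes(1009, 9176)); // 40401 - no collisions on this grid
    }
}
```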

Sacci answered 30/11, 2015 at 19:25 Comment(8)
Wow. I love the work that went into this answer. Impressive, well done!Love
Seems to be the best, but I have two remarks. First, the easy one: why not move "seed" and "factor" to the end and give them default values (1009 and 9176), which should do the job for most people? Second: like Jon Skeet's algorithm, this is order-dependent, and you can get a different hash if you feed values in a different order. I wonder if it would not be safer to sort the array first, to ensure you get the same final hash regardless of the order in which you feed the algorithm.Forsake
@EricOuellet Because the params int[] vals must come at the end of all function arguments, I was not able to make the seed and factor defaulted parameters. If you don't care about the params syntax convenience, you could remove it and then reorder the parameters to allow the defaults as you suggested.Sacci
@EricOuellet The default hashing for an array should be to consider permutations (this is the more general case) and thus the hash would be different for different orderings (just as the hash for string "abc" is different than the hash for "acb"). If you specifically wanted a hash function for combinations only, you should probably accept a HashSet<int> argument (assuming no duplicates). Otherwise you could rename the function to CustomHashCombination() to prevent confusion, and do the internal pre-sort as suggested.Sacci
I like this answer, but I wouldn't use params because it has to allocate an array on every call. So it may be faster computations-wise, but it creates GC pressure for later.Noddle
What were the 5 and 6-digit numbers that got generated? Will 1009 and 9176 work well for hashing in a table of 2^28 (int.MaxValue / 16) values?Jota
Not sure if I can trust these constants considering this test has a hard limit [-1000..1000] range for i. Wouldn't it be better, if possible, to run it allowing all int values, but using random ints from a bell-curve distribution centered in 0? You could also add ints coming from GetHashCode of floats and doubles in a bell-curve distribution centered in 0.0. It'd be great if you published your testing algorithm in a gist or something so people can understand it better (and maybe tweak it for specific cases).Dickey
@Dickey Larger int values are not really a problem because they tend to mix even faster (over the unchecked int.MaxValue modulo). You could make seed and factor very large random constants, and get great mixing / low collisions immediately with almost any seed & factor values you tried. The whole point of my exercise was to find a seed & factor combination that worked well with a low number of digits to make a nice readable, convenient snippet.Sacci
I
26

Use the combination logic built into tuples. This example uses C# 7 tuples:

(field1, field2).GetHashCode();
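In a typical Equals/GetHashCode pair (the type and field names here are just placeholders), that looks like:

```csharp
using System;

public class Point
{
    public int X { get; }
    public int Y { get; }
    public Point(int x, int y) { X = x; Y = y; }

    public override bool Equals(object obj) =>
        obj is Point p && p.X == X && p.Y == Y;

    // Delegate hashing to ValueTuple's built-in combination logic.
    public override int GetHashCode() => (X, Y).GetHashCode();
}
```

Equal points are guaranteed to produce equal hash codes, since their tuples contain the same values.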
Interlineate answered 24/11, 2017 at 22:38 Comment(6)
Nice idea, though I suspect that this might have issues with GC churn since you are implicitly creating a short-lived objectHanson
@Hanson Tuples are value types, so they are stack allocated and exert no GC pressure.Shagbark
One problem... (0,1,2).GetHashCode() and (0,0,1,2).GetHashCode() both yield the same value: 35. Whereas the method in the most upvoted answer yields unique values 0, 1, 2: 506480 and 0, 0, 1, 2: 15699890Improvvisatore
Hash codes aren't guaranteed to be unique. You found a case where they aren't... That doesn't make it a bad choice unless there are a lot of collisions (in that case it would be a good idea to submit a bug). I personally prefer to use something from the framework instead of implementing something different.Interlineate
It's actually the ValueTuple type, which is a struct (MSDN). Be careful: the Tuple type is a class, and it creates GC pressure. I like this way. Internally it's similar to @Stipo's post but very easy to understand and to review. It would be a good choice in most cases.Drawstring
Best answer, works on both .net 4.5 and on core. Use the System.ValueTuple nuget package on .net 4.5, and on core it's built in.Pahari
T
21

I presume that the .NET Framework team did a decent job of testing their System.String.GetHashCode() implementation, so I would use it:

// System.String.GetHashCode(): http://referencesource.microsoft.com/#mscorlib/system/string.cs,0a17bbac4851d0d4
// System.Web.Util.StringUtil.GetStringHashCode(System.String): http://referencesource.microsoft.com/#System.Web/Util/StringUtil.cs,c97063570b4e791a
public static int CombineHashCodes(IEnumerable<int> hashCodes)
{
    int hash1 = (5381 << 16) + 5381;
    int hash2 = hash1;

    int i = 0;
    foreach (var hashCode in hashCodes)
    {
        if (i % 2 == 0)
            hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ hashCode;
        else
            hash2 = ((hash2 << 5) + hash2 + (hash2 >> 27)) ^ hashCode;

        ++i;
    }

    return hash1 + (hash2 * 1566083941);
}

Another implementation is from the System.Web.Util.HashCodeCombiner.CombineHashCodes(System.Int32, System.Int32) and System.Array.CombineHashCodes(System.Int32, System.Int32) methods. This one is simpler, but probably doesn't have as good a distribution as the method above:

// System.Web.Util.HashCodeCombiner.CombineHashCodes(System.Int32, System.Int32): http://referencesource.microsoft.com/#System.Web/Util/HashCodeCombiner.cs,21fb74ad8bb43f6b
// System.Array.CombineHashCodes(System.Int32, System.Int32): http://referencesource.microsoft.com/#mscorlib/system/array.cs,87d117c8cc772cca
public static int CombineHashCodes(IEnumerable<int> hashCodes)
{
    int hash = 5381;

    foreach (var hashCode in hashCodes)
        hash = ((hash << 5) + hash) ^ hashCode;

    return hash;
}
Thunderstorm answered 11/12, 2015 at 17:55 Comment(0)
M
3

This is a repackaging of Special Sauce's brilliantly researched solution. It makes use of value tuples (via the ITuple interface), which allows defaults for the seed and factor parameters.

public static int CombineHashes(this ITuple tupled, int seed=1009, int factor=9176)
{
    var hash = seed;

    for (var i = 0; i < tupled.Length; i++)
    {
        unchecked
        {
            hash = hash * factor + tupled[i].GetHashCode();
        }
    }

    return hash;
}

Usage:

var hash1 = ("Foo", "Bar", 42).CombineHashes();    
var hash2 = ("Jon", "Skeet", "Constants").CombineHashes(seed: 17, factor: 31);
Madgemadhouse answered 20/1, 2020 at 11:37 Comment(0)
P
0

If you're looking for speed and don't have too many collisions, then XOR is fastest. To prevent a clustering around zero, you could do something like this:

int finalHash = hash1 ^ hash2;
return finalHash != 0 ? finalHash : hash1;

Of course, some prototyping ought to give you an idea of performance and clustering.

Phineas answered 30/10, 2009 at 0:34 Comment(0)
D
-1

If your input hashes are the same size, evenly distributed and not related to each other then an XOR should be OK. Plus it's fast.

The situation I'm suggesting this for is where you want to do

H = hash(A) ^ hash(B); // A and B are different types, so there's no way A == B.

Of course, if A and B can be expected to hash to the same value with a reasonable (non-negligible) probability, then you should not use XOR in this way.

Dermato answered 29/10, 2009 at 21:57 Comment(1)
how would I tell whether my hash codes are evenly distributed, is there an easy benchmark to do for this? I know the collision rate is pretty low but does that necessarily correspond to an even distribution?Hanson
D
-4

Assuming you have a relevant ToString() override (in which your different fields appear), I would just return its hash code:

this.ToString().GetHashCode();

This is not very fast, but it should avoid collisions quite well.

Decastere answered 29/12, 2018 at 12:57 Comment(0)
K
-13

I would recommend using the built-in hash functions in System.Security.Cryptography rather than rolling your own.

Kussell answered 29/10, 2009 at 23:5 Comment(1)
No, they have a very different purpose and break the rule that GetHashCode should be fast.Dissimilate

© 2022 - 2024 — McMap. All rights reserved.