Why do string hash codes change for each execution in .NET?

Asked 11/10, 2022 at 5:29 Answered 11/10, 2022 at 22:57

Consider the following code:

Console.WriteLine("Hello, World!".GetHashCode());

First run:

139068974

Second run:

-263623806

Now consider the same thing written in Kotlin:

println("Hello, World!".hashCode())

First run:

1498789909

Second run:

1498789909

Why do hash codes for string change for every execution in .NET, but not on other runtimes like the JVM?

Kensell asked 11/10, 2022 at 5:29

What version of .NET are you using? A debug or release build? Debugger attached? – Nuthatch 11/10, 2022 at 5:31

Docs: "The hash code itself is not guaranteed to be stable. Hash codes for identical strings can differ across .NET implementations, across .NET versions, and across .NET platforms (such as 32-bit and 64-bit) for a single version of .NET. In some cases, they can even differ by application domain. This implies that two subsequent runs of the same program may return different hash codes." (I know that doesn't answer your question as to WHY though) – Pennate 11/10, 2022 at 5:33

Cannot reproduce with .NET Framework 4.8. I get the same hash code (243930825) in every run (also the same in Debug and Release) – Phylloxera 11/10, 2022 at 5:33

I can repro in .NET 6. I found the reason: String.GetHashCode uses a random seed value. – Nuthatch 11/10, 2022 at 5:34

@Phylloxera You're probably using .NET Core (see: andrewlock.net/…) – Pennate 11/10, 2022 at 5:34

@Pennate I actually use .NET Framework 4.8 (updated my comment). – Phylloxera 11/10, 2022 at 5:35

I meant not using. My bad. – Pennate 11/10, 2022 at 5:36

I've not been able to find a definitive reason for WHY it's like this. I've only found the speculation in the article I linked previously. – Pennate 11/10, 2022 at 5:50

On a side note, I wonder what the use case for an unstable hash function is. Isn't that effectively just a random generator at that point? – Whelm 11/10, 2022 at 6:07

@Whelm - it's stable within the process whilst that process is running, which is all they ever wanted to guarantee. – Nougat 11/10, 2022 at 6:15

Why do hash codes for string change for every execution in .NET

In short: to prevent hash collision attacks. The reason is roughly explained in the docs for the <UseRandomizedStringHashAlgorithm> configuration element:

The string lookup in a hash table is typically an O(1) operation. However, when a large number of collisions occur, the lookup can become an O(n²) operation. You can use the configuration element to generate a random hashing algorithm per application domain, which in turn limits the number of potential collisions, particularly when the keys from which the hash codes are calculated are based on data input by users.
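The degradation described in that quote is easy to provoke when string hashes are predictable. The Kotlin sketch below (Kotlin because the question already uses it, and its `String.hashCode()` is the JVM's fixed formula) builds many distinct keys that all share a single hash code. It relies on "Aa" and "BB" colliding, and on the fact that swapping equal-length colliding blocks inside a string leaves its hash unchanged. The helper name `collidingKeys` is hypothetical, made up for this example.

```kotlin
// Hypothetical helper: builds 2^blocks distinct strings that share one String.hashCode().
// "Aa" and "BB" collide: 65 * 31 + 97 == 2112 == 66 * 31 + 66.
// Since hash(s + t) == hash(s) * 31^t.length + hash(t) (in wrapping Int arithmetic),
// replacing one colliding two-char block with the other never changes the hash.
fun collidingKeys(blocks: Int): List<String> {
    var keys = listOf("")
    repeat(blocks) {
        keys = keys.flatMap { prefix -> listOf(prefix + "Aa", prefix + "BB") }
    }
    return keys
}

fun main() {
    val keys = collidingKeys(10) // 1024 distinct keys
    val distinctHashes = keys.map { it.hashCode() }.toSet()
    println("${keys.size} keys, ${distinctHashes.size} distinct hash code(s)")
    // prints "1024 keys, 1 distinct hash code(s)"

    // All of these keys end up in the same HashMap bucket.
    val map = HashMap<String, Int>()
    keys.forEachIndexed { i, key -> map[key] = i }
    println("map size = ${map.size}") // prints "map size = 1024"
}
```

An attacker who controls the keys (for example form field names or JSON property names) can precompute such a set and force every insert into one bucket. Since Java 8, `HashMap` softens this by turning long bucket chains into balanced trees when the keys are `Comparable`, as `String` is, so the cost per lookup grows like O(log n) rather than O(n). A hash keyed with a per-process random seed makes precomputing the set much harder in the first place.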

but not on other runtimes like the JVM?

Not exactly: Python's string hash, for example, is randomized per process. Conversely, C# produces the same hash on every run in .NET Framework, .NET Core 1.0 and .NET Core 2.0 when <UseRandomizedStringHashAlgorithm> is not enabled.

For Java, it's probably a historical issue: the algorithm is part of the public specification, and it's not a good one; read this.
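To make "the algorithm is public" concrete: the Java API documents `String.hashCode()` as `s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]`, computed with `int` arithmetic. The Kotlin sketch below re-implements that formula (`javaStyleHash` is a hypothetical name) and compares it with the built-in result for the string from the question.

```kotlin
// Re-implementation of the String.hashCode() formula documented in the Java API:
//   s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], in 32-bit Int arithmetic.
// Nothing in it depends on the process, so every run produces the same value.
fun javaStyleHash(s: String): Int {
    var h = 0
    for (c in s) {
        h = 31 * h + c.code // Int overflow wraps around, exactly as in Java
    }
    return h
}

fun main() {
    val text = "Hello, World!"
    println(javaStyleHash(text))                    // prints 1498789909, as in the question
    println(javaStyleHash(text) == text.hashCode()) // prints true
}
```

Because the formula has no per-process input, it reproduces the question's 1498789909 on every run, and anyone can evaluate it offline, which is exactly what makes precomputed collisions possible.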

Calamondin answered 11/10, 2022 at 7:12

The question is about string hash codes across program executions. This answer quotes documentation about string hash codes across application domains. Hence my downvote. – Evangelize 11/10, 2022 at 23:46

Each program execution would create one or more AppDomains, so the two terms can be used interchangeably in this case. – Clemmer 12/10, 2022 at 3:07

@TheodorZoulias I need to explain first. The main reason I quote this document is to explain the harm caused by hash collisions. My understanding is that the two clauses of the question should be read in parallel, so I think what the OP wants to ask is not why the hash code changes, but why it changes in C# and not in Java. And although the docs say "per application domain", I've tested (in .NET Framework 4.7.1) that it changes per execution when this flag is set. So I think it means "per application domain per execution". – Calamondin 12/10, 2022 at 5:13

The OP hasn't specified the target .NET platform, but it's a safe bet that their observations were on the currently evolving platform (.NET Core and .NET 5+), and not on the stagnant .NET Framework. .NET Core does not support application domains, so I don't think that the quoted text from the docs is directly relevant to the question asked. It might be insightful, but that's it. – Evangelize 12/10, 2022 at 6:01

@shingo: An appdomain is an isolation region within a process. So of course if you start two processes, you are dealing with two distinct groups of appdomains. – Ditchwater 12/10, 2022 at 19:48

@TheodorZoulias: It might be more correct to say that Core doesn't support multiple appdomains. You get just the one, there are no appdomain boundaries to manage, so no management functions. But anything documented as "per-appdomain" still applies in .NET Core, with one appdomain per process. – Ditchwater 12/10, 2022 at 19:49

@BenVoigt what is the effect of the setting <UseRandomizedStringHashAlgorithm enabled=0|1 /> in a .NET Core application? Does it make any difference whether you configure it with 0 or 1? My point is that, assuming it makes no difference, the quoted text does not apply to the .NET platform that the OP has experimented with. So this answer does not directly explain the results of these experiments. – Evangelize 12/10, 2022 at 20:08

@BenVoigt I didn't catch Jeremy's comment, but now I understand, thanks for your explanation. – Calamondin 13/10, 2022 at 3:59

@TheodorZoulias you don't have to fixate on the effect of this configuration. I found these docs because I remembered that randomized string hashing was already implemented in .NET Framework, and it has now become the default behaviour. – Calamondin 13/10, 2022 at 4:14

Shingo, your latest edit improved the answer enough for me to revoke my downvote. I think it could be improved even further by explaining where this quoted text comes from, where it applies, and how it relates to the currently evolving .NET platform. Adding this context is needed IMHO, because the text as is can (1) mislead people into believing that an option to prevent the randomization exists (it doesn't), and (2) confuse people who are not familiar with the concept of application domains in .NET. – Evangelize 13/10, 2022 at 6:27

Why do hash codes change for every execution in .NET?

Because changing the hash code of strings (and other objects!) on each run is a very strong hint to developers that hash codes do not have any meaning outside of the process that generated the hash.

Specifically, the documentation says:

Furthermore, .NET does not guarantee the default implementation of the GetHashCode method, and the value this method returns may differ between .NET implementations, such as different versions of .NET Framework and .NET Core, and platforms, such as 32-bit and 64-bit platforms. For these reasons, do not use the default implementation of this method as a unique object identifier for hashing purposes. Two consequences follow from this:

You should not assume that equal hash codes imply object equality.

You should never persist or use a hash code outside the application domain in which it was created, because the same object may hash differently across application domains, processes, and platforms.

By changing the hash code of a given object from one run to the next, the runtime is telling the developer not to use the hash code for anything that crosses a process/app-domain boundary. That will help to insulate developers from bugs stemming from changes to the GetHashCode algorithms used by standard classes.
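A minimal sketch of the per-process randomization idea, assuming nothing about .NET's internals: hash the string with a function whose starting state is mixed with a key drawn once when the process starts. The example uses 32-bit FNV-1a over UTF-16 code units; `SeededStringHash` and its `key` parameter are hypothetical names. .NET itself uses a different keyed algorithm (Marvin) and Python uses SipHash, so treat this as an illustration of the behaviour, not as a hardened defense against collision attacks.

```kotlin
import java.security.SecureRandom

// Hypothetical illustration only: 32-bit FNV-1a whose offset basis is XORed with a
// seed chosen once per process. Same string + same process => same hash; a new
// process picks a new seed, so the values change between runs.
object SeededStringHash {
    private val seed: Int = SecureRandom().nextInt() // drawn once per process

    fun hash(s: String, key: Int = seed): Int {
        var h = 0x811C9DC5.toInt() xor key // FNV-1a offset basis mixed with the key
        for (c in s) {
            h = h xor c.code                // mix in one UTF-16 code unit
            h *= 0x01000193                 // FNV 32-bit prime; Int overflow wraps
        }
        return h
    }
}

fun main() {
    val a = SeededStringHash.hash("Hello, World!")
    val b = SeededStringHash.hash("Hello, World!")
    println(a == b) // prints true: stable inside this process
    println(a)      // almost certainly a different number on the next run
}
```

With `key = 0` this reduces to plain FNV-1a (for example, "a" hashes to 0xE40C292C), which shows that the only source of run-to-run variation is the seed.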

Having hash codes change from one run to the next also discourages things like persisting the hash code for use as a "did this thing change" shortcut. This prevents both bugs caused by changes to the underlying algorithms and bugs caused by assuming that two objects of the same type with the same hash code are equal, when no such guarantee is made (in fact, due to the pigeonhole principle, no such guarantee can be made for any type that can take more than 2^32 distinct values, i.e. one that requires or allows more than 32 bits).

Why do other languages generate stable hash codes?

Without a thorough language-by-language review, I can only speculate, but the major reasons are likely to be some combination of:

historical inertia (read: "backwards compatibility")
the disadvantages of stable hash codes were insufficiently understood when the language spec was defined
adding instability to hash codes was too computationally expensive when the language spec was defined
hash codes were less visible to developers

Goldcrest answered 11/10, 2022 at 22:57

I don't know about .NET, but Perl and Python both generate random hash codes as a defense against denial-of-service attacks. (If the hash codes are static, then providing carefully-selected strings for storage changes the complexity of a hash table from O(1) to O(n) or worse.) – Lipchitz 12/10, 2022 at 1:28

@Mark: it seems that that would rely on the app using hash codes generated out-of-process (eg., using hash codes to see if a data store has the same version of an object that the app does). Which, .NET's documentation explicitly says shouldn't be done, so it seems that defending against DoS attacks is a side-benefit or special case of "don't let hash codes out of the app domain". – Goldcrest 12/10, 2022 at 1:50

It relies on the attacker being able to know in advance what hash codes will be generated inside the process, not on the program accepting hash codes generated externally. – Nougat 12/10, 2022 at 6:49

Fair point, but that still relies on the attacker knowing other internal details of the code (eg., that a hash table is in use), and that the strings be hashed as-is (eg., not appended with the attacker's username). However: since the docs focus on incorrect assumptions about hash codes and equality, I think it's safe to focus on that aspect. – Goldcrest 12/10, 2022 at 13:39

If stable hash codes are required, e.g. to store passwords, use the classes derived from the System.Security.Cryptography.HashAlgorithm class. – Marelda 18/10, 2022 at 11:15
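On the JVM, the counterpart of the `HashAlgorithm` classes mentioned above is `java.security.MessageDigest`. A minimal Kotlin sketch, assuming SHA-256 over the UTF-8 bytes is an acceptable fingerprint; `stableDigestHex` is a hypothetical helper name. The result is identical across runs, machines and runtime versions, so unlike `GetHashCode()` it is safe to persist. For password storage specifically, a salted, deliberately slow key-derivation function such as PBKDF2 is the usual recommendation rather than a bare digest.

```kotlin
import java.security.MessageDigest

// Stable content hash: SHA-256 over the UTF-8 bytes, rendered as lowercase hex.
// Unlike String.hashCode() / GetHashCode(), this value is safe to store and compare
// across processes, e.g. as a "did this content change?" fingerprint.
fun stableDigestHex(s: String): String {
    val digest = MessageDigest.getInstance("SHA-256").digest(s.toByteArray(Charsets.UTF_8))
    return digest.joinToString("") { "%02x".format(it) } // two hex digits per byte
}

fun main() {
    // prints "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
    println(stableDigestHex("abc"))
}
```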
