Why does HashMap need a cryptographically secure hashing function?

I'm reading a Rust book about HashMap hashing functions, and I can't understand these two sentences.

By default, HashMap uses a cryptographically secure hashing function that can provide resistance to Denial of Service (DoS) attacks. This is not the fastest hashing algorithm available, but the trade-off for better security that comes with the drop in performance is worth it.

I know what a cryptographically secure hash function is, but I don't understand the rationale behind using one here. From my understanding, a good hash function for a HashMap only needs three properties:

  • be deterministic (the same object always has the same hash value),
  • be VERY fast,
  • have a uniform distribution of bits in the hash value (which reduces collisions).

The other properties of a cryptographically secure hash function are not really relevant 99% (maybe even 99.99%) of the time for hash tables.

So my question is: What does "resistance to DoS attacks and better security" even mean in the context of HashMap?

Clyster answered 5/9, 2018 at 11:44 Comment(9)
You may enjoy reading bugs.php.net/bug.php?id=60623, arstechnica.com/information-technology/2011/12/… and nikic.github.io/2014/12/22/… or search for "php5 hashtable collision" :)Queen
It kinda boils down to opinions here: whether one assumes that it is better to give "secure" to everybody (at the cost of performance for the 95% not needing secure all the time) versus giving "fast" to everybody (at the cost of missing the places where "secure" is required, not "fast").Conquest
@Conquest If 5% needs the security, it's still possible to provide a custom BuildHasher.Alkyne
@TimDiekmann See, how we get into a discussion here? There are multiple options to the underlying problem, and to a certain degree, we get into the business of having different opinions how to best solve things. Rendering the question a bit well, opinionated, in my eyes.Conquest
@Conquest This is why I already flagged as opinion based ;)Alkyne
@Conquest I see what you mean, but I don't think this question is "primarily opinion based". The main question here is the question at the end. So OP seems to want to understand why a cryptographically secure hash function is even important in a hash map. All three answers answer that question and stay very neutral on the "discussion part".Zofiazoha
@LukasKalbertodt Sure. I would say that, too, when I decided to answer the question ;-) ... but seriously: grey area. Therefore I didn't downvote ... even going to upvote stuff now.Conquest
I kinda knew the answer in the first place, but I was hoping to find more use cases for when people might need a cryptographically secure hash function. I still think it is the wrong default, because you usually never write your own web framework (you'll probably use the best-known one), so chances are very slim that you'll ever need this functionality by default (I have probably never required it in my 20 years of programming experience). So I guess Rust values security <100x more than performance in this case.Clyster
Is there a simple way to get a more efficient hash function? I have a Project Euler solution where I would like to see how much faster a "normal" hash function is (i.e. a hash function comparable to what other programming languages do).Ambulatory
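
Regarding the last two comments: the standard library lets you swap the hasher through the BuildHasher mechanism. As a rough, std-only sketch (FNV-1a is just my choice of a fast, unkeyed example hash, not a recommendation), here is a custom hasher wired into a HashMap via BuildHasherDefault:

    use std::collections::HashMap;
    use std::hash::{BuildHasherDefault, Hasher};

    // A toy FNV-1a hasher: deterministic and fast, but unkeyed, so it offers
    // no protection against hash flooding.
    struct Fnv1a(u64);

    impl Default for Fnv1a {
        fn default() -> Self {
            Fnv1a(0xcbf29ce484222325) // FNV offset basis
        }
    }

    impl Hasher for Fnv1a {
        fn finish(&self) -> u64 {
            self.0
        }
        fn write(&mut self, bytes: &[u8]) {
            for &b in bytes {
                self.0 ^= u64::from(b);
                self.0 = self.0.wrapping_mul(0x100000001b3); // FNV prime
            }
        }
    }

    fn main() {
        // Plug the custom hasher in via BuildHasherDefault.
        let mut map: HashMap<String, u32, BuildHasherDefault<Fnv1a>> = Default::default();
        map.insert("project-euler".to_string(), 42);
        assert_eq!(map.get("project-euler"), Some(&42));
    }

Crates such as fnv or rustc-hash package this idea up as ready-made hasher types, if you'd rather not write one yourself.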

Let's start backward: how do you DoS a HashMap?

Over the years, there have been multiple attacks on various software stacks based on Hash Flooding. If you know which framework a site is powered by, and therefore which hash function it uses, and that hash function is not cryptographically secure, then you may be able to pre-compute, offline, a large set of strings that all hash to the same value.

Then you simply inject this set into the site, and each (simple) request causes a disproportionately large amount of work, because inserting N all-colliding elements takes O(N²) operations.


Rust was conceived with the benefit of hindsight, and therefore attention was paid to avoiding this attack by default, reasoning that users who really need performance out of HashMap would simply switch the hash function.
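
To make the quadratic blow-up concrete, here is a small std-only sketch (an illustration, not the actual attack) that simulates a fully successful flood by forcing every key into the same bucket and compares it with the default SipHash-based hasher. Exact timings will vary; the point is the shape of the growth:

    use std::collections::HashMap;
    use std::hash::{BuildHasherDefault, Hasher};
    use std::time::Instant;

    // A hasher that sends every key to the same bucket, simulating a map that
    // has been fed nothing but colliding keys.
    #[derive(Default)]
    struct CollidingHasher;

    impl Hasher for CollidingHasher {
        fn finish(&self) -> u64 { 0 }          // every key "hashes" to 0
        fn write(&mut self, _bytes: &[u8]) {}  // the input is ignored entirely
    }

    fn main() {
        let n: u64 = 10_000;

        let mut flooded: HashMap<u64, u64, BuildHasherDefault<CollidingHasher>> =
            Default::default();
        let start = Instant::now();
        for i in 0..n {
            flooded.insert(i, i); // all keys collide: roughly O(N²) total work
        }
        println!("all-colliding hasher: {:?}", start.elapsed());

        let mut normal: HashMap<u64, u64> = HashMap::new(); // default hasher
        let start = Instant::now();
        for i in 0..n {
            normal.insert(i, i); // well distributed: roughly O(N) total work
        }
        println!("default hasher:       {:?}", start.elapsed());
    }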

Youngstown answered 5/9, 2018 at 12:2 Comment(1)
Probably worth also keeping in mind that the performance of the default hash algorithm is sufficient for most use cases.Plat

Let's say we use a HashMap to store some user data in a web application. Suppose that users can choose (part of) the key in some way – maybe the key is a username, or the filename of an uploaded file, or anything like that.

If we are not using a cryptographically secure hash function, the attacker could possibly craft multiple inputs that all map to the same output. Of course, a hash map has to deal with collisions, because they occur naturally.

But when unnaturally many collisions occur, the hash map implementation might do strange things. For example, looking up some keys could have a runtime of O(n). Or the hash map might think that it has to grow because of all the collisions; but growing won't solve the problem, so the hash map grows until all memory is used. In either case, it's bad. Hash maps just assume that statistically, collisions rarely occur.

Of course, this is not a "stealing user data" attack -- at least not directly. But if one part of a system is weak, this makes it easier for attackers to find other weaknesses.

A cryptographically secure hash function prevents this attack, since the attacker cannot possibly craft multiple keys that map to the same value (at least not without trying out all keys).
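
In the standard library this is reinforced by random seeding: each HashMap's RandomState draws fresh random keys for the (keyed) SipHash-based hasher, so hash values an attacker computes locally say nothing about the victim's map. A minimal sketch, using only the standard library:

    use std::collections::hash_map::RandomState;
    use std::hash::{BuildHasher, Hash, Hasher};

    // Hash one key with a given RandomState, the standard library's default
    // BuildHasher for HashMap.
    fn hash_with(state: &RandomState, key: &str) -> u64 {
        let mut hasher = state.build_hasher();
        key.hash(&mut hasher);
        hasher.finish()
    }

    fn main() {
        // Two states draw independent random keys, so the same input hashes
        // to different values (and differently again on the next run).
        let s1 = RandomState::new();
        let s2 = RandomState::new();
        println!("{:#018x}", hash_with(&s1, "username"));
        println!("{:#018x}", hash_with(&s2, "username")); // almost certainly different
    }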


"The other properties of a cryptographically secure hash function are not really relevant 99% (maybe even 99.99%) of the time for hash tables."

Yes, probably. But this is difficult to balance. I guess we would all agree that if 20% of users had security problems in their applications due to an insecure hash function (while 80% don't care), it would still be a good idea to use the "secure by default" approach. What about 5%/95%? What about 1%/99%? It's hard to tell where the threshold is, right?

There has been a ton of discussion about this already. Because yes, most people only notice the slowness of the hash map. Maybe the situation I described above is incredibly rare and it isn't worth slowing down all other users' code by default. But this has been decided: the default hash function won't change, and luckily you can choose your own hash function.

Zofiazoha answered 5/9, 2018 at 12:8 Comment(0)

If a server application stores user input (such as post data in a web application) in a hash table, a malicious user may deliberately provide many inputs that all have the same hash value, causing a large number of hash collisions and thus slowing down operations on the map significantly, to the point that this can be used as a DoS attack (as described in this article, for example).

If the hash is cryptographically secure, attackers will have a much harder time trying to find inputs with the same hash value.
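
To sketch what that pre-computation looks like against an unkeyed hash (FNV-1a here is just a stand-in for "some known, fast, non-cryptographic hash", and the 4096-bucket table size is an assumption for the example): finding keys that all land in the same bucket is cheap and can be done entirely offline.

    // FNV-1a over bytes: a stand-in for a known, unkeyed, non-cryptographic hash.
    fn fnv1a(bytes: &[u8]) -> u64 {
        let mut h: u64 = 0xcbf29ce484222325;
        for &b in bytes {
            h ^= u64::from(b);
            h = h.wrapping_mul(0x100000001b3);
        }
        h
    }

    fn main() {
        // Offline, collect keys that all fall into bucket 0 of a 4096-bucket table.
        // At request time the attacker just replays them.
        let buckets = 4096u64;
        let mut flood = Vec::new();
        let mut i = 0u64;
        while flood.len() < 1_000 {
            let key = format!("user{}", i);
            if fnv1a(key.as_bytes()) % buckets == 0 {
                flood.push(key);
            }
            i += 1;
        }
        println!("found {} colliding keys, e.g. {:?}", flood.len(), &flood[..3]);
    }

With a keyed hash and a secret random seed, the attacker cannot evaluate the hash offline, so a table like this cannot be built in advance.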

Gormandize answered 5/9, 2018 at 11:59 Comment(0)
