I have read some texts about strong collision resistance and weak collision resistance, but I was unable to understand the difference. The only thing I could gather is that there is a low probability of collision in hash functions which have weak collision resistance, and a higher probability of collision in hash functions with strong collision resistance. I could not understand what the real difference is, or what the significance of these properties is. Can anyone help me with this?
The weak collision resistance property is sometimes also referred to as second preimage resistance:
Given an arbitrary x there exists no x' with x' != x so that h(x) = h(x')
Strong collision resistance on the other hand is defined as:
There exist no x and x' with x != x' so that h(x) = h(x')
The obvious difference in their definitions is that for weak collision resistance we are bound to a particular, given x, whereas in the definition of strong collision resistance we are free to choose both x and x' arbitrarily. Still, the two seem very similar, so let's look at two examples.
Weak collision resistance
A good example where we are actually only interested in weak collision resistance would be a simple password storage scheme. Assume we store user-provided passwords in a database by storing their hash. Authentication would succeed when the hash of some password a user provides is equal to the value that was stored previously (this is an inherently insecure scheme though, but please bear with me for the moment). Now in that case, the given x is the (unknown) original password that was provided earlier. If an attacker were capable of solving the "second preimage" problem efficiently, he could obtain an x' whose hash value is the same as that of the original x, and would thus be authenticated successfully. Please note that the capability to produce arbitrary collisions (i.e. solving the strong collision problem) is useless in general in this scenario because it is not too likely that the x and x' we get resemble actual passwords whose hashes have already been stored in the database.
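To make that scenario concrete, here is a minimal sketch of such a naive hash-based password check. The function names and the use of SHA-256 are assumptions made for the example only; a real system should use a salted, slow password hash such as bcrypt, scrypt or Argon2.

import hashlib

password_db = {}  # username -> hex digest of the password's hash

def register(username: str, password: str) -> None:
    password_db[username] = hashlib.sha256(password.encode()).hexdigest()

def authenticate(username: str, password: str) -> bool:
    stored = password_db.get(username)
    return stored is not None and stored == hashlib.sha256(password.encode()).hexdigest()

register("alice", "correct horse battery staple")
print(authenticate("alice", "correct horse battery staple"))  # True
print(authenticate("alice", "wrong guess"))                   # False
# An attacker who can solve "second preimage" for the stored hash could log in
# with some x' that differs from the original password, since only hashes are compared.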
Strong collision resistance
A different scenario where our concern is strong collision resistance is, for example, an application where you want to look up arbitrary data stored in a database with the help of unique ids. Instead of issuing queries on the original data (which would often be very slow due to the potentially unbounded size of the data), you compute hashes of the data instead. Hashes are compact, fixed in size, and can thus be queried much more efficiently. As a matter of fact, in these cases you often don't care about the (second) preimage resistance property of a hash function at all, mostly because the preimages themselves are no secret. What you do care about is that no two distinct data sets hash to the same value, which is essentially a collision. You don't care about any collision in particular, but you want this property to hold universally, i.e. you don't want any two data sets to hash to the same value (imagine there is a 'unique constraint' defined on that column). Because security is often not an issue in these applications, non-cryptographic hashes are frequently used here, mostly because they perform better.
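As a small illustration, here is a sketch of indexing blobs by their hash so that lookups only compare short, fixed-size keys. The dictionary-based "database" and the choice of SHA-256 are assumptions made purely for the example; as noted above, a non-cryptographic hash is often used in practice.

import hashlib

index = {}  # hex digest -> original blob

def store(blob: bytes) -> str:
    key = hashlib.sha256(blob).hexdigest()
    existing = index.get(key)
    if existing is not None and existing != blob:
        # two distinct blobs with the same key would silently merge records
        raise ValueError("hash collision: distinct blobs map to the same key")
    index[key] = blob
    return key

key = store(b"some large document ...")
print(index[key] == b"some large document ...")  # True: look up by the short key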
The relationship between both
Intuitively, and as implied by their names, we would assume that strong collision resistance is harder to provide than weak collision resistance. Luckily for us, this intuition can be proven correct under the Random Oracle Model. We prove it via the contrapositive: if we had an efficient probabilistic polynomial-time algorithm that solves "second preimage", it would also give us an efficient algorithm that solves "collision".
Consider a hash function h and this following simple probabilistic algorithm [1]:
Let 2ndPreimage be a probabilistic (e, Q)-algorithm that solves "second preimage" for the hash function h.
Choose x uniformly at random
value = 2ndPreimage(h, x)
case value == failure -> return failure
case value == x' (!= x) -> return (x, x')
It is easy to see that this is also an (e, Q)-algorithm that solves the strong collision problem. In other words, given an algorithm that solves "second preimage", we can use it directly to produce a collision with the same success probability. As we made no assumptions about the underlying hash function h, we can now safely say that
Strong collision resistance implies weak collision resistance but the opposite doesn't necessarily hold.
[1] e is the success probability of the algorithm, 0 <= e < 1. Q is the maximum number of oracle queries, i.e. evaluations of the hash function h, the algorithm may make. In case of success, the algorithm returns a valid solution, otherwise it returns a value indicating failure.
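To illustrate the reduction sketched above in code, here is a toy version. Both the hash function h and the second_preimage solver below are stand-ins invented for the example (a real 2ndPreimage oracle for a secure hash is of course not available); the point is only the shape of the wrapper that turns a second-preimage solver into a collision finder.

import random

def h(x: int) -> int:
    # deliberately weak toy "hash": far from collision resistant
    return x % 1000

def second_preimage(x: int):
    # stand-in for a 2ndPreimage solver: returns some x' != x with h(x') == h(x),
    # or None to signal failure
    x_prime = x + 1000
    return x_prime if h(x_prime) == h(x) else None

def find_collision():
    x = random.randrange(10**6)    # choose x uniformly at random
    x_prime = second_preimage(x)   # value = 2ndPreimage(h, x)
    if x_prime is None:
        return None                # case value == failure -> return failure
    return (x, x_prime)            # case value == x' (!= x) -> return (x, x')

print(find_collision())            # e.g. (123456, 124456), a collision under h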
Given an arbitrary x there exists no x' with x' != x so that h(x) = h(x')
Your statement that there exists no such x' is too strong. By the pigeonhole principle, we're guaranteed that some x' hashes to the same value as x, since hash functions aren't injective. It's not a question of whether such an x' exists, it's a matter of how difficult it is to find one. The same goes for your second statement. – Liquefacient
Authentication would succeed when the hash of some password a user provides is equal to the value that was stored previously (this is an inherently insecure scheme though, but please bear with me for the moment).
I'm not sure I'm following, but isn't this pretty much how all password-based authentication systems work? So you're saying all password-based authentication systems are flawed? – Etherealize
I wrote this answer due to the comment of Alexander. All the definitions are copied from Cryptographic Hash-Function Basics: Definitions, Implications, and Separations for Preimage Resistance, Second-Preimage Resistance, and Collision Resistance by P. Rogaway and T. Shrimpton.
- preimage resistance: for essentially all pre-specified outputs, it is computationally infeasible to find any input which hashes to that output, i.e., to find any preimage x' such that h(x') = y when given any y for which a corresponding input is not known.
- 2nd-preimage resistance (weak collision resistance): it is computationally infeasible to find any second input which has the same output as any specified input, i.e., given x, to find a 2nd-preimage x' != x such that h(x) = h(x').
- collision resistance (strong collision resistance): it is computationally infeasible to find any two distinct inputs x, x' which hash to the same output, i.e., such that h(x) = h(x').
Fact 1: Collision resistance implies 2nd-preimage resistance of hash functions.
Fact 2: 2nd-preimage resistance implies preimage resistance.
As Alexander noted, by the pigeonhole principle, when the input space is larger than the output space of the hash function, collisions are inevitable.
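A tiny experiment makes this concrete. The snippet below simply truncates SHA-256 to 16 bits (an artificial choice made for the example) and finds a collision almost immediately, because the output space then has only 2**16 possible values.

import hashlib

def tiny_hash(data: bytes) -> bytes:
    # keep only the first 2 bytes (16 bits) of SHA-256
    return hashlib.sha256(data).digest()[:2]

seen = {}
for i in range(100_000):
    msg = ("message-%d" % i).encode()
    digest = tiny_hash(msg)
    if digest in seen:
        # two distinct inputs with the same (truncated) hash: a collision
        print("collision:", seen[digest], msg, digest.hex())
        break
    seen[digest] = msg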
TLDR for anyone short on time:
- strong collision resistance => can't find any two values X, Y so that their hashes are the same
- weak collision resistance => given a specific X, you can't find any Y so that their hashes are the same
Example with file hashes (SHA256):
- you find any two files with the same SHA256 hash => you have just broken strong collision resistance of SHA256
- someone gives you a specific file (but you don't know in advance which one) and you find another file which has the same SHA256 hash as the given file => you have just broken weak collision resistance of SHA256
Which is easier to break and which is worse when “broken”?
Breaking weak collision resistance is much more severe than breaking strong collision resistance.
For example, you are downloading a file from the internet and want to verify it has not been tampered with. Assuming you know the SHA256 hash the file should have, you can just download the file, hash it, and compare the resulting hash to the one you know. If they match, the file is safe; if not, it has been tampered with.
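A minimal sketch of that check, assuming you already trust the expected digest (the file name and the digest below are placeholders):

import hashlib

EXPECTED_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"  # placeholder digest

def sha256_of_file(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):  # hash the file in chunks
            digest.update(chunk)
    return digest.hexdigest()

if sha256_of_file("downloaded-file.bin") == EXPECTED_SHA256:
    print("hashes match: the file is (very likely) untampered")
else:
    print("hash mismatch: the file has been altered or corrupted")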
Now, if someone breaks strong collision resistance of SHA256 it would mean that they just found some 2 files which have the same hash, but the chance that either of these two files is the same as the file you want is astronomically small, so this is not very useful for an attacker.
But if someone breaks weak collision resistance it means that they could generate another file which has the exact same hash as the file which you want to download. That would allow an attacker to swap the file which you want to download with his tampered file and you would never notice as they both have the same hash.
Sounds terrible, right? And it is! But not all is (immediately) lost, even if weak collision resistance is broken. Although an attacker could generate another file with the same hash, weak collision resistance doesn't say anything about that file being a "valid" file. E.g. if you are downloading a python script, the attacker might be able to generate another binary blob which has the same hash as your python script, but that blob is most likely not a valid python script, so it would just give an error when run by the python interpreter.
To go from "being able to generate a binary blob which has the same hash as our python script" to "being able to generate another valid python script which has the same hash as our python script" takes yet another order of magnitude of computational effort. And even then it doesn't say anything about it being a python script which can exfiltrate data or do some other nefarious stuff. That is yet another order of magnitude of computation.
So, even if weak collision resistance is broken at some point in the future, it's "not great, not terrible", to use a famous movie quote. But this only applies for "the now". It will become terrible if left alone for a longer period of time (which gives the adversary more time to refine their collision-finding algorithm and/or just more computational time to brute-force the right malicious input). Also, it hugely depends on what the broken hash function is used for and how much validation of the input is performed by the end application. E.g. in our python script download example, we are only being saved by the fact that the python interpreter won't accept random binary blobs, which isn't a strong security guarantee...
Conclusion
To sum up, strong collision resistance provides stronger security guarantees when it holds, but precisely because it makes such a strong claim, it is easier to break. Even when it is broken, things are not terrible, as we still have weak collision resistance to protect us! As such, even if strong collision resistance is broken, the hash function can still be usable in practice for quite some time, because weak collision resistance is still, paradoxically, quite "strong". But if that one is also broken (or close to being broken), then it's time to move on and not wait for attackers to sharpen their collision-finding algorithms while our data is only being protected by some flaky (or accidental) application validation.
Related blog post from Cloudflare on this topic: Why it’s harder to forge a SHA-1 certificate than it is to find a SHA-1 collision
Note: in the TLDR, X and Y have to be distinct values (i.e. can't be the same value) and "can't find" refers to not being able to find in a reasonable amount of time (i.e. you can't take 100 years to find a collision).