512 bit hash vs 4 128bit hash
Asked Answered
A

4

5

Interestingly I haven't found enough information regarding any test or experiment of collision chances of single 512bit hash like whirlpool versus concatenation of 4 128bit hashes like md5, sha1 etc.

Possibility of 4 128bit hashes to appear same seems less probable than single 512bit hash when the data on which hashing is performed is considerably of small size merely on average 100 characters.

But its just an apparent guess with no basis because I haven't performed any test. What you think about it?

Edit its like 512bit hash vs 128bit hash . 128bit hash . 128bit hash . 128bit hash (4 128bit hash concatenated)

Edit2 I want to use hash for this index on url or hashing considering RAM and purpose is to minimize the possibility of collision because I want to set hash column as unique instead of url column.

Edit3 Please note that purpose of this question is to find the way to minimize the possibility of collision. Having said that, Why I need to focus more on minimizing the possibility of collision? Here comes my Edit2 description which leads to finding the solution to use less RAM. So, interests are both in minimizing the collision and lower RAM usage. But prime focus of this question is lowering the possibility of collision.

Armil answered 13/9, 2011 at 20:10 Comment(10)
What is the specific question here? ("What do you think about it?" is not a specific question.)Osvaldooswal
What precisely are you comparing?Osvaldooswal
[@Oil Charlesworth] Specific question is that what is probability of 512bit hash collision against 4 128bit concatenated hashes colliding at the same time.Armil
So you're comparing the collision behaviour of hash512(x) with the collision behaviour of hash128_a(x) . hash128_b(x) . hash128_c(x) . hash128_d(x)? (where . denotes "concatenation")Osvaldooswal
yes, right. :) I mentioned above in my question about concatenation of 4 128bit hashes.Armil
From your comment in a answer, you're worried about collisions in the sense of hastables, like those used in data structures, or collisions in the meaning of hash-securing some data, to know if someone has changed it?Dropsical
@Dropsical absolutely right. I have also added reference to another questions in which there are all the details of using hashes.Armil
@Rick : it would be easier if you have specified your problem in the beggining. I'll delete my answer here, since your real interest is not about hash collisions, but how to better represent some URL using less space. I'll answer about that in the correct place (original question).Dropsical
@Dropsical your post is helpful please don't delete it. And interest is in both using less space and avoiding collision. That's why I have different question focusing on different interests. But I referenced that question here to add perspective of the question. Here avoiding collision is the main interest.Armil
If all you want to do is minimize collisions, without an adversary, 128 bits is already ample. A collision here is so fabulously unlikely that using a longer hash is just a waste of space.Handicap
O
6

It sounds like you want to compare the collision behaviour of:

hash512(x)

with the collision behaviour of:

hash128_a(x) . hash128_b(x) . hash128_c(x) . hash128_d(x)

where "." denotes concatenation, and hash128_a, hash128_b, etc. are four different 128-bit hash algorithms.

The answer is: it depends entirely on the properties of the individual hashes involved.

Consider, for instance that the 128-bit hash functions could be implemented as:

uint128_t hash128_a(T x)   { return hash512(x)[  0:127]; }
uint128_t hash128_b(T x)   { return hash512(x)[128:255]; }
uint128_t hash128_c(T x)   { return hash512(x)[256:383]; }
uint128_t hash128_d(T x)   { return hash512(x)[384:511]; }

In which case, the performance would be identical.

Osvaldooswal answered 14/9, 2011 at 13:9 Comment(2)
Exactly, I am looking for collision behavior. But I am interested to know how its probable to appear all 4 128bit hashes to appear same for any value twice than a single 512bit to appear twice. I have searched a lot but haven't find any real information based on reasoning or experiment.Armil
@Rick: Like I said, it depends on the hash functions.Osvaldooswal
N
4

The classical article to read on that question is due to Hoch and Shamir. It builds on previous discoveries, especially by Joux. Bottom-line is the following: if you take four hash functions with a 128-bit output, and the four hash functions use the Merkle-Damgård construction, then finding a collision for the whole 512-bit ouput is no more difficult than finding a collision for one of the hash functions. MD5, SHA-1... use the MD construction.

On the other hand, if some of your hash functions use a distinct structure, in particular with a wider running state, the concatenation could yield a stronger function. See the example from @Oli: if all four functions are SHA-512 with some surgery on the output, then the concatenated hash function could be plain SHA-512.

The only sure thing about the concatenation of four hash functions is that the result will be no less collision-resistant than the strongest of the four hash functions. This has been used within SSL/TLS, which, up to version 1.1, internally uses concurrently both MD5 and SHA-1 in an attempt to resist breaks on either.

Nightie answered 14/9, 2011 at 16:56 Comment(3)
You pointed out a very strong reasoning that concatenated hash would be no less collision resistant than the strongest of the four hash functions. Could you please elaborate a little how 4 SHA-512 with some surgery on the output then condensed to single 512bit SHA-512 hash would be better than 4 128bit hashes?Armil
The four truncated SHA-512 from @Oli, when concatenated, really are SHA-512 (same results for same input). SHA-512 is believed to offer the best security you could hope for with a 512-bit output (i.e. resistance to collision up to 2^256 effort). Some other concatenations will not do as well; see the Hoch-Shamir article for details (there's some mathematics involved, but the question is really research-level).Nightie
This. It's easy to assume that using multiple distinct hashes would multiply your security, but it doesn't. It's a pretty non-obvious and important result.Handicap
B
3

512 bits is 512 bits. The only difference is in the difference in imperfections in the hashes. The best overall hash would be a 512 using the best algorithm available.

Edit to add clarification, because it's too long for a comment:

An ideal hash maps content uniformly onto x bits. If you have 4 (completely independent) x-bit hashes, that maps the file uniformly onto 4x bits; a 4x-bit hash still maps the same file uniformly onto 4x bits. 4x bits is 4x bits; as long as it's perfectly uniform, it doesn't matter whether it comes from one (4x) hash function or 4 (x). However, no hash can be completely ideal, so you want the most uniform obtainable distribution, and if you use 4 different functions, only 1 can be the closest to optimal so you have x optimal bits and 3x suboptimal, whereas a single algorithm can cover the entire 4x space with the most optimal distribution.

I suppose it is possible that enough larger algorithms could have subsets of bits that are more uniformly distributed than a single 512, and could be combined to get more uniformity, but that seems like it would be a great deal extra research and implementation for little potential benefit.

Belvabelvedere answered 13/9, 2011 at 20:22 Comment(2)
What is the reasoning behind this answer? And why in your point of view 4 128 concatenated hashes have more chance to collide at the same time? While you must take into account that data sample is not big..Armil
If the optimal 512 bits hash, all of a sudden, happens to have vulnerabillities that lead to collision in fewer steps, the whole hash gets into trouble. If you have 4 x 128 hashes, and one has vuln. in it, you'll end up with 3x 128 bits hash...Dropsical
S
2

If you are comparing concatenating four different 'ideal' 128bit hashing algorithms with one ideal 512 bit hashing algorithm, then yes, both methods will get you the same probability of a collision. Using md5 would make it easier to crack a hash though. If an attacker for example knew you were doing md5 + md5 w/ salt + md5 with another salt .. then that would be much easier to crack as md5 collision attack. Look here for more information about hash functions that have known attacks.

Secunda answered 13/9, 2011 at 20:24 Comment(3)
Well, its not prone to hackers because these are hashes to make MySQL index shorter and look up faster.Armil
@Rick Then why are you using a 512bit hash anyway? Without adversaries, 128 bits should be ample.Handicap
@Nick I updated the question to include the purpose of considering hashes.Armil

© 2022 - 2024 — McMap. All rights reserved.