Why is hash output fixed in length?

Asked 13/4, 2015 at 6:28 Answered 13/4, 2015 at 13:3

Hash functions always produce a fixed-length output regardless of the input (e.g. MD5 gives 128 bits, SHA-256 gives 256 bits), but why?

I know that this is how the designers designed them, but why did they design the output to always have the same length? So that it can be stored in a consistent fashion? So that it is easier to compare? So that it is less complicated?

Kowal answered 13/4, 2015 at 6:28 Comment(4)

A hash is a condensed (lossy) version of the original data. There would be little point hashing data smaller than the hash size. If it were smaller, then you could probably recover it.... – Ned 13/4, 2015 at 6:33

Even hashing bigger data produces the same size, no? My question is why the designers designed it that way, though... – Kowal 13/4, 2015 at 6:59

A varying size would presumably give some clues to original composition(?) – Ned 13/4, 2015 at 7:28

That sounds possible too, @MitchWheat :D I think it is also because of the memory issue described by j_random_hacker :D – Kowal 14/4, 2015 at 6:56

Because that is what the definition of a hash is. From Wikipedia:

A hash function is any function that can be used to map digital data of arbitrary size to digital data of fixed size.

If your question relates to why it is useful for a hash to be a fixed size there are multiple reasons (non-exhaustive list):

Hashes typically encode a larger (often arbitrary size) input into a smaller size, generally in a lossy way, i.e. unlike compression functions, you cannot reconstruct the input from the hash value by "reversing" the process.
Having a fixed size output is convenient, especially for hashes designed to be used as a lookup key.
You can predictably (pre)allocate storage for hash values and index them in a contiguous memory segment such as an array.
For hashes of "native word sizes", e.g. 16, 32 and 64 bit integer values, you can do very fast equality and ordering comparisons.
Any algorithm working with hash values can use a single set of fixed size operations for generating and handling them.
You can predictably combine hashes produced with different hash functions in e.g. a Bloom filter.
You don't need to waste any space to encode how big the hash value is.
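The storage points in the list above can be illustrated with a minimal sketch, assuming Python's standard `hashlib` module. The helper name `digest_sizes` and the sample inputs are made up for illustration. It shows that the digest length does not depend on the input length, and that fixed-size digests can be packed back to back in one preallocated buffer without any length prefix.

```python
import hashlib

def digest_sizes(messages):
    """Return (SHA-256 length, SHA-512 length) in bytes for each message."""
    return [
        (len(hashlib.sha256(m).digest()), len(hashlib.sha512(m).digest()))
        for m in messages
    ]

# Inputs of very different sizes (illustrative values only).
inputs = [b"", b"a", b"x" * 1_000_000]
sizes = digest_sizes(inputs)

# Every SHA-256 digest is exactly 32 bytes, so the storage needed is known
# up front and entry i always lives at offset 32 * i.
DIGEST_LEN = 32
table = bytearray(DIGEST_LEN * len(inputs))
for i, m in enumerate(inputs):
    table[DIGEST_LEN * i : DIGEST_LEN * (i + 1)] = hashlib.sha256(m).digest()

def lookup(i):
    """Fetch the i-th stored digest by simple offset arithmetic."""
    return bytes(table[DIGEST_LEN * i : DIGEST_LEN * (i + 1)])
```

With variable-length outputs, the buffer would need either a length field per entry or a separate index of offsets.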

There do exist special hash functions that can produce an output of whatever length the caller specifies, such as those built on so-called sponge functions.
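As a rough illustration, SHAKE128 is a sponge-based function exposed by Python's `hashlib`, and it lets the caller choose the output length. The wrapper name `shake_digest` below is hypothetical; only `hashlib.shake_128` and its `digest(length)` method come from the standard library.

```python
import hashlib

def shake_digest(data: bytes, n_bytes: int) -> bytes:
    """Return n_bytes of SHAKE128 output for data (length chosen by caller)."""
    return hashlib.shake_128(data).digest(n_bytes)

short = shake_digest(b"hello", 16)   # 128 bits of output
longer = shake_digest(b"hello", 64)  # 512 bits of output from the same input
```

The output length is still fixed for any given call: it depends on the requested size, not on the size of the input.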

Onym answered 13/4, 2015 at 6:43 Comment(0)

As you can see, it is the standard.

Also, what you want is specified in the standard:

Some application may require a hash function with a message digest length different than those provided by the hash functions in this Standard. In such cases, a truncated message digest may be used, whereby a hash function with a larger message digest length is applied to the data to be hashed, and the resulting message digest is truncated by selecting an appropriate number of the leftmost bits.
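The truncation described in that passage can be sketched as follows, assuming SHA-256 as the underlying hash and output lengths that are whole bytes. The function name `truncated_sha256` is made up for this example.

```python
import hashlib

def truncated_sha256(data: bytes, n_bits: int) -> bytes:
    """Return the leftmost n_bits of the SHA-256 digest of data.

    Assumption for simplicity: n_bits is a multiple of 8 in 8..256.
    """
    if n_bits % 8 != 0 or not 0 < n_bits <= 256:
        raise ValueError("n_bits must be a multiple of 8 between 8 and 256")
    full = hashlib.sha256(data).digest()   # always 32 bytes
    return full[: n_bits // 8]             # keep the leftmost bytes

tag = truncated_sha256(b"abc", 128)  # a 16-byte digest
```

Note that this is not the same as SHA-224: that function also uses different initial values, so its output does not match a plain truncation of SHA-256.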

Noelnoelani answered 13/4, 2015 at 6:43 Comment(0)

Often it's because you want to use the hash value, or some part of it, to quickly store and look up values in a fixed-size array. (This is how a non-resizable hashtable works, for example.)

And why use a fixed-size array instead of some other, growable data structure (like a linked list or binary tree)? Because accessing it tends to be both theoretically and practically fast: provided that the hash function is good and the fraction of occupied table entries isn't too high, you get O(1) lookups on average (vs. O(log n) lookups for tree-based data structures or O(n) for lists). And these accesses are fast in practice: after calculating the hash, which usually takes linear time in the size of the key with a low hidden constant, there's often just a bit shift, a bit mask and one or two indirect memory accesses into a contiguous block of memory that (a) makes good use of cache and (b) pipelines well on modern CPUs because few pointer indirections are needed.
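A minimal sketch of such a non-resizable table, assuming a power-of-two number of buckets so the index is just a bit mask over a fixed-width hash value. The class `FixedTable` is hypothetical, and SHA-256 is used only to keep the example self-contained; a real hash table would normally use a much cheaper non-cryptographic hash.

```python
import hashlib

class FixedTable:
    """Hypothetical non-resizable hash table with 2**bits buckets."""

    def __init__(self, bits: int = 8):
        self.mask = (1 << bits) - 1                  # e.g. 0xFF for 256 buckets
        self.slots = [[] for _ in range(1 << bits)]  # fixed-size bucket array

    def _index(self, key: bytes) -> int:
        # Take a fixed 64-bit slice of the hash, then mask it down to a slot.
        h = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
        return h & self.mask

    def put(self, key: bytes, value) -> None:
        bucket = self.slots[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # collisions chain inside the bucket

    def get(self, key: bytes, default=None):
        for k, v in self.slots[self._index(key)]:
            if k == key:
                return v
        return default
```

Because the hash value always has the same width, the masking step is the same single operation for every key, whatever the key's length.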

Bujumbura answered 13/4, 2015 at 13:3 Comment(0)
