MD5 Hash and Base64 encoding

If I have a 32 character string (an MD5 hash) and I encode it using Base64, what's the maximum length of the encoded string?

Arundinaceous answered 25/11, 2010 at 14:35 Comment(6)
If you have a 32 character string that is an MD5 hash then it is already hex encoded and there is no need to base64 encode it. - Sb
An MD5 hash is not hexadecimal! It's 16 bytes - hexadecimal is simply a conventional representation. - Hermaphroditism
@GregS, you are correct that an MD5 hash is typically represented in hexadecimal form, whose alphabet is a subset of the Base64 alphabet. But there is a purpose to converting to Base64: it takes fewer characters because it has a larger character set. If you store the hash as plain text, Base64 (22 characters) saves space over hexadecimal (32 characters). - Elman
@GregS Actually, you do need to encode it if that is the format expected, e.g. in the HTTP Content-MD5 header. - Slate
@Slate Actually, in that case you must first hex decode it and then base64 encode it. The base64 encoding of the 32 character hex string would be incorrect. - Sb
@GregS yes, that's true. - Slate

An MD5 value is always 22 (useful) characters long in Base64 notation. Many Base64 algorithms will also append 2 characters of padding when encoding an MD5 hash, bringing the total to 24 characters. The padding adds no useful information and can be discarded. Only the first 22 characters matter.

Here's why:

An MD5 hash is a 128-bit value. Every character in a Base64 string carries 6 bits of information, because there are 64 possible values for the character and 2^6 = 64. With 6 bits per character, 21 characters hold 126 bits and 22 characters hold 132 bits. Since 128 bits do not fit within 21 characters but do fit within 22 (with 4 bits to spare), a 128-bit value is always represented as 22 characters in Base64.
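
The arithmetic above can be checked with a short Python sketch. It assumes the standard-library `base64` module and reuses the hex digest from the `md5sum` example elsewhere in this thread; the variable names are only illustrative.

```python
import base64
import math

# Hex digest copied from the md5sum example elsewhere in this thread.
digest = bytes.fromhex("c6b5f48774aa0a87a82a276ff86be507")

bits = len(digest) * 8                 # 16 bytes * 8 = 128 bits
chars_needed = math.ceil(bits / 6)     # each Base64 character carries 6 bits
spare_bits = chars_needed * 6 - bits   # unused capacity in the last character

encoded = base64.b64encode(digest).decode("ascii")
unpadded = encoded.rstrip("=")         # drop the trailing "=" padding

print(bits, chars_needed, spare_bits)  # 128 22 4
print(len(unpadded), len(encoded))     # 22 24
```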

A note on the padding:

I mentioned above that many Base64 encoding algorithms add a couple of characters of padding when encoding an MD5 value. This is because Base64 represents 3 bytes of information as 4 characters. Since MD5 has 16 bytes of information, many Base64 encoding algorithms append "==" to designate that the input of 16 bytes was 2 bytes short of the next multiple of 3, which would have been 18 bytes. These 2 equal signs add no information whatsoever to the string, and can be discarded when storing.
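
As a rough illustration of discarding and later restoring the padding, here is a small Python sketch. The helper names `strip_padding` and `restore_padding` are hypothetical (not part of any library), and it assumes the standard Base64 alphabet with "=" padding.

```python
import base64

def strip_padding(b64: str) -> str:
    """Hypothetical helper: drop trailing '=' before storing."""
    return b64.rstrip("=")

def restore_padding(b64: str) -> str:
    """Hypothetical helper: pad back to a multiple of 4 characters."""
    return b64 + "=" * (-len(b64) % 4)

padded = "xrX0h3SqCoeoKidv+GvlBw=="    # Base64 of an MD5 value, from this thread
stored = strip_padding(padded)         # 22 characters
restored = restore_padding(stored)     # 24 characters again

# Decoding the restored string gives back the original 16 bytes.
raw = base64.b64decode(restored)
print(len(stored), len(restored), raw.hex())
```

Restoring the padding before decoding matters because some decoders, Python's `base64.b64decode` among them, reject input whose length is not a multiple of 4.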

Elman answered 8/11, 2012 at 19:28 Comment(0)

As per http://en.wikipedia.org/wiki/Base64

"Note that given an input of n bytes, the output will be (n + 2 - ((n + 2) % 3)) / 3 * 4 bytes long, which converges to n * 4 / 3 or 1.33333n for large n."

So, for the 32-character hex string, it will be (32 + 2 - ((32 + 2) % 3)) / 3 * 4 = (34 - (34 % 3)) / 3 * 4 = (34 - 1) / 3 * 4 = 33 / 3 * 4 = 44 characters.

You could always extract it in raw binary form (128 bits) and encode it directly into Base64, which means converting 16 bytes instead of 32; that becomes 24 characters when Base64 encoded.
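
To sanity-check the quoted formula, the sketch below implements it with integer division and compares it against Python's standard `base64` encoder. The function name `base64_len` is just an illustrative choice.

```python
import base64

def base64_len(n: int) -> int:
    """Padded Base64 length for n input bytes, per the quoted formula."""
    return (n + 2 - ((n + 2) % 3)) // 3 * 4

hex_string_len = base64_len(32)   # encoding the 32-character hex text
raw_digest_len = base64_len(16)   # encoding the 16 raw MD5 bytes

print(hex_string_len, raw_digest_len)  # 44 24

# Cross-check against the real encoder for a range of input sizes.
matches = all(
    base64_len(n) == len(base64.b64encode(bytes(n)))
    for n in range(0, 100)
)
print(matches)  # True
```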

Evadnee answered 25/11, 2010 at 14:45 Comment(6)
An md5 hash is 128 bits, which would encode to 24 base64 characters. - Sb
@GregS, sorry, yes, I was thinking of SHA1 which is 160 bits. - Evadnee
There is really no point in encoding a hex-encoded hash output in base64 - the valid characters in a hex sequence are a subset of those in a base64 sequence. - Mcglynn
@caf, but there would be a point in decoding the hex and then re-encoding it in Base64... the size of the encoded string would be smaller. - Byte
@JoelFan, Example here: #27384261 - Ninebark
The easier way to explain the math is that every 3 bytes of binary (24 bits) are encoded in 4 characters (6 bits each). So ROUND_UP(n / 3) * 4, which in most languages can be written as (n + 2) / 3 * 4 with integer division, or Math.ceil(n / 3) * 4. In any case, the answer is still 24 characters. - Extent

A 128-bit MD5 hash is represented as 22 characters in Base64, plus 2 padding characters ('=') in this case.

How?

$ md5sum ./README.md 
c6b5f48774aa0a87a82a276ff86be507  ./README.md
$ md5sum ./README.md | base64
YzZiNWY0ODc3NGFhMGE4N2E4MmEyNzZmZjg2YmU1MDcgIC4vUkVBRE1FLm1kCg==

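In this case the Base64-encoded string is not shorter than the hex MD5 string.

That is because what was encoded is the stored text form of the MD5 hash (the hex digits, plus the file name that md5sum prints), not the MD5 hash value itself.

Note how many bits are used to store one digit of the MD5 hash: each hex digit carries 4 bits but occupies a full 8-bit character as text.

The right way is to convert the hash value itself:

1 convert the hexadecimal to binary

2 convert the binary to a Base64-encoded string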

$ cat ./README.md |  openssl dgst -md5 
c6b5f48774aa0a87a82a276ff86be507
$ cat ./README.md |  openssl dgst -md5 -binary | openssl enc -base64
xrX0h3SqCoeoKidv+GvlBw==

or

$ md5sum ./LICENSE 
e3fc50a88d0a364313df4b21ef20c29e  ./LICENSE
$ cat ./LICENSE |  openssl dgst -md5 -binary | openssl enc -base64
4/xQqI0KNkMT30sh7yDCng==
$ (echo 0:; echo e3fc50a88d0a364313df4b21ef20c29e) | xxd -rp -l 16|base64
4/xQqI0KNkMT30sh7yDCng==
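
For comparison, the same two steps can be sketched in Python. This assumes only the standard-library `base64` module and reuses the LICENSE digest shown above; `wrong` and `right` are illustrative variable names.

```python
import base64

hex_digest = "e3fc50a88d0a364313df4b21ef20c29e"  # hex text from md5sum above

# Wrong: Base64-encoding the 32 ASCII characters of the hex text.
wrong = base64.b64encode(hex_digest.encode("ascii")).decode("ascii")

# Right: step 1 converts hex to the 16 raw bytes, step 2 Base64-encodes them.
raw = bytes.fromhex(hex_digest)
right = base64.b64encode(raw).decode("ascii")

print(len(wrong))  # 44
print(right)       # 4/xQqI0KNkMT30sh7yDCng==
```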
Hued answered 31/3, 2021 at 23:59 Comment(0)
