Why do seemingly empty files and strings produce md5sums?
D

3

65

Consider the following:

% md5sum /dev/null
d41d8cd98f00b204e9800998ecf8427e  /dev/null
% touch empty; md5sum empty
d41d8cd98f00b204e9800998ecf8427e  empty
% echo '' | md5sum
68b329da9893e34099c7d8ad5cb9c940  -
% perl -e 'print chr(0)' | md5sum
93b885adfe0da089cdf634904fd59f71  -
% md5sum ''
md5sum: : No such file or directory

First of all, I'm surprised by the output of all these commands. If anything, I would expect the sum to be the same for all of them.

Dawndawna answered 6/6, 2012 at 7:31 Comment(1)
You can use od -tax1 to see that your examples 3 and 4 are not in fact empty files. Example: echo '' | od -tax1 (Unmeet)
A
120

The md5sum of "nothing" (a zero-length stream of characters) is d41d8cd98f00b204e9800998ecf8427e, which you're seeing in your first two examples.

The third and fourth examples are processing a single character. In the "echo" case, it's a newline, i.e.

$ echo -ne '\n' | md5sum
68b329da9893e34099c7d8ad5cb9c940  -

In the perl example, it's a single byte with value 0x00, i.e.

$ echo -ne '\x00' | md5sum
93b885adfe0da089cdf634904fd59f71  -

You can reproduce the empty checksum using "echo" as follows:

$ echo -n '' | md5sum
d41d8cd98f00b204e9800998ecf8427e  -

...and using Perl as follows:

$ perl -e 'print ""' | md5sum
d41d8cd98f00b204e9800998ecf8427e  -

In every case, checksumming the same data gives the same output, while different data should produce a wildly different checksum, even if only a single character differs (that's the whole point).
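To see at a glance which inputs are really empty, the byte count can be printed next to the digest. This is only a minimal sketch: `byte_report` is a hypothetical helper name, and it assumes `mktemp`, `wc`, and GNU `md5sum` are available. The digests in the comments are the ones quoted above.

```shell
# Hypothetical helper: report the size and MD5 of whatever arrives on stdin.
# A temp file is used because stdin can only be read once.
byte_report() {
    tmp=$(mktemp)
    cat > "$tmp"
    size=$(wc -c < "$tmp" | tr -d ' ')
    sum=$(md5sum < "$tmp" | cut -d' ' -f1)
    rm -f "$tmp"
    printf 'bytes=%s md5=%s\n' "$size" "$sum"
}

printf ''     | byte_report   # bytes=0 md5=d41d8cd98f00b204e9800998ecf8427e
echo ''       | byte_report   # bytes=1 md5=68b329da9893e34099c7d8ad5cb9c940
printf '\000' | byte_report   # bytes=1 md5=93b885adfe0da089cdf634904fd59f71
```

Using `printf` rather than `echo -n` sidesteps differences between shells in how `echo` treats its options.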

Anikaanil answered 6/6, 2012 at 7:39 Comment(2)
...or perl -e '' (Mufi)
Or md5sum < /dev/null (Disarticulate)
D
26

Why do seemingly empty files and strings produce md5sums?

Because the "sum" in md5sum is somewhat misleading. It is not like, e.g., a CRC32 checksum, which is zero for an empty file.

MD5 is a message digest algorithm. You can imagine it as a box that produces a fixed-length, random-looking value (a hash) depending on its internal state. You change the internal state by feeding in data.

And the box's internal state is predefined such that it yields a random-looking hash value even before any data is fed in. For MD5, that value happens to be d41d8cd98f00b204e9800998ecf8427e.
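The CRC32 comparison can be checked from the shell without a dedicated CRC tool, since a gzip stream ends with the CRC32 of the uncompressed data followed by its length. The sketch below is one way to do it; `gzip_crc_bytes` is a hypothetical helper name, it assumes `gzip`, `tail`, `head -c`, and `od` are available, and it prints the four CRC bytes in the little-endian order gzip stores them.

```shell
# Hypothetical helper: print the CRC32 bytes that gzip stores for stdin.
# The gzip trailer is 8 bytes: CRC32 (4 bytes), then input length (4 bytes).
gzip_crc_bytes() {
    gzip -c | tail -c 8 | head -c 4 | od -An -tx1 | tr -d ' \n'
    echo
}

# CRC32 of empty input is all zero bits.
printf '' | gzip_crc_bytes                 # 00000000

# MD5 of the same empty input is a full, non-zero 128-bit digest.
printf '' | md5sum | cut -d' ' -f1         # d41d8cd98f00b204e9800998ecf8427e
```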

Demosthenes answered 15/12, 2014 at 21:51 Comment(1)
To be a little more exact: MD5 internally appends a padding block to the end of the message. Thus, the hash value is the result of running the hash function on this padding block, not precisely the initial state. (Yardage)
E
3

No need for surprise. The first two produce truly empty inputs to md5sum. The echo produces a newline (echo -n '' should produce empty output; I don't have a Linux machine here to check). The perl command produces a single zero byte (not to be confused with C, where a zero byte marks the end of a string). The last command is looking for a file with the empty string as its file name.
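The difference between a file name argument and the data being hashed can be sketched as follows. `md5_of_string` is a hypothetical helper name, and GNU `md5sum` is assumed.

```shell
# An argument to md5sum is a file name, so the empty string cannot be opened.
md5sum '' 2>/dev/null || echo "md5sum '' failed: no file has an empty name"

# Hypothetical helper: hash the bytes of a string by sending them on stdin.
md5_of_string() {
    printf '%s' "$1" | md5sum | cut -d' ' -f1
}

md5_of_string ''   # d41d8cd98f00b204e9800998ecf8427e
```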

Eventful answered 6/6, 2012 at 7:46 Comment(0)
