What are the differences between MD5 binary mode and text mode?
Here's my test:

...$ md5sum -b roy.html 
f9283ca2833ff7ebb6781ab8d23a21aa *roy.html
...$ md5sum -t roy.html 
f9283ca2833ff7ebb6781ab8d23a21aa  roy.html

Is there any difference between these two modes?

Slumlord answered 1/8, 2013 at 7:50 Comment(2)
unix.stackexchange.com/a/127961 - Gigue
Notice for all readers about the use of checksums in digital preservation: binary mode is the standard. - Colcothar
A
14

‘-b’ ‘--binary’

  • Treat each input file as binary, by reading it in binary mode and
    outputting a ‘*’ flag. This is the inverse of --text. On systems
    like GNU that do not distinguish between binary and text files, this
    option merely flags each input mode as binary: the MD5 checksum is
    unaffected. This option is the default on systems like MS-DOS that
    distinguish between binary and text files, except for reading
    standard input when standard input is a terminal.

‘-t’ ‘--text’

  • Treat each input file as text, by reading it in text mode and
    outputting a ‘ ’ flag. This is the inverse of --binary. This option
    is the default on systems like GNU that do not distinguish between
    binary and text files. On other systems, it is the default for
    reading standard input when standard input is a terminal. This mode
    is never defaulted to if --tag is used.
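
To illustrate why the two modes can matter on a system that does translate newlines, here is a minimal Python sketch. It does not call md5sum at all; it only simulates what a translating text-mode read would hand to the hash by replacing \r\n with \n by hand. The sample bytes and the variable names are made up for the illustration.

```python
import hashlib

# Sample file contents with DOS-style line endings (made up for this sketch).
raw = b"line one\r\nline two\r\n"

# What a translating text-mode read would deliver: \r\n collapsed to \n.
# This is a simulation of the translation, not a call to any md5sum tool.
as_text = raw.replace(b"\r\n", b"\n")

binary_digest = hashlib.md5(raw).hexdigest()
text_digest = hashlib.md5(as_text).hexdigest()

print("binary:", binary_digest)
print("text:  ", text_digest)
print("same hash?", binary_digest == text_digest)
```

The two digests differ because the hashed bytes differ. On a system that performs no translation, both reads see the same bytes, so the checksum is identical and only the ‘*’ or ‘ ’ flag changes.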
Affirm answered 1/8, 2013 at 7:53 Comment(2)
Could you give an example of a file where the MD5SUM would be different? - Latent
@Latent I don't know of any MD5 sum tools that would do this, but in some programming languages, various newlines can be automatically converted to the platform default upon being read unless a "binary read mode" is specified. For example, \r\n (Windows newline) could be converted to \n on Linux, or \n could be converted to \r\n on Windows. - Stadiometer
C
1

I am finding some interesting differences between binary mode and non-binary mode.

My use case is creating 256-bit AES keys for use with the AWS S3 object storage service. These keys support server-side encryption (SSE). I spent hours (almost days) trying to figure out why my code could not interact with S3, never suspecting that my keys were the problem. Generating the key itself was not the issue: I could produce the binary key and its base64-encoded version quite easily.

The actual problem surprised me. I am no stranger to md5; I have used it for decades without fail. But it turns out that the md5 sum I was generating from the binary key was wrong. My first indication was that it was a few characters longer than the one in a working example I was looking at. I could not produce an md5 sum as short as the example's, and I had no idea why there would be a difference.

I found that:

OSX (bsd) md5 has no concept of binary input mode.
OSX (bsd) md5sum has a flag for binary input mode, but it does not change the hash itself, only the metadata printed alongside it.

Alpine Linux md5 does have a concept of binary input mode.
Alpine Linux md5sum has no concept of binary input mode.

Debian Linux md5 does not seem to exist.
Debian Linux md5sum has a flag for binary input mode, but it does not change the hash itself, only the metadata printed alongside it.

For example, I get these outputs when running:

OSX:

openssl rand 32 > key
cat key | md5
936e87c3f08e54d036c7a38dc9dbd540
cat key | md5sum
936e87c3f08e54d036c7a38dc9dbd540  -
cat key | md5sum -b
936e87c3f08e54d036c7a38dc9dbd540 *-

Alpine Linux:

openssl rand 32 > key
cat key | md5
915b2c6c3368c19f96e9a79089389c15
cat key | md5 -b
kVssbDNowZ+W6aeQiTicFQ==
cat key | md5sum
915b2c6c3368c19f96e9a79089389c15  -

Debian Linux:

openssl rand 32 > key
cat key | md5sum
a44f9c1d1f7a35f2374ad2987296b54b  -
cat key | md5sum -b
a44f9c1d1f7a35f2374ad2987296b54b *-

As far as I can tell, what AWS S3 (at least) expects is the md5 of the binary key in the form that Alpine Linux prints in this case:

cat key | md5 -b
kVssbDNowZ+W6aeQiTicFQ==
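
One way to check how the two Alpine outputs above relate is to re-encode the hexadecimal digest as base64. The Python sketch below does that with the standard hashlib and base64 modules; the helper name md5_base64 and the randomly generated key are only stand-ins for this illustration, not part of any tool shown above.

```python
import base64
import hashlib
import os


def md5_base64(data: bytes) -> str:
    """Return the base64 text of the raw 16-byte MD5 digest of data."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


# Hex digest printed by `md5` on Alpine in the listing above.
hex_digest = "915b2c6c3368c19f96e9a79089389c15"

# Re-encode the same 16 bytes as base64 instead of hexadecimal.
reencoded = base64.b64encode(bytes.fromhex(hex_digest)).decode("ascii")
print(reencoded)  # kVssbDNowZ+W6aeQiTicFQ==

# Hashing fresh key material directly (a stand-in for the 32-byte key file).
key = os.urandom(32)
print(len(hashlib.md5(key).hexdigest()), len(md5_base64(key)))  # 32 24
```

The re-encoded value matches the `md5 -b` output character for character, so both lines carry the same 16-byte digest. The hexadecimal form is 32 characters long and the base64 form is 24.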

I will try to reach out to Sören Tempel of Alpine Linux to try to find out what is going on with these differences.

Comedown answered 12/4, 2020 at 23:52 Comment(4)
That -b flag for md5 isn't binary input mode; it's base64 output mode. It just changes the output format; it has no effect on the input or how the hash is generated. You're just opting to base64-encode the hash instead of base16 (hexadecimal). - Baun
Thanks, this is the most useful answer! Please see also @Stadiometer's comment about files with multiple lines (could DOS \r\n be converted to \n on Linux, or the reverse, in text mode?). - Colcothar
Linux does no conversions in C's text mode, and ignores the binary flag when passed to fopen. Only DOS/Windows does conversions, from its own \r\n to \n when reading and vice versa when writing. - Runway
