Compute a hash from a stream of unknown length in C#
P

6

37

What is the best solution in C# for computing an "on the fly" MD5-like hash of a stream of unknown length? Specifically, I want to compute a hash from data received over the network. I only know I am done receiving data when the sender terminates the connection, so I don't know the length in advance.

[EDIT] - Right now I am using MD5 and doing a second pass over the data after it has been written to disk. I'd rather hash it in place as it comes in from the network.

Pigeon answered 1/9, 2010 at 19:5 Comment(0)
K
74

MD5, like other hash functions, does not require two passes.

To start:

HashAlgorithm hasher = ..;
hasher.Initialize();

As each block of data arrives:

byte[] buffer = ..;
int bytesReceived = ..;
hasher.TransformBlock(buffer, 0, bytesReceived, null, 0);

To finish and retrieve the hash:

hasher.TransformFinalBlock(new byte[0], 0, 0);
byte[] hash = hasher.Hash;

This pattern works for any type derived from HashAlgorithm, including MD5CryptoServiceProvider and SHA1Managed.

HashAlgorithm also defines a method ComputeHash which takes a Stream object; however, this method will block the thread until the stream is consumed. Using the TransformBlock approach allows an "asynchronous hash" that is computed as data arrives without using up a thread.
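
The steps above can be put together as a minimal sketch. The class and method names (StreamingHashSketch, CopyAndHash) and the 8192-byte buffer are assumptions made for illustration, and the synchronous read loop stands in for whatever receive mechanism you actually use. It hashes each chunk as it is read and writes it to a destination in the same pass:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class StreamingHashSketch
{
    // Copies source to destination while feeding every chunk to the hasher,
    // so the data is only traversed once. Returns the digest as lowercase hex.
    public static string CopyAndHash(Stream source, Stream destination, HashAlgorithm hasher)
    {
        byte[] buffer = new byte[8192]; // illustrative buffer size
        int bytesRead;

        // Read returns 0 once the sender has closed the connection / end of stream.
        while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Passing null as the output buffer is allowed: we only want the hash state updated.
            hasher.TransformBlock(buffer, 0, bytesRead, null, 0);
            destination.Write(buffer, 0, bytesRead);
        }

        // Finalize with an empty block; only after this call is Hash valid.
        hasher.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
        return BitConverter.ToString(hasher.Hash).Replace("-", "").ToLowerInvariant();
    }
}
```

For example, calling CopyAndHash(networkStream, fileStream, MD5.Create()) would save the data and produce its MD5 in one pass, where networkStream and fileStream are whatever streams you already have open.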

Kermis answered 1/9, 2010 at 19:9 Comment(1)
This doesn't suit the case where you receive a stream from the network and simply copy it to the file system with the CopyTo method. CryptoStream solves that problem. - Rehnberg
H
19

Further to @peter-mourfield's answer, here is code that uses ComputeHash():

private static string CalculateMd5(string filePathName) {
   using (var stream = File.OpenRead(filePathName))
   using (var md5 = MD5.Create()) {
      var hash = md5.ComputeHash(stream);
      var base64String = Convert.ToBase64String(hash);
      return base64String;
   }
}

Since both the stream and MD5 implement IDisposable, wrap them in using(...){...}. Note in particular that md5.ComputeHash(stream) advances stream.Position, so you may need to reset it before reusing the stream.

The method in the code example returns the same string that is used for the MD5 checksum in Azure Blob Storage.

Hypoderm answered 27/4, 2017 at 2:27 Comment(0)
L
16

This seems like a perfect use case for CryptoStream (docs).

I've used CryptoStream for processing unknown-length streams of database results that need to be gzipped and then transferred across the network along with a hash of the compressed file. Inserting a CryptoStream between the compressor and the file writer allows you to compute the hash on the fly so that it's ready as soon as the file is written.

The basic approach looks like this:

var hasher = MD5.Create();
using (FileStream outFile = File.Create(filePath))
using (CryptoStream crypto = new CryptoStream(outFile, hasher, CryptoStreamMode.Write))
using (GZipStream compress = new GZipStream(crypto, CompressionMode.Compress))
using (StreamWriter writer = new StreamWriter(compress))
{
    foreach (string line in GetLines())
        writer.WriteLine(line);
}
// at this point the streams are closed so the hash is ready
string hash = BitConverter.ToString(hasher.Hash).Replace("-", "").ToLowerInvariant();
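
The same idea applies when the data simply has to be copied from one stream to another, as in the network-to-file case from the question. The following is a minimal sketch under that assumption; the names CryptoStreamHashSketch and SaveAndHash are made up for illustration. A HashAlgorithm used as the transform passes the bytes through unchanged, so the destination receives exactly what was read:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class CryptoStreamHashSketch
{
    // Copies source into destination through a CryptoStream that hashes
    // the bytes on the way. Returns the MD5 digest as lowercase hex.
    public static string SaveAndHash(Stream source, Stream destination)
    {
        using (var hasher = MD5.Create())
        {
            // Disposing the CryptoStream flushes the final block (which finalizes
            // the hash) and also closes the destination stream.
            using (var crypto = new CryptoStream(destination, hasher, CryptoStreamMode.Write))
            {
                source.CopyTo(crypto);
            }

            // The hash is only valid once the CryptoStream has been closed.
            return BitConverter.ToString(hasher.Hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

With a NetworkStream as the source and a FileStream as the destination, the file and its hash are both complete when the method returns.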
Lannielanning answered 16/2, 2019 at 6:22 Comment(6)
Thanks for sharing, helpful! - Pigeon
Thanks a lot, that does suit "on the fly" hashing... without hand-crafted chunking of the stream into blocks. - Rehnberg
Thanks for the comment on closing the streams before reading the hash. - Kazak
Looks like stream methods were added to MD5 in newer .NET builds. - Pigeon
@Pigeon they don't provide the "on-the-fly" functionality though. - Buhl
A stream by definition is on the fly. - Pigeon
M
12

The System.Security.Cryptography.MD5 class contains a ComputeHash method that takes either a byte[] or Stream. Check out the documentation.

Misguidance answered 1/9, 2010 at 19:9 Comment(0)
D
4

Necromancing.

Two possibilities in C# .NET Core:

private static System.Security.Cryptography.HashAlgorithm GetHashAlgorithm(System.Security.Cryptography.HashAlgorithmName hashAlgorithmName)
{
    if (hashAlgorithmName == System.Security.Cryptography.HashAlgorithmName.MD5)
        return (System.Security.Cryptography.HashAlgorithm) System.Security.Cryptography.MD5.Create();
    if (hashAlgorithmName == System.Security.Cryptography.HashAlgorithmName.SHA1)
        return (System.Security.Cryptography.HashAlgorithm) System.Security.Cryptography.SHA1.Create();
    if (hashAlgorithmName == System.Security.Cryptography.HashAlgorithmName.SHA256)
        return (System.Security.Cryptography.HashAlgorithm) System.Security.Cryptography.SHA256.Create();
    if (hashAlgorithmName == System.Security.Cryptography.HashAlgorithmName.SHA384)
        return (System.Security.Cryptography.HashAlgorithm) System.Security.Cryptography.SHA384.Create();
    if (hashAlgorithmName == System.Security.Cryptography.HashAlgorithmName.SHA512)
        return (System.Security.Cryptography.HashAlgorithm) System.Security.Cryptography.SHA512.Create();

    throw new System.Security.Cryptography.CryptographicException($"Unknown hash algorithm \"{hashAlgorithmName.Name}\".");
}


protected override byte[] HashData(System.IO.Stream data,
    System.Security.Cryptography.HashAlgorithmName hashAlgorithm)
{
    using (System.Security.Cryptography.HashAlgorithm hashAlgorithm1 = GetHashAlgorithm(hashAlgorithm))
    {
        return hashAlgorithm1.ComputeHash(data);
    }
}

or with BouncyCastle:

private static Org.BouncyCastle.Crypto.IDigest GetBouncyAlgorithm(
    System.Security.Cryptography.HashAlgorithmName hashAlgorithmName)
{
    if (hashAlgorithmName == System.Security.Cryptography.HashAlgorithmName.MD5)
        return new Org.BouncyCastle.Crypto.Digests.MD5Digest();
    if (hashAlgorithmName == System.Security.Cryptography.HashAlgorithmName.SHA1)
        return new Org.BouncyCastle.Crypto.Digests.Sha1Digest();
    if (hashAlgorithmName == System.Security.Cryptography.HashAlgorithmName.SHA256)
        return new Org.BouncyCastle.Crypto.Digests.Sha256Digest();
    if (hashAlgorithmName == System.Security.Cryptography.HashAlgorithmName.SHA384)
        return new Org.BouncyCastle.Crypto.Digests.Sha384Digest();
    if (hashAlgorithmName == System.Security.Cryptography.HashAlgorithmName.SHA512)
        return new Org.BouncyCastle.Crypto.Digests.Sha512Digest();

    throw new System.Security.Cryptography.CryptographicException(
        $"Unknown hash algorithm \"{hashAlgorithmName.Name}\"."
    );
} // End Function GetBouncyAlgorithm  



protected override byte[] HashData(System.IO.Stream data,
    System.Security.Cryptography.HashAlgorithmName hashAlgorithm)
{
    Org.BouncyCastle.Crypto.IDigest digest = GetBouncyAlgorithm(hashAlgorithm);

    byte[] buffer = new byte[4096];
    int cbSize;
    while ((cbSize = data.Read(buffer, 0, buffer.Length)) > 0)
        digest.BlockUpdate(buffer, 0, cbSize);

    byte[] hash = new byte[digest.GetDigestSize()];
    digest.DoFinal(hash, 0);
    return hash;
}
Demaggio answered 25/1, 2018 at 20:7 Comment(1)
Don't worry, there is no necromancy on SO. New, updated answers are always welcome. - Asch
C
1

Another option could be to use the System.Security.Cryptography.IncrementalHash class instead.

byte[] DataBrick = new byte[4096]; // receive buffer; the size is arbitrary
var IncMD5 = IncrementalHash.CreateHash(HashAlgorithmName.MD5);

Then you can accumulate data in the hasher:

IncMD5.AppendData(DataBrick, 0, DataBrick.Length);

check the hash value for the data accumulated so far and keep appending (GetCurrentHash requires .NET 5 or later):

byte[] hashSoFar = IncMD5.GetCurrentHash();
int bytesReceived = netStream.Read(DataBrick, 0, DataBrick.Length);
IncMD5.AppendData(DataBrick, 0, bytesReceived);

or stop and reset to start accumulating a new hash value:

byte[] hash = IncMD5.GetHashAndReset();

Note: it implements IDisposable.

IncMD5.Dispose(); // when done, or using(IncMD5){..} if that makes more sense in your scope
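
Put together, the calls above might look like the following minimal sketch. The names IncrementalHashSketch and HashUntilEnd and the 4096-byte buffer are assumptions for illustration; any readable Stream, such as a NetworkStream, can be passed in:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class IncrementalHashSketch
{
    // Reads source until the end of the stream and returns its MD5 digest.
    public static byte[] HashUntilEnd(Stream source)
    {
        using (IncrementalHash md5 = IncrementalHash.CreateHash(HashAlgorithmName.MD5))
        {
            byte[] buffer = new byte[4096]; // illustrative buffer size
            int bytesRead;

            // Read returns 0 when the sender has closed the connection.
            while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Only append the bytes actually received, not the whole buffer.
                md5.AppendData(buffer, 0, bytesRead);
            }

            return md5.GetHashAndReset();
        }
    }
}
```

Appending only bytesRead bytes matters here: a read from the network often returns fewer bytes than the buffer holds, and hashing the full buffer would mix stale data into the result.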
Chivaree answered 23/8, 2022 at 17:31 Comment(0)