c# fast hash calculation
Asked Answered
S

4

6

I'm looking for a c# wrapper to a native MD5 or SHA1 library to improve hash calculation performance.

Previously I switched SharpZipLib to zlib and got more than 2x performance boost. (ok, you've to take care you've the right zlib.so or zlib.dll depending on the OS and hardware, but it pays off).

Will it be worth for MD5 or SHA1 or both .NET and Mono rely on a native implementation already?

(Edited) Also: in case I've to stick to the MD5CryptoServiceProvider, is there a way in which I can calculate a hash of a file while I'm reading it? I mean, send bytes in chunks but still calculate the whole hash?

Spinning answered 27/6, 2009 at 0:6 Comment(9)
Keep in mind that MD5 is a cracked hashing algorithm, and isn't considered secure any more. Collisions with SHA1 have also been found, and while not as severe as MD5, they were considered significant breaks. If you need a secure hashing algorithm, you should opt for SHA2 variants (256/512).Ethnography
@pablo, are you sure that hashing and not IO access is your bottleneck...Megavolt
It can also be IO, you're right, but after my experience with zlib, I was thinking whether switching MD5 implementation would speed things up.Spinning
@jrista, I'm not so concerned about security since what I'm doing is hashing file contents to quickly check if they changed or not. Hashing and reading in a single pass would might help too.Spinning
@pablo, from the description of your problem the bottleneck is IO (or somehow the process of moving the data to the hashing function) try commenting out the hashing function for a tad to confirm its the case. I doubt any optimisation of the hashing function will help, its the wrong place to be optimising.Megavolt
@jrista, Collisions occur in any hash algorithm, "cracked" or not. But in the end you're right: SHA256 is the way to go.Kurdistan
I know that collisions can occur in any hash algorithm. However, in SHA1, the chance of a collision is high enough that the algorithm is considered insecure (hence the reason I said it was considered a 'significant break'). Both MD5 and SHA1 are broadly considered cryptographically broken, and should not be used due to their now low collision resistance. SHA1 will also no longer be supported in .NET 4.0 and future versions, so if you need a hashing algorithm with some longevity, SHA2 variants should be used.Ethnography
@Ethnography there are no chance collisions, not from md5 or sha1, both algorithms are vulnerable to collision attacks, which mean people can theoretically engineer collisionsMegavolt
@Sam Saffron "there are no chance collisions" yes there are, 1 in 2^128. But it's safer than say, asteroids destroying the earth.Robinette
M
17

MD5 and SHA1 rely on native implementaions, nonetheless its possible a C++ solution + introp could be slightly faster, cause you could possibly reduce the number of method calls a bit and optimize the native implementation.

Keep in mind that the Native (SHA1CryptoServiceProvider) can be 3X faster than the managed one(SHA1Managed).

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
using System.Security.Cryptography;

namespace ConsoleApplication22 {



    class Program {

        static void Profile(string description, int iterations, Action func) {

            // clean up
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            // warm up 
            func();

            var watch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++) {
                func();
            }
            watch.Stop();
            Console.Write(description);
            Console.WriteLine(" Time Elapsed {0} ms", watch.ElapsedMilliseconds);
        }

        static void Main() {
            SHA1Managed managed = new SHA1Managed();
            SHA1CryptoServiceProvider unmanaged = new SHA1CryptoServiceProvider();

            Random rnd = new Random();

            var buffer = new byte[100000];
            rnd.NextBytes(buffer);

            Profile("managed", 1000, () => {
                managed.ComputeHash(buffer, 0, buffer.Length);
            });

            Profile("unmanaged", 1000, () =>
            {
                unmanaged.ComputeHash(buffer, 0, buffer.Length);
            });

            Console.ReadKey();
        }
    }
}
managed Time Elapsed 891 ms
unmanaged Time Elapsed 336 ms

Also Keep in mind unless my calculation is wrong, the unmanaged implementation is hashing 100MB of data in about 300 milliseconds, this would very rarely be a bottleneck.

Megavolt answered 27/6, 2009 at 0:15 Comment(5)
An interop solution would require marshalling, however, which could mitigate any other gains that might otherwise be made. Just something to keep in mind.Ethnography
my understanding is the SHA1CryptoServiceProvider requires marshalling anyway, its using extern callsMegavolt
Fortunately I checked I'm using the unmanaged one: MD5CryptoServiceProvider. Loved your profiling example!Spinning
Sam, you're right, the problem must be somewhere else. One question, though: Is there a way to hash in chunks? I need to read the file and also to hash it, can I do it in only one pass?Spinning
Yerp, see TransformBlock and TransformFinalBlock functionsMegavolt
W
3

The SHA1CryptoServiceProvider class uses the underlying Windows API implementation. However, SHA1Managed is pretty fast.

EDIT: Yes, it's possible to compute the hash step by step. The TransformBlock and TransformFinalBlock methods do this.

Wolframite answered 27/6, 2009 at 0:11 Comment(1)
pretty fast can mean many things ... turns out pretty fast means 3 times slower .... still 30MB per 300ms is plenty fastMegavolt
B
0

I would just use the BCL's SHA1 and MD5CryptoServiceProvider classes. The ones that ship with the framework are quite fast.

Batfish answered 27/6, 2009 at 0:11 Comment(1)
Thanks. That's what I'm using right now, I'm just wondering if there's a way to make it faster. I'm hashing entire files.Spinning
A
0

Depending on your application of hashing, MD5 might not be applicable. MD5 is only useful in error correction, it's no longer viable as a check against malicious file alteration.

http://en.wikipedia.org/wiki/Md5#Vulnerability

The short story is, MD5 collisions are easy to generate by changing 16 bytes in a file.

Atherton answered 27/6, 2009 at 2:53 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.