Calculating MD5 hash of a partial stream

I have a large dataset (~1GB) stored in a custom file format, the last 16 bytes of which is an MD5 hash of all previous bytes of the file.

I want to verify the MD5 of this file against the embedded MD5 using HashAlgorithm.ComputeHash(Stream). However, this will calculate the hash of the entire file INCLUDING the hash in the last 16 bytes, which obviously won't work.

How do I compute the MD5 hash of PART of a stream? I know I can read the stream into an array and pass this to HashAlgorithm.ComputeHash(Bytes), however the overhead of duplicating this 1GB of data in memory is prohibitive.

Cordeiro answered 18/5, 2011 at 7:12 Comment(2)
Please note MD5 is no longer a secure hash algorithm. – Sastruga
I'm using this only to verify that the file has not been corrupted on disk or in memory, so this isn't an issue. – Cordeiro

Taken from here where you can also get other ways of doing so.

Make a partial file stream class that stops reading at the position you want, then compute the hash of that stream.

class PartialFileStream : FileStream
{
    // Opens the file, seeks to startPosition, and treats endPosition as end-of-stream.
    public PartialFileStream(string path, FileMode mode, long startPosition, long endPosition)
        : base(path, mode)
    {
        base.Seek(startPosition, SeekOrigin.Begin);
        ReadTillPosition = endPosition;
    }

    // Absolute file position (exclusive) at which reads stop.
    public long ReadTillPosition { get; set; }

    public override int Read(byte[] array, int offset, int count)
    {
        // Report end-of-stream once the limit has been reached.
        if (base.Position >= this.ReadTillPosition)
            return 0;

        // Clamp the request so it never reads past the limit.
        if (base.Position + count > this.ReadTillPosition)
            count = (int)(this.ReadTillPosition - base.Position);

        return base.Read(array, offset, count);
    }
}
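A minimal sketch of how this class might be applied to the layout in the question, assuming the PartialFileStream class above is in scope. The names EmbeddedMd5Check and Verify are hypothetical and not part of the original answer. Note that the FileStream(path, mode) base constructor requests read/write access, so this sketch assumes the file is writable by the caller.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

static class EmbeddedMd5Check
{
    const int HashSize = 16; // an MD5 digest is 16 bytes

    // Hypothetical helper: returns true when the trailing 16 bytes of the
    // file equal the MD5 of everything that precedes them.
    public static bool Verify(string path)
    {
        long payloadLength = new FileInfo(path).Length - HashSize;
        if (payloadLength < 0)
            throw new InvalidDataException("File is too short to contain an MD5 trailer.");

        // Hash only the payload; the partial stream reports end-of-stream
        // at payloadLength, so the trailer is never fed to the hash.
        byte[] computed;
        using (var md5 = MD5.Create())
        using (var payload = new PartialFileStream(path, FileMode.Open, 0, payloadLength))
        {
            computed = md5.ComputeHash(payload);
        }

        // Read the stored hash separately from the end of the file.
        var stored = new byte[HashSize];
        using (var file = File.OpenRead(path))
        {
            file.Seek(-HashSize, SeekOrigin.End);
            int read = 0;
            while (read < HashSize)
            {
                int n = file.Read(stored, read, HashSize - read);
                if (n == 0) throw new EndOfStreamException();
                read += n;
            }
        }

        return computed.SequenceEqual(stored);
    }
}
```

Because ComputeHash(Stream) pulls the data through a small internal buffer, the 1GB payload is never held in memory at once.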
Katharinekatharsis answered 18/5, 2011 at 7:20 Comment(2)
The user of PartialFileStream can still Seek outside of the segment. Also, the Length and Position properties do not match the segment but those of the file. – Pigeon
I've posted my solution which should fix issues mentioned in my previous comment. – Pigeon

You can use the FileStream.Seek method to seek to a particular position in the stream and read from there.
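Seeking only fixes where reading starts, so the reads also have to be bounded to keep the trailing bytes out of the hash. A rough sketch of one way to do that, using HashAlgorithm.TransformBlock and TransformFinalBlock to feed the hash in chunks. The RangeHasher class, the ComputeMd5 name, and the 80 KB buffer size are illustrative assumptions, not part of this answer.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class RangeHasher
{
    // Hypothetical helper: hashes `length` bytes of `stream`, starting at
    // absolute position `start`, without loading the range into memory.
    public static byte[] ComputeMd5(Stream stream, long start, long length)
    {
        stream.Seek(start, SeekOrigin.Begin);

        var buffer = new byte[81920]; // arbitrary chunk size
        long remaining = length;

        using (var md5 = MD5.Create())
        {
            while (remaining > 0)
            {
                // Never ask for more than what is left of the range.
                int want = (int)Math.Min(buffer.Length, remaining);
                int read = stream.Read(buffer, 0, want);
                if (read == 0)
                    throw new EndOfStreamException("Stream ended before the requested range was read.");

                // Feed this chunk into the running hash; no output buffer is needed.
                md5.TransformBlock(buffer, 0, read, null, 0);
                remaining -= read;
            }

            // Finalize with an empty block, then read the digest.
            md5.TransformFinalBlock(buffer, 0, 0);
            return md5.Hash;
        }
    }
}
```

For the file in the question, the call would look something like RangeHasher.ComputeMd5(file, 0, file.Length - 16), with the result compared against the last 16 bytes read separately.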

Covenanter answered 18/5, 2011 at 7:22 Comment(1)
His problem is different: he could seek, but he does not want to hash the end of the stream. Of course, if the hash could be at the beginning of the file he could just use seek and hash the rest. – Katharinekatharsis

I've found myself needing this for a second time within 6 months, so I'm posting my solution for a partial input stream.

class PartialStream: Stream {
    public Stream Source { get; }
    public long Offset { get; }
    public override long Length { get; }

    private long End => Offset + Length;

    public override bool CanRead => true;

    public override bool CanSeek => false;

    public override bool CanWrite => false;

    public override long Position {
        get => Source.Position - Offset;
        set => throw new NotSupportedException();
    }

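    // Wraps `length` bytes of `source`, starting at its current position.
    public PartialStream(Stream source, long length) {
        Source = source; // must be assigned, otherwise Read and Position throw NullReferenceException
        Offset = source.Position;
        Length = length;
    }

    // Wraps `length` bytes of `source`, starting at absolute position `offset`.
    public PartialStream(Stream source, long offset, long length, bool seekToOffset = true) {
        if (seekToOffset) source.Seek(offset, SeekOrigin.Begin);
        Source = source;
        Offset = offset;
        Length = length;
    }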

    public override int Read(byte[] array, int offset, int count) {
        if (Source.Position >= End) return 0;

        if (Source.Position + count > End)
            count = (int)(End - Source.Position);

        return Source.Read(array, offset, count);
    }

    public override void Flush() => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
Pigeon answered 12/6, 2019 at 23:30 Comment(0)
