How can I read/stream a file without loading the entire file into memory?
Asked Answered
C

5

20

How can I read an arbitrary file and process it "piece by piece" (byte by byte, or in whatever chunk size gives the best read performance) without loading the entire file into memory? An example of such processing would be generating an MD5 hash of the file, although the answer could apply to any operation.

I'd be happy to write this myself, but if existing code is available that would be great too.

(c#)

Copulative answered 28/7, 2011 at 21:25 Comment(1)
Look, the real answer is "System.IO.FileStream does NOT load the file into memory."Signification
P
32

Here's an example of how to read a file in chunks of 1KB without loading the entire contents into memory:

const int chunkSize = 1024; // read the file by chunks of 1KB
using (var file = File.OpenRead("foo.dat"))
{
    int bytesRead;
    var buffer = new byte[chunkSize];
    while ((bytesRead = file.Read(buffer, 0, buffer.Length)) > 0)
    {
        // TODO: process only the first bytesRead bytes of the buffer,
        // not the whole buffer. The buffer is always 1KB long, but the
        // number of bytes actually read this iteration is stored in
        // bytesRead (the final read is usually shorter than the buffer).
    }
}
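
Since the question mentions MD5, the same loop can feed an incremental hash so the file is hashed chunk by chunk. This is a minimal sketch using `HashAlgorithm.TransformBlock`/`TransformFinalBlock`; `"foo.dat"` is a placeholder path as above:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

class ChunkedMd5
{
    static void Main()
    {
        const int chunkSize = 1024;
        using (var md5 = MD5.Create())
        using (var file = File.OpenRead("foo.dat")) // placeholder path
        {
            var buffer = new byte[chunkSize];
            int bytesRead;
            while ((bytesRead = file.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Feed only the bytes actually read into the hash.
                md5.TransformBlock(buffer, 0, bytesRead, null, 0);
            }
            // Finalize with an empty block; the digest is then in md5.Hash.
            md5.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
            Console.WriteLine(BitConverter.ToString(md5.Hash).Replace("-", ""));
        }
    }
}
```

Only one buffer of `chunkSize` bytes is ever held in memory, regardless of the file's size.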
Printer answered 28/7, 2011 at 21:29 Comment(5)
Please clarify why this code doesn't completely read the file into memory. Also please explain your TODO section.Roorback
This loads 1KB (or chunkSize bytes) into memory. Edit: He also meant that not the whole buffer is written! Only bytes from index 0 to index bytesRead - 1.Signification
@Darin - Ignore my question in the first comment. I see that as a result of the file.Read that only the chunk # of bytes are read.Copulative
@Darin I tried this code and it does not read the last chunk correctly. It keeps garbage values in the buffer if the last chunk is smaller than chunkSizeMummy
@Mummy You need to read bytesRead length, not entire length of buffer, e.g. fsFileStream.Write(buffer, 0, bytesRead)Sciatica
S
11

System.IO.FileStream does not load the file into memory.
This stream is seekable, and the MD5 hashing algorithm doesn't have to load the stream (file) into memory either.

Please replace file_path with the path to your file.

byte[] hash = null;

using (var stream = new FileStream(file_path, FileMode.Open))
{
    using (var md5 = new System.Security.Cryptography.MD5CryptoServiceProvider())
    {
        hash = md5.ComputeHash(stream);
    }
}

Here, your MD5 Hash will be stored in the hash variable.
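
`ComputeHash` reads the stream internally in small blocks, so this stays memory-friendly, and it returns the digest as a 16-byte array. A common follow-up is formatting that array as a hex string; a minimal sketch (using the `MD5.Create()` factory, and a placeholder path):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

class HashFile
{
    static void Main()
    {
        using (var md5 = MD5.Create()) // factory method; works across MD5 implementations
        using (var stream = File.OpenRead("file_path")) // placeholder, as above
        {
            byte[] hash = md5.ComputeHash(stream); // streams the file; doesn't load it whole
            // Format the 16-byte digest as the usual 32-char hex string.
            Console.WriteLine(BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant());
        }
    }
}
```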

Signification answered 28/7, 2011 at 21:28 Comment(1)
For future stackers, you only need one using statement -- if i remember correctly they can be bunched together.Linin
G
4
    int fullfilesize = 0;            // full size of the file, set when it is opened below
    int DefaultReadValue = 10485760; // read 10 MB at a time
    int toRead = 10485760;
    int position = 0;

    private void Button_Click(object sender, RoutedEventArgs e)
    {
        using (var fs = new FileStream(@"filepath", FileMode.Open, FileAccess.Read))
        {
            using (MemoryStream requestStream = new MemoryStream()) // unused, kept from the original
            {
                fullfilesize = (int)fs.Length; // the original never assigned this
                fs.Position = position;

                if (fs.Position >= fullfilesize)
                {
                    MessageBox.Show("all done");
                    return;
                }
                System.Diagnostics.Debug.WriteLine("file position " + fs.Position);

                if (fullfilesize - position < toRead)
                {
                    toRead = fullfilesize - position;
                    MessageBox.Show("last time");
                }
                System.Diagnostics.Debug.WriteLine("toRead " + toRead);

                int bytesRead;
                byte[] buffer = new byte[toRead];
                int offset = 0;
                position += toRead;
                while (toRead > 0 && (bytesRead = fs.Read(buffer, offset, toRead)) > 0)
                {
                    toRead -= bytesRead;
                    offset += bytesRead;
                }

                toRead = DefaultReadValue;
            }
        }
    }

Adapting Darin's answer, this method reads one 10 MB chunk per click until the end of the file is reached.

Glia answered 16/1, 2014 at 12:52 Comment(1)
Although the MemoryStream in your example is not required, you are the only one who posted an example where you set the FileStream Position. This has solved my issue where I needed to split and transfer large files in 10 meg chunks. Upvoted!Nebulize
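
For the split-and-transfer use case in the comment above, the key idea is setting `Position` (or calling `Seek`) so each chunk is read directly at its offset. A minimal sketch; the names `ReadChunk` and `offset` are illustrative, not from the answers above:

```csharp
using System;
using System.IO;

class ChunkReader
{
    // Read one chunk starting at `offset`; returns fewer bytes at end-of-file.
    static byte[] ReadChunk(string path, long offset, int chunkSize)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            fs.Position = offset;          // seek instead of re-reading from the start
            var buffer = new byte[chunkSize];
            int total = 0, read;
            // Read can return fewer bytes than requested, so loop until full or EOF.
            while (total < chunkSize &&
                   (read = fs.Read(buffer, total, chunkSize - total)) > 0)
            {
                total += read;
            }
            if (total == chunkSize) return buffer;
            Array.Resize(ref buffer, total); // trim the final, short chunk
            return buffer;
        }
    }
}
```

Calling `ReadChunk(path, i * chunkSize, chunkSize)` for `i = 0, 1, 2, …` yields the file in order, with only one chunk in memory at a time.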
M
2
const int MAX_BUFFER = 1024;
byte[] Buffer = new byte[MAX_BUFFER];
int BytesRead;
using (System.IO.FileStream fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read))
    while ((BytesRead = fileStream.Read(Buffer, 0, MAX_BUFFER)) != 0)
    {
        // Process this chunk starting from offset 0 
        // and continuing for bytesRead bytes!
    }
Moxley answered 28/7, 2011 at 21:34 Comment(0)
L
1
const int numberOfBytesToReadPerChunk = 1000; // ~1KB (ReadBytes takes an int)
using (BinaryReader fileData = new BinaryReader(File.OpenRead(aFullFilePath)))
    while (fileData.BaseStream.Length - fileData.BaseStream.Position > 0)
        DoSomethingWithAChunkOfBytes(fileData.ReadBytes(numberOfBytesToReadPerChunk));

As I understand the functions used here (specifically BinaryReader.ReadBytes), there is no need to track how many bytes you've read: ReadBytes returns an array sized to what was actually read. You just need the stream's length and current position for the while loop, and the stream tells you both.

Linin answered 30/6, 2016 at 17:18 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.