Why write to Stream in chunks?

I am wondering why so many examples read byte arrays into streams in chunks and not all at once... I know this is a soft question, but I am interested.

I understand a bit about hardware: filling buffers can be very size dependent, and you wouldn't want to write to a buffer again until it has been flushed to wherever it needs to go, etc. But with the .NET platform (and other modern languages) I see examples of both. So when should you use which, or is the second an absolute no-no?

Here is the thing (code) I mean:

var buffer = new byte[4096];

while (true)
{
    var read = this.InputStream.Read(buffer, 0, buffer.Length);

    if (read == 0)
        break;

    OutputStream.Write(buffer, 0, read);
}

rather than:

var buffer = new byte[InputStream.Length];

var read = this.InputStream.Read(buffer, 0, buffer.Length);

OutputStream.Write(buffer, 0, read);

I believe both are legal? So why go through all the fuss of the while loop (in whatever form you decide to structure it)?

I am playing devil's advocate here as I want to learn as much as I can :)

Somnambulate answered 28/11, 2012 at 14:21 Comment(0)

In the first case, all you need is 4kB of memory. In the second case, you need as much memory as the input stream data takes. If the input stream is 4GB, you need 4GB.

Do you think it would be good if a file copy operation required 4GB of RAM? What if you were to prepare a disk image that's 20GB?

There is also the matter of pipes. You don't often use them on Windows, but a similar case is often seen on other operating systems. The second case waits for all the data to be read and only then writes it to the output. However, sometimes it is advisable to write data as soon as possible: the first case starts writing to the output stream as soon as the first 4 kB of input is read. Think of serving web pages: it is advisable for a web server to send data as soon as possible, so that the client's web browser can start rendering the headers and the first part of the content instead of waiting for the whole body.

However, if you know that the input stream won't be bigger than 4kB, then both cases are equivalent.
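The memory argument can be sketched in a self-contained way with in-memory streams (a minimal sketch; the `CopyChunked` helper name and the 4 kB buffer size are illustrative choices, not from the question):

```csharp
using System;
using System.IO;

class ChunkedCopyDemo
{
    // Copies input to output through a fixed-size buffer, so the
    // transfer never needs more than bufferSize bytes of working memory,
    // no matter how large the input is.
    static void CopyChunked(Stream input, Stream output, int bufferSize = 4096)
    {
        var buffer = new byte[bufferSize];
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            output.Write(buffer, 0, read);
    }

    static void Main()
    {
        var data = new byte[100_000];
        new Random(1).NextBytes(data);

        using var input = new MemoryStream(data);
        using var output = new MemoryStream();
        CopyChunked(input, output);

        // The whole 100 kB arrived intact even though the transfer
        // buffer was only 4 kB.
        Console.WriteLine(output.Length);
        Console.WriteLine(data.AsSpan().SequenceEqual(output.ToArray()));
    }
}
```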

Scolex answered 28/11, 2012 at 14:24 Comment(3)
In the general case, the amount you are holding in memory is what matters most, so if you are filling a buffer (stream) up and not moving it, that's bad. Say we took OutputStream out of the equation and just filled up the InputStream with a while loop? I have seen this as well; would that be just as bad as the second example? – Somnambulate
It all depends on your specific case, on what you want to do. There are algorithms which can operate on small chunks (calculating a sum of values, finding the maximum), and there are algorithms which need all the data (for example: sorting). In the second case it is necessary to read all the data; in the first case, not really. – Scolex
Cool... so it is application specific, which was my main thinking, I think... rather than the nature of the objects themselves in most languages. Thanks :) – Somnambulate

Sometimes InputStream.Length is simply not available for the source, e.g. a stream coming over a network transport; or the buffer would have to be huge, e.g. when reading from a huge file. IMO.
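A quick way to see this is to wrap a stream in GZipStream, which is not seekable, so its Length property throws (a minimal sketch; a NetworkStream or standard input behaves the same way for the purpose of this point):

```csharp
using System;
using System.IO;
using System.IO.Compression;

class NoLengthDemo
{
    static void Main()
    {
        using var inner = new MemoryStream(new byte[] { 1, 2, 3 });
        using var gzip = new GZipStream(inner, CompressionMode.Compress);

        // No random access, so the size cannot be known up front.
        Console.WriteLine(gzip.CanSeek);

        try
        {
            var len = gzip.Length; // throws for non-seekable streams
            Console.WriteLine(len);
        }
        catch (NotSupportedException)
        {
            // For streams like this you have no choice but to read in
            // chunks until Read returns 0; there is no Length with which
            // to pre-size a single buffer.
            Console.WriteLine("Length not supported");
        }
    }
}
```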

Vicinage answered 28/11, 2012 at 14:25 Comment(2)
That's a really good point... I hadn't thought about that possibility. But it makes a lot of sense, especially if you are close to the metal and reading from a buffer receiving info! – Somnambulate
+1 ... wish it could be +2. I wanted to accept this answer as you put something I hadn't even thought of at all, in a really simple way. Which is always cool. But in the interest of the SO community, it's best to accept the answer that will be useful to the most people. – Somnambulate

It protects you from the situation where your input stream is several gigabytes long.

Chlorine answered 28/11, 2012 at 14:24 Comment(6)
What do you mean by protection? Why would you need it? – Somnambulate
Protection against, for example, an OutOfMemoryException. – Chlorine
Right. If you read a file, say, into memory that was more than the application had access to. Given. But that is a possibility with any large amount of data. So the chunkiness of the pattern doesn't protect this, just the flush of the buffer into the output stream and recycle. – Somnambulate
@Somnambulate - "So the chunkiness of the pattern doesn't protect this" - sure it does. You only ever need a buffer of the configured size (4096 in your example), rather than a buffer whose size is as big as the input stream. – Chlorine
Cool, I get it... but you'd only have a chunk of the whole file you wanted, right? So you'd have to do some actual software engineering and make sure this didn't make the world collapse (ed - app die)! :P – Somnambulate
@tigerswithguitars, "but you'd only have a chunk of the whole file you wanted" - no, you'd get the whole file copied from the input stream to the output stream a chunk at a time. Incidentally, with .NET 4 or later, you can simply use "InputStream.CopyTo(OutputStream)", which uses a default buffer size, or e.g. "InputStream.CopyTo(OutputStream, 4096)" which allows you to specify the buffer size. – Chlorine
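For completeness, the CopyTo call mentioned in the comment above replaces the hand-written loop entirely (a minimal sketch, assuming .NET 4 or later; CopyTo chunks internally just like the manual while loop):

```csharp
using System;
using System.IO;

class CopyToDemo
{
    static void Main()
    {
        using var input = new MemoryStream(new byte[] { 10, 20, 30, 40 });
        using var output = new MemoryStream();

        // The overload with a second argument lets you choose the
        // internal buffer size; the parameterless overload uses a default.
        input.CopyTo(output, 4096);

        Console.WriteLine(output.Length);
    }
}
```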

You have no idea how much data Read might return. This could create major performance problems if you're reading a very large file.

If you have control over the input, and are sure the size is reasonable, then you can certainly read the whole array in at once. But be especially careful if the user can supply an arbitrary input.
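One defensive pattern for the "arbitrary input" case (a sketch of my own, not from the answer: the `ReadAllWithCap` helper and the cap value are hypothetical) is to buffer the whole stream but refuse to grow past a limit, so hostile or oversized input cannot exhaust memory:

```csharp
using System;
using System.IO;

class BoundedReadDemo
{
    // Buffers the whole stream, but refuses to grow past maxBytes;
    // useful when the input size is user-controlled and Length may be
    // unavailable or untrustworthy.
    static byte[] ReadAllWithCap(Stream input, int maxBytes)
    {
        var buffer = new byte[4096];
        using var output = new MemoryStream();
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            if (output.Length + read > maxBytes)
                throw new InvalidOperationException("Input exceeds cap");
            output.Write(buffer, 0, read);
        }
        return output.ToArray();
    }

    static void Main()
    {
        using var small = new MemoryStream(new byte[] { 1, 2, 3 });
        Console.WriteLine(ReadAllWithCap(small, 1024).Length);
    }
}
```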

Saccharoid answered 28/11, 2012 at 14:24 Comment(0)
