Using Stream.Read() vs BinaryReader.Read() to process binary streams

When working with binary streams (i.e. byte[] arrays), the main point of using BinaryReader or BinaryWriter seems to be simplified reading/writing of primitive data types from a stream, using methods such as ReadBoolean() and taking encoding into account. Is that the whole story? Is there an inherent advantage or disadvantage if one works directly with a Stream, without using BinaryReader/BinaryWriter? Most methods, such as Read(), seem to be the same in both classes, and my guess is that they work identically underneath.

Consider a simple example of processing a binary file in two different ways (edit: I realize this approach is inefficient and that a buffer could be used; it's just a sample):

// Using FileStream directly
using (FileStream stream = new FileStream("file.dat", FileMode.Open))
{
    // Read bytes from stream and interpret them as ints
    int value = 0;
    while ((value = stream.ReadByte()) != -1)
    {
        Console.WriteLine(value);
    }
}


// Using BinaryReader
using (BinaryReader reader = new BinaryReader(new FileStream("file.dat", FileMode.Open)))
{
    // Read bytes and interpret them as ints
    byte value = 0;    
    while (reader.BaseStream.Position < reader.BaseStream.Length)
    {
        value = reader.ReadByte();
        Console.WriteLine(Convert.ToInt32(value));
    }
}

The output will be the same, but what's happening internally (e.g. from the OS's perspective)? Is it, generally speaking, important which implementation is used? Is there any purpose to using BinaryReader/BinaryWriter if you don't need the extra methods they provide? For this specific case, MSDN says this in regard to Stream.ReadByte():

The default implementation on Stream creates a new single-byte array and then calls Read. While this is formally correct, it is inefficient.

Measured with GC.GetTotalMemory(), the first approach does seem to allocate 2x as much space as the second one, but AFAIK this shouldn't be the case if the more general Stream.Read() method is used (e.g. for reading in chunks using a buffer). Still, it seems to me that these methods/interfaces could be unified easily...
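
For reference, the default implementation the documentation describes looks roughly like this (a simplified sketch, not the actual framework source):

// Simplified sketch of Stream.ReadByte()'s default implementation
public virtual int ReadByte()
{
    byte[] oneByteArray = new byte[1];   // a fresh allocation on every call
    int n = Read(oneByteArray, 0, 1);    // delegates to the bulk Read()
    return n == 0 ? -1 : oneByteArray[0];
}

Concrete streams such as FileStream typically override this with a buffered version, so the per-call allocation mainly concerns custom Stream implementations.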

Carri answered 11/6, 2013 at 12:2 Comment(2)
The default implementation of Stream.ReadByte is intended to be overridden in any concrete implementation of Stream. Which leaves the question unanswered: why do we need a new StreamReader class instead of being able to rely on (implementations of) Stream to do the right thing?Atonsah
@Atonsah because the Stream class is badly designed in general. It's too "general". Not all streams support seeking or ReadByte (efficiently) or reading or writing. It is just bad OOP design. They should've used interfaces instead.Worsen

No, there is no fundamental difference between the two approaches. The reader adds some buffering of its own, so you shouldn't mix reads on the reader with reads on the underlying stream. But don't expect any significant performance difference; it's all dominated by the actual I/O.

So,

  • Use a stream when you have (only) byte[] to move, as is common in a lot of streaming scenarios.
  • Use BinaryWriter and BinaryReader when you have any other basic type of data (including simple byte) to process. Their main purpose is converting the built-in framework types to and from byte[]; a round-trip sketch follows.
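
A minimal round-trip sketch of that conversion (the file name "primitives.dat" is just a placeholder; requires System and System.IO):

// Write a few primitives, then read them back in the same order
using (var writer = new BinaryWriter(File.Open("primitives.dat", FileMode.Create)))
{
    writer.Write(true);     // 1 byte
    writer.Write(42);       // 4 bytes (Int32)
    writer.Write(3.14);     // 8 bytes (Double)
    writer.Write("hello");  // length-prefixed string (UTF-8 by default)
}

using (var reader = new BinaryReader(File.Open("primitives.dat", FileMode.Open)))
{
    bool b = reader.ReadBoolean();
    int i = reader.ReadInt32();
    double d = reader.ReadDouble();
    string s = reader.ReadString();
    Console.WriteLine("{0} {1} {2} {3}", b, i, d, s);
}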
Donettedoney answered 11/6, 2013 at 12:12 Comment(0)

One big difference is how you can buffer the I/O. If you are writing/reading only a few bytes here or there, BinaryWriter/BinaryReader will work well. But if you have to read MBs of data, then reading one byte, Int32, etc... at a time will be a bit slow. You could instead read larger chunks and parse from there.

Example:

// Using FileStream directly with a buffer
using (FileStream stream = new FileStream("file.dat", FileMode.Open))
{
    // Read bytes from stream and interpret them as ints
    byte[] buffer = new byte[1024];
    int count;
    // Read from the IO stream fewer times.
    while((count = stream.Read(buffer, 0, buffer.Length)) > 0)
        for(int i=0; i<count; i++)
           Console.WriteLine(Convert.ToInt32(buffer[i]));
}

Now this is a bit off topic, but I'll throw it out there: if you wanted to get VERY crafty and really give yourself a performance boost (albeit one that might be considered dangerous), then instead of parsing EACH Int32 you could copy them all at once using Buffer.BlockCopy().

Another example:

// Using FileStream directly with a buffer and BlockCopy
using (FileStream stream = new FileStream("file.dat", FileMode.Open))
{
    // Read bytes from stream and interpret them as ints
    byte[] buffer = new byte[1024];
    int[] intArray = new int[buffer.Length >> 2]; // Each int is 4 bytes
    int count;
    // Read from the IO stream fewer times.
    // (Assumes count comes back as a multiple of 4; real code would handle leftovers.)
    while ((count = stream.Read(buffer, 0, buffer.Length)) > 0)
    {
       // Copy the bytes into the memory space of the Int32 array in one big swoop
       Buffer.BlockCopy(buffer, 0, intArray, 0, count);

       for (int i = 0; i < count / 4; i++)
          Console.WriteLine(intArray[i]);
    }
}

A few things to note about this example: it consumes 4 bytes per Int32 instead of one, so it will yield different results. You can also do this for data types other than Int32, but many would argue that marshalling should be on your mind. (I just wanted to present something to think about...)

Disencumber answered 11/6, 2013 at 13:35 Comment(5)
Thank you. The example I provided was hypothetical; I realize it wouldn't be efficient for large files in real life, and I always use the buffered approach (your first example) in that case. The way I understand it, both Stream.Read() and BinaryReader.Read() enable you to do the same thing. Also, since .NET 4, Stream.CopyTo() does this for you.Carri
Another thing to think about is something like a TCP handshake: if you write one byte at a time and the client reads in blocks, then you will have a problem. So in that case, whenever I write, I do it in blocks, and when I read, I read with the BinaryReader. This way, I'm covered.Disencumber
Andrew: in such a scenario, I believe the behavior would be the same using both Stream.Read() and BinaryReader.Read() if you were to use a buffer of one byte in both cases.Carri
Agreed, but not if you wrote a byte and the client then tried to read an Int32: they would get only one byte when expecting 4.Disencumber
While Stream.ReadByte() returns an Int32, only a single byte is read and the position advances by 1 byte; the Int32 value obtained will be at most 255 (when the byte value is 0xff), so this wouldn't present a problem. What you describe would, AFAIK, occur if you used BinaryReader.ReadInt32().Carri

Both of your code samples do the same thing, i.e. ReadByte(), so the result from either method is the same (for the same file).
The OS-level (internal) difference is that streams are buffered in virtual memory: e.g. if you were transporting a file over a network via a stream, you would still have system memory left for other (multi)tasking.
With byte arrays, the whole file is held in memory before being transferred to disk (file create) or to another stream, so this approach is not recommended for large files.

There's some discussion of binary data transfer over a network here:
When to use byte array, and when to use stream?

@Jason C & @Jon Skeet make good points here:
Why do most serializers use a stream instead of a byte array?

I have noticed that my Win 10 machine (4GB RAM) sometimes skips files over 5MB if I transfer a file via System.Net.Http.HttpClient's GetByteArrayAsync method (vs. GetStreamAsync) and keep working without waiting for the transfer to complete.
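
A minimal sketch of the streaming alternative (the URL and file name are placeholders; requires System.IO, System.Net.Http and System.Threading.Tasks):

// Stream a download straight to disk instead of buffering the whole
// response as a byte[]
static async Task DownloadToFileAsync()
{
    using (var client = new HttpClient())
    using (Stream http = await client.GetStreamAsync("https://example.com/file.dat"))
    using (FileStream file = File.Create("file.dat"))
    {
        // CopyToAsync moves the data in chunks; the whole file is never
        // held in memory at once
        await http.CopyToAsync(file);
    }
}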

PS: in .NET 4.0, a byte array is limited to 2GB

Plumbic answered 19/11, 2019 at 14:48 Comment(0)
