Why do most serializers use a stream instead of a byte array?

Asked 24/3, 2017 at 13:24 Answered 24/3, 2017 at 23:48

I am currently working on a socket server and I was wondering why serializers like

all require a Stream instead of a byte array?

Blancmange answered 24/3, 2017 at 13:24 Comment(3)

Because it gets very big, and what are you going to do with a byte[] anyway (except stream it somewhere, be it network or file)? – Minna 24/3, 2017 at 16:12

You can easily wrap a byte array in a stream (MemoryStream). The opposite is significantly more difficult if not impossible. Using a stream gives you all the flexibility to do whatever you want. More conceptually: it's a heck of a lot easier to access randomly-accessible information sequentially than it is to access sequentially-accessible information randomly. So a sequential philosophy covers all the bases easily. – Barchan 24/3, 2017 at 16:17

One of the most useful features of a MemoryStream is that it is able to project an array segment without having to partially copy it: new MemoryStream(buffer, index, count) – Navigable 27/10, 2020 at 7:29

It means you can stream to arbitrary destinations rather than just to memory.

If you want to write something to a file, why would you want to create a complete copy in memory first? In some cases that could cause you to use a lot of extra memory, possibly causing a failure.

If you want to create a byte array, just use a MemoryStream:

var memoryStream = new MemoryStream();
serializer.Write(foo, memoryStream); // Or whatever you're using
var bytes = memoryStream.ToArray();

So with an abstraction of "you use streams" you can easily work with memory - but if the abstraction is "you use a byte array" you are forced to work with memory even if you don't want to.
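To make the point above concrete, here is a minimal sketch that assumes XmlSerializer as the serializer. The Message type and the StreamTargets helper are hypothetical names invented for illustration; only XmlSerializer, MemoryStream and File.Create are real framework APIs.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical payload type, for illustration only.
public class Message
{
    public int Id { get; set; }
    public string Text { get; set; }
}

// Hypothetical helper: the serialization call is written once, against Stream.
public static class StreamTargets
{
    // The caller decides where the bytes end up.
    public static void Save(Message message, Stream destination)
    {
        var serializer = new XmlSerializer(typeof(Message));
        serializer.Serialize(destination, message);
    }

    // Memory is just one possible destination.
    public static byte[] ToBytes(Message message)
    {
        using (var memory = new MemoryStream())
        {
            Save(message, memory);
            return memory.ToArray();
        }
    }

    // Writing to a file never builds a byte[] holding the whole payload.
    public static void ToFile(Message message, string path)
    {
        using (var file = File.Create(path))
        {
            Save(message, file);
        }
    }
}
```

Both ToBytes and ToFile reuse the same Save method. Had Save taken a byte[] instead, ToFile would have had to build the whole array first and then copy it to disk.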

Longevity answered 24/3, 2017 at 13:26 Comment(5)

Alright, but in my case, I am receiving data from a socket and then I want to deserialize it to an object. So I have to create a temporary stream just to deserialize it. wouldn't it be easier if they made an overload that supports byte[] – Blancmange 24/3, 2017 at 13:35

@CodeJoy: Doesn't the socket expose a stream? That would normally be the case. And is it really so hard to write SomeMethod(new MemoryStream(bytes)) rather than SomeMethod(bytes)? Doesn't seem like much of a hardship to me. – Longevity 24/3, 2017 at 13:37

@CodeJoy, with Jon Skeet's suggested method, you have the same "difficulty" in writing code, more or less the same overhead (a MemoryStream is essentially a wrapper around an in-memory byte array), and much more flexibility. – Micronucleus 24/3, 2017 at 13:53

In case anyone reading this answer is wondering where the 'extra memory' comes from when serialising, imagine the case of serialising to an XML-based format - for every numeric field there's the number written in full plus the XML tags surrounding it, meaning a 4 byte int can increase to a minimum of 8 bytes (i.e. <n>0</n>) or 5 bytes if using an attribute (i.e. i="0"). – Toilsome 24/3, 2017 at 23:53

@Pharap: Well, that's part of it - the other part is that even if it only takes as much memory as your original graph, it's still a whole extra copy of your object graph... – Longevity 25/3, 2017 at 6:30

You can easily make a stream over a byte array, but a byte array is inherently size-constrained, whereas a stream is open-ended: as big as you need. Some serialized output can be pretty enormous.

Edit: Also, if I need to implement some kind of serialization, I want to do it for the most basic abstraction, and avoid having to do it over multiple abstractions. Stream would be my choice, as there are stream implementations over lots of things: memory, disk, network and so forth. As an implementer, I get those for "free".
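As a rough illustration of the implementer's side, here is a sketch of a toy format written once against Stream. ToyFormat and its layout (a 32-bit id followed by a length-prefixed string) are made up for this example; BinaryWriter and BinaryReader are the real framework classes.

```csharp
using System.IO;
using System.Text;

// Hypothetical toy format: a 32-bit id followed by a length-prefixed UTF-8 string.
public static class ToyFormat
{
    public static void Write(Stream destination, int id, string text)
    {
        // leaveOpen: true so the caller keeps ownership of the stream.
        using (var writer = new BinaryWriter(destination, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write(id);
            writer.Write(text);
        }
    }

    public static (int Id, string Text) Read(Stream source)
    {
        using (var reader = new BinaryReader(source, Encoding.UTF8, leaveOpen: true))
        {
            int id = reader.ReadInt32();
            string text = reader.ReadString();
            return (id, text);
        }
    }
}
```

Nothing in ToyFormat knows whether it is talking to a MemoryStream, a FileStream or a NetworkStream, so a single implementation should serve all of them.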

Oblation answered 24/3, 2017 at 13:26 Comment(0)

If you use a byte array or buffer, you are working in memory and you are limited in size.

A stream, on the other hand, lets you store data on disk or send it to other machines over a network, a serial port, etc. Streams often use buffers internally to optimize transmission speed.

So streaming is useful if you are dealing with a large file.

Indicator answered 24/3, 2017 at 13:30 Comment(0)

@JonSkeet's answer is the correct one, but as an addendum, if the issue you're having with making a temporary stream is "I don't like it because it's effort" then consider writing an extension method:

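using System.IO;
using System.Xml.Serialization;

namespace Project.Extensions
{
    public static class XmlSerialiserExtensions
    {
        // Writes obj into the caller's buffer. A MemoryStream over an existing array
        // cannot grow, so this throws NotSupportedException if the output does not fit.
        public static void Serialise(this XmlSerializer serialiser, byte[] bytes, object obj)
        {
            using (var temp = new MemoryStream(bytes))
                serialiser.Serialize(temp, obj);
        }

        // The whole array is exposed to the serialiser, including any unused tail.
        public static object Deserialise(this XmlSerializer serialiser, byte[] bytes)
        {
            using (var temp = new MemoryStream(bytes))
                return serialiser.Deserialize(temp);
        }
    }
}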

So you can go ahead and do

serialiser.Serialise(buffer, obj);
socket.Send(buffer); // assuming socket is a System.Net.Sockets.Socket

socket.Receive(buffer);
var obj = serialiser.Deserialise(buffer);
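One caveat with buffer-based helpers: a receive buffer is rarely filled exactly, and a fixed buffer may be too small for the output. The sketch below is one possible variant, not the only way to do it. BufferSegmentExtensions, SerialiseToArray and the offset/count parameters are hypothetical additions; the MemoryStream(buffer, index, count, writable) constructor is the real API mentioned in the comments on the question.

```csharp
using System.IO;
using System.Xml.Serialization;

public static class BufferSegmentExtensions
{
    // Deserialises only buffer[offset .. offset + count), without copying the segment.
    public static object Deserialise(this XmlSerializer serialiser, byte[] buffer, int offset, int count)
    {
        using (var segment = new MemoryStream(buffer, offset, count, writable: false))
            return serialiser.Deserialize(segment);
    }

    // Serialises into a growable stream and returns an array sized to the payload.
    public static byte[] SerialiseToArray(this XmlSerializer serialiser, object obj)
    {
        using (var temp = new MemoryStream())
        {
            serialiser.Serialize(temp, obj);
            return temp.ToArray();
        }
    }
}
```

With these, the receiving side can pass the number of bytes it actually read (for example, the return value of the receive call) as count, so the unused tail of the buffer is never handed to the serialiser.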

Toilsome answered 24/3, 2017 at 23:48 Comment(0)

Byte arrays were used more often when manipulating ASCII (i.e. one byte per character) strings, often in machine-dependent applications such as buffers. They lend themselves to low-level applications, whereas streams are a more generalized way of dealing with data, which enables a wider range of applications. Streams are also a more abstract way of looking at data, which allows considerations such as character encoding (UTF-8, UTF-16, ASCII, etc.) to be handled by code that is invisible to the user of the data stream.

Haemorrhage answered 24/3, 2017 at 17:34 Comment(0)
