ByteStrings in Haskell: should I use Put or Builder?

Asked 16/7, 2012 at 20:49 Answered 31/7, 2012 at 16:1

Solved haskell functional-programming monads binary-data

I'm confused as to what the Put monad offers over using Builder directly, in Data.Binary. I read the Binary Generation section of Dealing with Binary data, and it seems to assume that you should use Put, but it's pretty short and doesn't explain why.

Data.Binary.Put

The Put monad. A monad for efficiently constructing lazy bytestrings.
type Put = PutM ()
Put merely lifts Builder into a Writer monad, applied to ().

Data.Binary.Builder

Efficient construction of lazy byte strings.

What is the point of a Writer monad applied to ()?

I can see that Put is (a type synonym to) a monad whereas Builder is not, but I don't really get why Put would be needed.

In my case, I'm rendering a 3D scene and writing each pixel as 3 bytes, and then adding the PPM format's header to the beginning (I will use PNG later).

Binary seems to be meant for types that can be serialized to and deserialized from binary data. This isn't exactly what I'm doing, but it felt natural to instantiate Binary for my colour type:

instance (Binary a) => Binary (Colour a) where
    put (Colour r g b) = put r >> put g >> put b
    get = Colour <$> get <*> get <*> get

This makes it easy to put a Colour Word8 into 24 bits. But then I also have to tack on the header, and I'm not sure how I should do that.

Is Builder meant to be hidden behind the scenes, or does it depend? Is the Binary class only for (de)serializing data, or for all binary generation purposes?

Hesson answered 16/7, 2012 at 20:49 Comment(5)

Not an answer, but you may want to take a look into using blaze-builder (and friends) rather than binary. – Luo 16/7, 2012 at 21:15

@TiloWiklund: I had seen that before. What's different about it? Is it more efficient? – Hesson 16/7, 2012 at 21:16

Among other things. There's a good writeup by one of the authors here: lambda-view.blogspot.se/2010/11/… – Luo 16/7, 2012 at 21:29

The Binary class and its methods Get and Put are essentially for serializing and de-serializing Haskell structures. If you are working with an existing binary format it is better (in my opinion, of course) to avoid the Binary class and instead use the explicit functions like putWord8, putWord32le, etc. – Bondie 16/7, 2012 at 21:36

@stephentetley: That was kind of what I was thinking too, but I wasn't sure… For the colours at least, it's convenient to use Put, and makes sense semantically too, but I don't know if I should use Put for everything or not. – Hesson 16/7, 2012 at 22:57

First of all, note the conceptual difference. Builders are for efficient building of bytestring streams, while the PutM monad is really for serialization. So the first question you should ask yourself is whether you are actually serializing (to answer that, ask yourself whether there is a meaningful and exact opposite operation, namely deserialization).

In general I would go with Builder for the convenience it provides, though not the Builder from the binary package: use the one from the blaze-builder package instead. It's a monoid, so builders compose with mappend and mconcat, and it comes with many predefined string generators. Finally, it's very fast and can in fact be fine-tuned.
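To make the monoidal style concrete, here is a minimal sketch of the asker's PPM case. Note the substitution: it uses Data.ByteString.Builder from the bytestring package rather than blaze-builder, simply because that module ships with GHC; the overall shape should carry over. The Colour type and the function names are hypothetical, and the header assumes the binary "P6" variant of PPM with a maximum channel value of 255.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)

-- Hypothetical colour type, standing in for the asker's Colour Word8.
data Colour = Colour Word8 Word8 Word8

-- One pixel is three raw bytes.
pixel :: Colour -> B.Builder
pixel (Colour r g b) = B.word8 r <> B.word8 g <> B.word8 b

-- Binary PPM ("P6") header: magic number, width, height, max channel value.
ppmHeader :: Int -> Int -> B.Builder
ppmHeader w h =
  B.string7 "P6\n" <> B.intDec w <> B.char7 ' ' <> B.intDec h <> B.string7 "\n255\n"

-- Header followed by all pixels (assumed to be in row-major order).
renderPPM :: Int -> Int -> [Colour] -> BL.ByteString
renderPPM w h pixels = B.toLazyByteString (ppmHeader w h <> foldMap pixel pixels)
```

Something like BL.writeFile "out.ppm" (renderPPM width height pixels) would then write the image; the header is just another Builder concatenated in front.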

Last but not least if you really want speed, convenience and elegant code you will want to combine this with one of the various stream processor libraries around like conduit, enumerator or pipes.

Delude answered 16/7, 2012 at 21:55 Comment(0)

I can see that Put is a monad whereas Builder is not, but I don't really get why Put would be needed.

To be precise, PutM is the Monad. It's needed for convenience, and to give you fewer opportunities for errors. Writing code in monadic or applicative style is often much more convenient than carrying all the temporaries around explicitly, and with the plumbing done in the Monad instance, you can't accidentally use the wrong Builder in the middle of your function.

You can do everything you do with PutM using only Builder, but usually it's more work to write the code.
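As a small illustration of that equivalence, the sketch below writes the same three bytes once with Put and once with Builder from the binary package. The function names are made up for the example; runPut, execPut, putWord8, singleton and toLazyByteString are the library's own.

```haskell
import Data.Binary.Builder (Builder, singleton, toLazyByteString)
import Data.Binary.Put (Put, execPut, putWord8, runPut)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)

-- Put version: sequencing in the monad concatenates the output.
rgbPut :: Word8 -> Word8 -> Word8 -> Put
rgbPut r g b = do
  putWord8 r
  putWord8 g
  putWord8 b

-- Builder version: the same concatenation, written with the Monoid.
rgbBuilder :: Word8 -> Word8 -> Word8 -> Builder
rgbBuilder r g b = singleton r <> singleton g <> singleton b

-- All three routes end in a lazy ByteString.
-- execPut exposes the Builder that a Put accumulates.
viaPut, viaBuilder, viaExecPut :: BL.ByteString
viaPut     = runPut (rgbPut 1 2 3)
viaBuilder = toLazyByteString (rgbBuilder 1 2 3)
viaExecPut = toLazyByteString (execPut (rgbPut 1 2 3))
```

With only three fields the difference is cosmetic; the do-block tends to pay off once there are conditionals or loops (when, mapM_, forM_) in the middle of the output.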

But then I also have to tack on the header, and I'm not sure how I should do that.

I don't know the PPM format, so I have no idea how to construct the header. But after constructing it, you can simply use putByteString or putLazyByteString to tack it on.
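For illustration, here is a rough sketch of that approach, assuming the binary "P6" flavour of PPM (an ASCII preamble with the magic number, width, height and maximum channel value, followed by raw RGB bytes). The Colour type and its instance mirror the ones in the question; ppmHeader, putImage and encodeImage are hypothetical names.

```haskell
import Data.Binary (Binary (..))
import Data.Binary.Put (Put, putByteString, runPut)
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL

-- Colour type and instance as in the question.
data Colour a = Colour a a a

instance Binary a => Binary (Colour a) where
  put (Colour r g b) = put r >> put g >> put b
  get = Colour <$> get <*> get <*> get

-- Assumed P6 header layout: "P6\n<width> <height>\n255\n".
ppmHeader :: Int -> Int -> BC.ByteString
ppmHeader w h = BC.pack ("P6\n" ++ show w ++ " " ++ show h ++ "\n255\n")

-- The header goes in first, then every pixel via the Binary instance.
putImage :: Binary a => Int -> Int -> [Colour a] -> Put
putImage w h pixels = do
  putByteString (ppmHeader w h)
  mapM_ put pixels

encodeImage :: Binary a => Int -> Int -> [Colour a] -> BL.ByteString
encodeImage w h = runPut . putImage w h
```

The output is only a valid PPM when the component type serializes to a single byte, as Word8 does; other component types would need a different maximum value or encoding.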

Sequence answered 16/7, 2012 at 21:15 Comment(2)

Would you agree that Put is only intended for serializing and deserializing, and (for example) rendering data one-way to a binary image (as a ByteString) would be better done using just Builder (possibly blaze-builder), as others have said? – Hesson 17/7, 2012 at 0:58

I'd say put and get are, but Put and Get can reasonably be used with wider scope. However, the primary intention of binary was indeed (de)serialization of Haskell structures, so the interface was shaped for that. Thus blaze-builder may be the better choice (I've never used that, so I don't know how convenient it is to use). – Sequence 17/7, 2012 at 1:23

I'm not sure to what extent this is accurate, but my understanding has always been that the presentation of Put as you see it is largely an abuse of do-notation so that you can write code like this:

putThing :: Thing -> Put
putThing (Thing thing1 thing2) = do
  putThing1 thing1
  putThing2 thing2

We are not using the "essence" of Monad (in particular, we never bind the result of anything) but we gain a convenient and clean syntax for concatenation. However, the aesthetic advantages over the purely monoidal alternative:

putThing :: Thing -> Builder
putThing (Thing thing1 thing2) = mconcat [
  putThing1 thing1,
  putThing2 thing2]

are fairly minimal, in my view.

(Note that Get, by contrast, genuinely is a Monad and benefits from being so in clear ways).
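A small sketch of that asymmetry, using a made-up length-prefixed chunk format (one length byte, then that many bytes). The writer never binds a result, whereas the reader has to use the length it just parsed to decide how much to read next. The function names are hypothetical; the Get and Put primitives are from binary.

```haskell
import Data.Binary.Get (Get, getByteString, getWord8, runGet)
import Data.Binary.Put (Put, putByteString, putWord8, runPut)
import qualified Data.ByteString as BS

-- Writing: the actions are only sequenced, nothing is bound.
-- Assumes the chunk is shorter than 256 bytes so the length fits in one byte.
putChunk :: BS.ByteString -> Put
putChunk bytes = do
  putWord8 (fromIntegral (BS.length bytes))
  putByteString bytes

-- Reading: the length parsed first determines how many bytes to read next,
-- which is exactly what monadic bind provides.
getChunk :: Get BS.ByteString
getChunk = do
  n <- getWord8
  getByteString (fromIntegral n)

-- Encode and immediately decode again.
roundTrip :: BS.ByteString -> BS.ByteString
roundTrip = runGet getChunk . runPut . putChunk
```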

Edh answered 31/7, 2012 at 16:1 Comment(0)
