Do cereal and Boost Serialization use zero-copy?

I have done some performance comparison between several serialization protocols, including FlatBuffers, Cap'n Proto, Boost serialization and cereal. All the tests are written in C++.

I know that FlatBuffers and Cap'n Proto use zero-copy. With zero-copy, serialization time is null but size of serialized objects is bigger.

I thought that cereal and Boost serialization didn't use zero-copy. However, serialization time (for int and double) is nearly null, and the size of the serialized objects is nearly the same as with Cap'n Proto or FlatBuffers. I didn't find any information about zero-copy in their documentation.

Do cereal and Boost serialization use zero-copy too?

Frymire answered 23/1, 2017 at 8:21 Comment(1)
"serialization time is null". What does this even mean? Could you please elaborate?Keeler

Boost and Cereal do not implement zero-copy in the sense of Cap'n Proto or FlatBuffers.

With true zero-copy serialization, the backing store for your live in-memory objects is in fact exactly the same memory segment that is passed to the read() or write() system calls. There is no packing/unpacking step at all.

Generally, this has a number of implications:

  • Objects are not allocated using new/delete. When constructing a message, you allocate the message first, which allocates a long contiguous memory space for the message contents. You then allocate the message structure directly inside the message, receiving pointers that in fact point into the message's memory. When the message is later written, a single write() call shoves this whole memory space out to the wire.
  • Similarly, when you read in a message, a single read() call (or maybe 2-3) reads in the entire message into one block of memory. You then get a pointer (or, a pointer-like object) to the "root" of the message, which you can use to traverse it. Note that no part of the message is actually inspected until your application traverses it.
  • With normal sockets, the only copies of your data happen in kernel space. With RDMA networking, you may even be able to avoid kernel-space copies: the data comes off the wire directly into its final memory location.
  • When working with files (rather than networks) it's possible to mmap() a very large message directly from disk and use the mapped memory region directly. Doing so is O(1) -- it doesn't matter how big the file is. Your operating system will automatically page in the necessary parts of the file when you actually access them.
  • Two processes on the same machine can communicate through shared memory segments with no copies. Note that, generally, regular old C++ objects do not work well in shared memory, because the memory segments usually don't have the same address in both memory spaces, thus all the pointers are wrong. With a zero-copy serialization framework, the pointers are usually expressed as offsets rather than absolute addresses, so that they are position-independent.
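To make the offset idea concrete, here is a minimal sketch of a position-independent message. The layout and the names `MessageBuilder`, `MessageView`, and `load_u32` are hypothetical and only for illustration; this is not the actual wire format or API of Cap'n Proto or FlatBuffers, and bounds checks are left out for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical layout (native byte order), chosen only for this sketch:
//   [0..4)     offset of the element array, relative to the buffer start
//   [4..8)     element count
//   [offset..) count * uint32 elements

// Loads a single field from the buffer. memcpy keeps this within the
// aliasing and alignment rules; it touches 4 bytes, not the whole message.
inline std::uint32_t load_u32(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Builds the message in place: fields are written straight into the
// buffer that would later be handed to write().
class MessageBuilder {
public:
    static constexpr std::uint32_t kHeaderSize = 8;

    explicit MessageBuilder(std::uint32_t count)
        : buf_(kHeaderSize + std::size_t{count} * sizeof(std::uint32_t)) {
        store_u32(0, kHeaderSize);  // offset, not an absolute address
        store_u32(4, count);
    }

    void set(std::uint32_t index, std::uint32_t value) {
        store_u32(kHeaderSize + std::size_t{index} * sizeof(std::uint32_t), value);
    }

    const std::vector<unsigned char>& bytes() const { return buf_; }

private:
    void store_u32(std::size_t pos, std::uint32_t v) {
        std::memcpy(buf_.data() + pos, &v, sizeof v);
    }

    std::vector<unsigned char> buf_;
};

// Pointer-like reader over received bytes. Nothing is unpacked up front;
// a field is only read when the application asks for it.
class MessageView {
public:
    explicit MessageView(const unsigned char* base) : base_(base) {}

    std::uint32_t count() const { return load_u32(base_ + 4); }

    std::uint32_t at(std::uint32_t index) const {
        const std::uint32_t offset = load_u32(base_);
        return load_u32(base_ + offset + std::size_t{index} * sizeof(std::uint32_t));
    }

private:
    const unsigned char* base_;  // not owned; the buffer must outlive the view
};
```

Because the array is located by an offset from the start of the buffer, a `MessageView` works no matter where the bytes end up in memory, which is the property that makes shared memory and mmap() usable.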

Boost and Cereal are different: When you receive a message in these systems, first a pass is performed over the entire message to "unpack" the contents. The final resting place of the data is in objects allocated in the traditional way using new/delete. Similarly, when sending a message, the data has to be collected from this tree of objects and packed together into one buffer in order to be written out. Even though Boost and Cereal are "extensible", being truly zero-copy requires a very different underlying design; it cannot be bolted-in as an extension.
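For contrast, a traditional pack/unpack serializer can be sketched roughly as follows. The `Record` type, the `pack` and `unpack` functions, and the length-prefixed format are assumptions made up for this example; they are not how Boost or Cereal encode data, and input validation is omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// An ordinary object tree: its members live in separately allocated memory.
struct Record {
    std::string name;
    std::vector<std::int32_t> scores;
};

namespace detail {
inline void put_u32(std::vector<unsigned char>& out, std::uint32_t v) {
    unsigned char tmp[sizeof v];
    std::memcpy(tmp, &v, sizeof v);
    out.insert(out.end(), tmp, tmp + sizeof v);
}

inline std::uint32_t get_u32(const std::vector<unsigned char>& in, std::size_t& pos) {
    std::uint32_t v;
    std::memcpy(&v, in.data() + pos, sizeof v);
    pos += sizeof v;
    return v;
}
}  // namespace detail

// Sending: walk the object and copy every field into one output buffer.
std::vector<unsigned char> pack(const Record& r) {
    std::vector<unsigned char> out;
    detail::put_u32(out, static_cast<std::uint32_t>(r.name.size()));
    out.insert(out.end(), r.name.begin(), r.name.end());
    detail::put_u32(out, static_cast<std::uint32_t>(r.scores.size()));
    for (std::int32_t s : r.scores) {
        detail::put_u32(out, static_cast<std::uint32_t>(s));
    }
    return out;
}

// Receiving: a full pass over the buffer that allocates fresh objects
// and copies the data into them before the application can use it.
Record unpack(const std::vector<unsigned char>& in) {
    std::size_t pos = 0;
    Record r;
    const std::uint32_t name_len = detail::get_u32(in, pos);
    r.name.assign(reinterpret_cast<const char*>(in.data() + pos), name_len);
    pos += name_len;
    const std::uint32_t n = detail::get_u32(in, pos);
    r.scores.reserve(n);
    for (std::uint32_t i = 0; i < n; ++i) {
        r.scores.push_back(static_cast<std::int32_t>(detail::get_u32(in, pos)));
    }
    return r;
}
```

Both directions touch every byte of the message, and the unpacked `Record` no longer shares any memory with the buffer it was read from.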

That said, don't assume zero-copy will always be faster. memcpy() can be pretty fast, and the rest of your program may dwarf the cost. Meanwhile, zero-copy systems tend to have inconvenient APIs, particularly because of the restrictions on memory allocation. It may be overall a better use of your time to use a traditional serialization system.

The place where zero-copy is most obviously advantageous is when manipulating files, since as I mentioned you can easily mmap() a huge file and only read part of it. Non-zero-copy formats simply can't do that. When it comes to networking, though, the advantages are less clear, since the network communication itself is necessarily O(n).
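A rough sketch of the file case, assuming a POSIX system. `MappedFile` and `u32_at` are hypothetical names for this example, not part of any serialization library; a real zero-copy format would layer its own reader on top of the mapped bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Maps a whole file read-only. Setting up the mapping does not read the
// file's contents; the kernel pages data in when it is first accessed.
class MappedFile {
public:
    explicit MappedFile(const char* path) {
        const int fd = ::open(path, O_RDONLY);
        if (fd < 0) throw std::runtime_error("open failed");

        struct stat st;
        if (::fstat(fd, &st) != 0 || st.st_size <= 0) {
            ::close(fd);
            throw std::runtime_error("fstat failed or file is empty");
        }
        size_ = static_cast<std::size_t>(st.st_size);

        void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
        ::close(fd);  // the mapping remains valid after the descriptor is closed
        if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
        data_ = static_cast<const unsigned char*>(p);
    }

    ~MappedFile() { ::munmap(const_cast<unsigned char*>(data_), size_); }

    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    std::size_t size() const { return size_; }

    // Reads one field at a byte offset; only the page holding it is touched.
    std::uint32_t u32_at(std::size_t offset) const {
        if (offset > size_ || size_ - offset < sizeof(std::uint32_t)) {
            throw std::out_of_range("offset past end of mapping");
        }
        std::uint32_t v;
        std::memcpy(&v, data_ + offset, sizeof v);
        return v;
    }

private:
    const unsigned char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

With a non-zero-copy format, the equivalent of `u32_at` would only be available after the whole file had been read and unpacked.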

At the end of the day, if you really want to know which serialization system is fastest for your use case, you will probably need to try them all and measure them. Note that toy benchmarks are usually misleading; you need to test your actual use case (or something very similar) to get useful information.

Disclosure: I am the author of Cap'n Proto (a zero-copy serializer) and Protocol Buffers v2 (a popular non-zero-copy serializer).

Tieshatieup answered 23/1, 2017 at 23:24 Comment(1)
+1 for a robust and well-worded answer. [It is worth noting that Boost does indeed have a library for shared memory representation of objects, POD and non-POD (using custom allocators and relative pointers (offset_ptr<>)).]Droughty

Note: I bountied the other answer which understood the full scope of the question better

Boost Serialization is extensible.

It allows your types to describe what needs to be serialized, and the archives to describe the format.

This can be "zero-copy" - i.e. the only buffering is in the stream that receives your data (e.g. the socket or file descriptor).

For an example of a consciously zero-copy implementation of serialization for dynamic_bitset see the code in this answer: How to serialize boost::dynamic_bitset?

I have a number of these on the site. Also look at the documentation for BOOST_IS_BITWISE_SERIALIZABLE and the effect it has on container serialization (if you serialize a contiguously allocated collection of bitwise-serializable data, the upshot is zero-copy or even __memcpy_sse4 etc.).

Side-note: Cap'n Proto does something else entirely, AFAIK: it marshals some objects as futures-to-the-data. This is apparently what they advertise aggressively as "∞% faster, 0µs!!!" (which is somewhat true in the case where the data is never retrieved).

Droughty answered 23/1, 2017 at 11:9 Comment(11)
Sorry, I'm having trouble understanding how boost serialization could be zero-copy. Being extensible isn't enough: Zero-copy requires a very different kind of design. The presence of a function called "serialize()" which iterates through all contents of the class is very much not zero-copy; that function presumably has to execute at serialization and parsing time, where it is copying bytes to/from the "Archive". Cap'n Proto has no such code, because the structures on-the-wire are already arranged appropriately to be used as live structures in-memory.Tieshatieup
I supplied an example. And I mentioned file buffering. There's not much more to it.Droughty
To make it concrete: With Cap'n Proto I can mmap() a multi-gigabyte message and then start using it as an in-memory data structure in O(1) time. The kernel will only ever page in the pages relevant to the parts of the structure that I actually access; nothing ever touches the other pages. This isn't just about being lazy; the mmap'd region is actually used as the live backing store for the objects. I'm not seeing how you can do that with boost.Tieshatieup
I mean, if you're saying that a memcpy() is fine because it's pretty fast, that's a valid argument for many use cases. But it's not zero-copy, it's one-copy.Tieshatieup
That's not serialization, though. Anyhoops, that's Boost Interprocess (managed_mapped_file or mapped_object/mapped_region for raw access). It's just different things. Why are you telling me? Did I somehow misrepresent something you have a stake in?Droughty
@KentonVarda Yeah. Serialization to an archive has a tendency of... serializing to an archive. Of course, if you want to serialize to a socket, or not serialize at all (because you're in shared memory anyways), that's fine.Droughty
As indicated in my profile, I'm the author of Cap'n Proto. Your post says that boost serialization can be zero-copy because it is extensible. I'm pointing out that this is not correct. You can certainly debate whether zero-copy is really useful or not, but you can't call boost zero-copy.Tieshatieup
I may have misunderstood the scope of the question. For direct on-disk format representations, obviously any kind of serialization could be suboptimal compared to serialization. (I assumed serialization implies streaming, as the word suggests)Droughty
@KentonVarda I think I was very clear and said nothing wrong. I clearly indicated the limits of "zero copy" by immediately pointing out there was still serialization. Given streaming, this is zero copy (unless you think of splice/sendfile syscalls which aren't very portable)Droughty
The question was clearly about the term "zero-copy" as defined by Cap'n Proto, Flatbuffers, and the like. You seem to be defining your own new term which you're calling "zero-copy", but your meaning is clearly different. I therefore maintain that your answer is inaccurate in the context of the question. Stack Overflow does not like these kinds of debates happening in comment threads so I'll leave it at that. Feel free to e-mail me or the Cap'n Proto list if you want a longer explanation of why it's not the same (yes, even when streaming).Tieshatieup
Look. I can see your point. Perhaps I misunderstood the scope of the question because I assumed streaming serialization (which in my humble opinion is the regular meaning of serializing (making stuff serial)). Your addition is very welcome, I suggest you post an answer. Right now, I'm unable to second guess my answer as OP seems to have accepted it. Once you have added yours, we can see whether that was premature.Droughty
