I am working on a data system that needs to store large amounts of simple, extensible data (alongside some specialist indexing that we are developing in-house and that is not part of this question). I expect there to be billions of records stored, so efficient serialisation is a key part of the system. The serialisation needs to be fast, space-efficient, and supported on multiple platforms and in multiple languages, because packing and unpacking this data will be the responsibility of client components, not of the storage system.
The data type is effectively a hash with optional key/value pairs. Keys will be small integers (interpreted at the application layer). Values can be any of a few simple data types: String, Integer or Float.
As a technology choice, we have picked MessagePack, and I am writing code to perform data serialisation via Ruby's msgpack-ruby gem.
I don't need the precision of Ruby's 64-bit Float. None of the numbers being stored has meaningful precision even to the limits of a 32-bit float, so I want to use MessagePack's support for 32-bit floating point values. This definitely exists. However, the default behaviour in Ruby on any 64-bit system is to serialise Float to 64 bits:
MessagePack.pack(10.3)
=> "\xCB@$\x99\x99\x99\x99\x99\x9A"
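For reference, the two encodings differ only in the type byte and the payload width, and both can be reproduced with core Ruby's Array#pack. This is a minimal sketch that does not use the msgpack gem at all; the helper names float32_bytes and float64_bytes are my own, for illustration only:

```ruby
# MessagePack float 32: type byte 0xCA followed by a 4-byte big-endian
# IEEE 754 single-precision value.
def float32_bytes(value)
  [0xCA].pack("C") + [value].pack("g")
end

# MessagePack float 64: type byte 0xCB followed by an 8-byte big-endian
# IEEE 754 double-precision value.
def float64_bytes(value)
  [0xCB].pack("C") + [value].pack("G")
end

p float32_bytes(10.3).bytesize  # 5 bytes per value
p float64_bytes(10.3).bytesize  # 9 bytes per value

# Reading the 4-byte payload back gives the nearest single-precision value,
# which is close enough for the data described here.
p float32_bytes(10.3)[1, 4].unpack1("g")
```

Over billions of records, saving 4 bytes per Float value is the reason I care about this.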
Looking at the MessagePack code, it seems there is a method MessagePack::Packer#write_float32, and this does what I expect:
MessagePack::DefaultFactory.packer.write_float32(10.3).to_s
=> "\xCAA$\xCC\xCD"
... but I cannot find a way either to set up the default packer or to create a new one that will use this method when serialising a larger structure.
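To make the goal concrete, here is a sketch of the bytes I would like to end up with for a small record, built by hand without the gem. It only covers a narrow case (at most 15 pairs, integer keys from 0 to 127, Float values only), and the method name encode_record is hypothetical, not part of msgpack-ruby:

```ruby
# Hand-rolled encoder for one narrow case of the MessagePack format:
# a fixmap (up to 15 pairs) mapping positive fixint keys (0..127)
# to float 32 values. Illustrative only, not a general serialiser.
def encode_record(record)
  raise ArgumentError, "too many pairs for a fixmap" if record.size > 15

  out = [0x80 | record.size].pack("C")   # fixmap header: 1000xxxx
  record.each do |key, value|
    unless key.is_a?(Integer) && (0..127).cover?(key)
      raise ArgumentError, "key must be an Integer in 0..127"
    end
    raise ArgumentError, "value must be a Float" unless value.is_a?(Float)

    out << [key].pack("C")               # positive fixint: 0xxxxxxx
    out << [0xCA].pack("C")              # float 32 type byte
    out << [value].pack("g")             # 4-byte big-endian single
  end
  out
end

bytes = encode_record({ 1 => 10.3, 2 => 0.5 })
p bytes.bytesize  # 1 + 2 * (1 + 5) = 13 bytes
```

The same record with 64-bit floats would take 1 + 2 * (1 + 9) = 21 bytes. What I am looking for is a way to get this output from the gem itself rather than from hand-written packing code.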
As a test of my comprehension, I tried this:
class Float
  def to_msgpack_ext
    packer.write_float32(self)
  end
  def self.from_msgpack_ext s
    unpacker.read(s)
  end
end
MessagePack::DefaultFactory.register_type(0, Float)
MessagePack.pack(10.3)
=> "\xCB@$\x99\x99\x99\x99\x99\x9A"
No difference at all. Clearly I am missing or misunderstanding something about the object model used in MessagePack. Is what I want to do possible, and if so, what do I need to do?