MessagePack and datetime
Asked Answered
H

3

5

I need a fast way to send 300 short messages a second over zeromq between the python multiprocessing processes. Each message needs to contain an ID and time.time()

msgpack seems like the best way to serialize the dict before sending it via zeromq, and conveniently, msgpack has an example of exactly what I need, except it has a datetime.datetime.now().

import datetime

import msgpack

useful_dict = {
    "id": 1,
    "created": datetime.datetime.now(),
}

def decode_datetime(obj):
    if b'__datetime__' in obj:
        obj = datetime.datetime.strptime(obj["as_str"], "%Y%m%dT%H:%M:%S.%f")
    return obj

def encode_datetime(obj):
    if isinstance(obj, datetime.datetime):
        return {'__datetime__': True, 'as_str': obj.strftime("%Y%m%dT%H:%M:%S.%f")}
    return obj


packed_dict = msgpack.packb(useful_dict, default=encode_datetime)
this_dict_again = msgpack.unpackb(packed_dict, object_hook=decode_datetime)

The problem is that their example doesn't work, I get this error:

    obj = datetime.datetime.strptime(obj["as_str"], "%Y%m%dT%H:%M:%S.%f")
KeyError: 'as_str'

Maybe because I'm on python 3.4, but I don't know what the issue with strptime. Would appreciate your help.

Him answered 18/5, 2015 at 21:39 Comment(0)
C
3

Python3 and Python2 manage differently strings encoding : encoding-and-decoding-strings-in-python-3-x

Then it is needed to :

  • use b'as_str' (instead of 'as_str') as dictionary key
  • use encode and decode for the stored value

Modifying the code like this works with python2 and python3 :

import datetime
import msgpack

useful_dict = {
    "id": 1,
    "created": datetime.datetime.now(),
}

def decode_datetime(obj):
    if b'__datetime__' in obj:
        obj = datetime.datetime.strptime(obj[b'as_str'].decode(), "%Y%m%dT%H:%M:%S.%f")
    return obj

def encode_datetime(obj):
    if isinstance(obj, datetime.datetime):
        obj = {'__datetime__': True, 'as_str': obj.strftime("%Y%m%dT%H:%M:%S.%f").encode()}
    return obj


packed_dict = msgpack.packb(useful_dict, default=encode_datetime)
this_dict_again = msgpack.unpackb(packed_dict, object_hook=decode_datetime)
Calder answered 19/5, 2015 at 19:6 Comment(0)
K
4

Given that messagepack ( import msgpack ) is good at serializing integers, I created a solution which only uses integers:

_datetime_ExtType = 42

def _unpacker_hook(code, data):
    if code == _datetime_ExtType:
        values = unpack(data)

        if len(values) == 8:  # we have timezone
            return datetime.datetime(*values[:-1], dateutil.tz.tzoffset(None, values[-1]))
        else:
            return datetime.datetime(*values)

    return msgpack.ExtType(code, data)


# This will only get called for unknown types
def _packer_unknown_handler(obj):
    if isinstance(obj, datetime.datetime):
        if obj.tzinfo:
            components = (obj.year, obj.month, obj.day, obj.hour, obj.minute, obj.second, obj.microsecond, int(obj.utcoffset().total_seconds()))
        else:
            components = (obj.year, obj.month, obj.day, obj.hour, obj.minute, obj.second, obj.microsecond)

        # we effectively double pack the values to "compress" them
        data = msgpack.ExtType(_datetime_ExtType, pack(components))
        return data

    raise TypeError("Unknown type: {}".format(obj))

def pack(obj, **kwargs):
    # we don't use a global packer because it wouldn't be re-entrant safe
    return msgpack.packb(obj, use_bin_type=True, default=_packer_unknown_handler, **kwargs)


def unpack(payload):
    try:
        # we temporarily disable gc during unpack to bump up perf: https://pypi.python.org/pypi/msgpack-python
        gc.disable()
        # This must match the above _packer parameters above.  NOTE: use_list is faster
        return msgpack.unpackb(payload, use_list=False, encoding='utf-8', ext_hook=_unpacker_hook)
    finally:
        gc.enable()
Kaikaia answered 24/3, 2016 at 21:19 Comment(1)
btw, I realized later this this particular solution is good at maintaining the original timezone..however a better solution would be to use a utc timestamp if we have a timezone, otherwise do the date serialization.Kaikaia
C
3

Python3 and Python2 manage differently strings encoding : encoding-and-decoding-strings-in-python-3-x

Then it is needed to :

  • use b'as_str' (instead of 'as_str') as dictionary key
  • use encode and decode for the stored value

Modifying the code like this works with python2 and python3 :

import datetime
import msgpack

useful_dict = {
    "id": 1,
    "created": datetime.datetime.now(),
}

def decode_datetime(obj):
    if b'__datetime__' in obj:
        obj = datetime.datetime.strptime(obj[b'as_str'].decode(), "%Y%m%dT%H:%M:%S.%f")
    return obj

def encode_datetime(obj):
    if isinstance(obj, datetime.datetime):
        obj = {'__datetime__': True, 'as_str': obj.strftime("%Y%m%dT%H:%M:%S.%f").encode()}
    return obj


packed_dict = msgpack.packb(useful_dict, default=encode_datetime)
this_dict_again = msgpack.unpackb(packed_dict, object_hook=decode_datetime)
Calder answered 19/5, 2015 at 19:6 Comment(0)
P
0

In the documentation it says that the default encoding of unicode strings of packb (and pack) is utf-8. You could simply search for the unicode string '__datetime__' in the decode_datetime function, instead of the byte object b'__datetime__', and add the encoding='utf-8' parameter to unpack. Like so:

def decode_datetime(obj):
    if '__datetime__' in obj:
        obj = datetime.datetime.strptime(obj["as_str"], "%Y%m%dT%H:%M:%S.%f")
    return obj

packed_dict = msgpack.packb(useful_dict, default=encode_datetime)
this_dict_again = msgpack.unpackb(packed_dict, object_hook=decode_datetime, encoding='utf-8')
Photogenic answered 24/8, 2017 at 14:59 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.