I need to decide on a schema for including binary elements in a message object so that they can be decoded again on the receiving end (in my case, a consumer on a RabbitMQ / AMQP queue).
I decided against multipart MIME encoding over JSON mostly because it seems like using Thor's hammer to push in a thumb tack. I decided against manually joining parts (binary and JSON concatenated together) mostly because every new requirement would mean a complete redesign. JSON with the binary data encoded in one of the fields seems like an elegant solution.
My solution seems to work (I confirmed this by comparing the MD5 sums of the sent and received data) and does the following:
```python
import base64
import json


def json_serialiser(byte_obj):
    # json.dumps() calls this hook for any object it cannot serialise itself
    if isinstance(byte_obj, (bytes, bytearray)):
        # File bytes -> Base64 bytes -> str
        return base64.b64encode(byte_obj).decode('utf-8')
    # The json docs expect a default hook to raise TypeError for unsupported types
    raise TypeError(f'No encoding handler for data type {type(byte_obj)}')


def make_msg(filename, filedata):
    d = {"filename": filename,
         "datalen": len(filedata),
         "data": filedata}
    return json.dumps(d, default=json_serialiser)
```
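To see what the producer actually puts on the wire, here is a small sketch that uses the same `default=` hook on a hypothetical four-byte payload. It decodes with `'ascii'` rather than `'utf-8'`, which gives the same string because Base64 output is pure ASCII.

```python
import base64
import json


def json_serialiser(byte_obj):
    # Called by json.dumps only for objects it cannot encode natively.
    if isinstance(byte_obj, (bytes, bytearray)):
        return base64.b64encode(byte_obj).decode('ascii')
    raise TypeError(f'No encoding handler for data type {type(byte_obj)}')


# Hypothetical payload: a NUL byte, 0xFF, and the ASCII letters "hi".
msg = json.dumps({"filename": "a.bin", "datalen": 4, "data": b'\x00\xffhi'},
                 default=json_serialiser)
print(msg)
# {"filename": "a.bin", "datalen": 4, "data": "AP9oaQ=="}
```

The binary field becomes an ordinary JSON string, so any JSON parser on the consumer side can read the message; only the `data` field needs the extra Base64 step.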
On the receiving end I simply do:
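```python
import base64
import json


def parse_json(msg):
    # msg may be str or bytes; json.loads accepts UTF-8 bytes directly
    d = json.loads(msg)
    data = d.pop('data')
    return base64.b64decode(data), d


def file_callback(ch, method, properties, body):
    # Consumer callback: body holds the raw message payload
    filedata, fileinfo = parse_json(body)
    print('File Name:', fileinfo.get("filename"))
    print('Received File Size', len(filedata))
```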
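A self-contained round trip of both sides, mirroring the MD5 comparison mentioned above, might look like the sketch below. It is a variant rather than the exact code in the question: it encodes the field inline instead of through `default=`, decodes with `validate=True`, and uses the `datalen` field as a sanity check (that check is an addition for illustration). It assumes the AMQP client hands the callback the body as `bytes`, as pika does; `json.loads` accepts UTF-8 bytes directly since Python 3.6.

```python
import base64
import hashlib
import json


def make_msg(filename, filedata):
    # Producer side: Base64 text goes into an ordinary JSON string field.
    return json.dumps({"filename": filename,
                       "datalen": len(filedata),
                       "data": base64.b64encode(filedata).decode('ascii')})


def parse_json(msg):
    # Consumer side: strict decoding rejects any non-alphabet characters.
    d = json.loads(msg)
    data = base64.b64decode(d.pop('data'), validate=True)
    # Assumed sanity check: compare against the length the producer recorded.
    if len(data) != d.get('datalen'):
        raise ValueError('length mismatch: payload truncated or corrupted')
    return data, d


payload = bytes(range(256)) * 16                       # every byte value, 4 KiB
body = make_msg('test.bin', payload).encode('utf-8')   # what the consumer receives
filedata, fileinfo = parse_json(body)
same = hashlib.md5(payload).hexdigest() == hashlib.md5(filedata).hexdigest()
print(fileinfo['filename'], len(filedata), same)       # test.bin 4096 True
```

Using a payload that contains all 256 byte values makes it more likely that an encoding mistake would show up in the digest comparison.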
My google-fu has not let me confirm whether what I am doing is actually valid. In particular, I am unsure whether the line that turns the binary data into a string for inclusion in the JSON is correct, i.e. this line:
```python
return base64.b64encode(byte_obj).decode('utf-8')
```
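One way to check this line empirically: `base64.b64encode()` only emits characters from the RFC 4648 alphabet (A-Z, a-z, 0-9, `+`, `/`, and `=` padding), all of which are below 0x80, and ASCII is a subset of UTF-8. The sketch below tests that over random inputs of many lengths and also checks the expected output length of 4 characters per 3 input bytes, rounded up.

```python
import base64
import os
import string

# The standard Base64 alphabet (RFC 4648) plus the '=' padding character.
B64_ALPHABET = set(string.ascii_letters + string.digits + '+/=')

for n in range(64):
    encoded = base64.b64encode(os.urandom(n))
    # Every output character is ASCII, so both decodings give the same str.
    assert set(encoded.decode('ascii')) <= B64_ALPHABET
    assert encoded.decode('ascii') == encoded.decode('utf-8')
    # Padded output length: 4 characters per started 3-byte group.
    assert len(encoded) == 4 * ((n + 2) // 3)
print('ok')
```

Random data is not a proof, but the alphabet property follows directly from how Base64 works, so `.decode('utf-8')` and `.decode('ascii')` are interchangeable here.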
It also seems that I can take a shortcut when decoding back to binary data, because `base64.b64decode()` treats the UTF-8 string as if it were ASCII, which is what one would expect given that it came from the output of `base64.b64encode()`.
... But is this a valid assumption in all cases?
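On the decoding side, `base64.b64decode()` accepts either `bytes` or a `str`, and a `str` must contain only ASCII characters. The sketch below (the `rejects()` helper is hypothetical, just for illustration) shows how it reacts to malformed input, including the difference the `validate` flag makes. Since `binascii.Error` is a subclass of `ValueError`, a single `except ValueError` covers both failure modes.

```python
import base64


def rejects(text, **kwargs):
    """Hypothetical helper: True if base64.b64decode refuses this input."""
    try:
        base64.b64decode(text, **kwargs)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return True
    return False


# Well-formed Base64 text, as produced by b64encode(...).decode('utf-8').
print(base64.b64decode('AP9oaQ=='))           # b'\x00\xffhi'

# A str containing non-ASCII characters is refused outright.
print(rejects('AP9oaQ==\u00e9'))              # True

# By default, characters outside the alphabet (here '\n') are discarded...
print(rejects('AP9o\naQ=='))                  # False
# ...whereas validate=True treats them as an error.
print(rejects('AP9o\naQ==', validate=True))   # True
```

So the shortcut is safe for text that really came from `b64encode()`; `validate=True` is worth considering if the message could be corrupted or hand-edited in transit, since the default mode silently skips stray characters.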
Mostly I'm surprised that I can't find any examples of this online. Perhaps my Google patience is still on holiday!
… 'latin1', as described e.g. here, instead of using base64. For example: byte_obj.decode('latin1') – Bignonia
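For comparison, the latin-1 alternative from the comment can be sketched as follows. Latin-1 maps every byte 0x00-0xFF to the Unicode code point with the same value, so it round-trips any bytes. One trade-off worth measuring: with `json.dumps`'s default `ensure_ascii=True`, every byte at or above 0x80 and most control bytes are written as six-character `\u00XX` escapes, so for arbitrary binary data the JSON can end up larger than the Base64 version (it does for the all-byte-values payload below).

```python
import base64
import json

payload = bytes(range(256))  # every possible byte value once

# latin-1 decode/encode is a lossless byte <-> code point mapping.
as_text = payload.decode('latin1')
assert as_text.encode('latin1') == payload

latin1_json = json.dumps({"data": as_text})
b64_json = json.dumps({"data": base64.b64encode(payload).decode('ascii')})

# Non-ASCII and control characters are escaped, so size depends on content.
print(len(latin1_json), len(b64_json))

# Receiving side for the latin-1 variant: parse, then re-encode to bytes.
assert json.loads(latin1_json)["data"].encode('latin1') == payload
```

Which approach is smaller depends on how much of the data is printable ASCII; Base64 has a fixed overhead of about 4/3, while the latin-1 form varies with content.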