lists or dicts over zeromq in python

What is the correct/best way to send objects like lists or dicts over ZeroMQ in Python? What if we use a PUB/SUB pattern, where the first part of the string would be used as a filter?

  • I am aware that there are multipart messages, but they were originally meant for a different purpose. Furthermore, you cannot subscribe to all messages that have a certain string as the first element.
Shuttering answered 25/2, 2012 at 15:35 Comment(0)

Manual serialization

You turn the data into a string yourself (by concatenating fields or some similar scheme) and parse it back on the other side. It's fast and doesn't take much space, but it requires work and maintenance, and it's not flexible.

If a program in another language wants to read the data, you need to write the parser again. Not DRY.

OK for very small data, but the amount of work is usually not worth it unless you are looking for speed and memory efficiency and can measure that your implementation is significantly better.
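As a rough illustration of that trade-off, here is a minimal sketch of a hand-rolled comma-separated format for a topic plus a flat list of strings. The `encode_csv`/`decode_csv` names and the format itself are hypothetical, and the sketch shows the weak spot: without extra escaping rules, any field containing a comma cannot be represented, so the encoder has to refuse it.

```python
# Hypothetical hand-rolled format: "topic,item1,item2,..." encoded as UTF-8.
# It only handles a topic plus a flat list of strings without commas.


def encode_csv(topic: str, items: list[str]) -> bytes:
    # Refuse input the format cannot represent instead of silently corrupting it
    for field in [topic, *items]:
        if "," in field:
            raise ValueError(f"field {field!r} contains the delimiter")
    return ",".join([topic, *items]).encode("utf-8")


def decode_csv(frame: bytes) -> tuple[str, list[str]]:
    # Everything comes back as strings; numbers must be converted by hand
    topic, *items = frame.decode("utf-8").split(",")
    return topic, items


frame = encode_csv("prices", ["ABC", "10.5", "10.7"])
print(frame)              # b'prices,ABC,10.5,10.7'
print(decode_csv(frame))  # ('prices', ['ABC', '10.5', '10.7'])
```

Note that the prices come back as the strings `'10.5'` and `'10.7'`: type information, nesting and escaping are all things you would have to add and maintain yourself.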

Pickle

Slow, but you can serialize complex objects, and even callables. It's powerful, and it's so easy it's a no-brainer.

On the other hand, it's possible to end up with something you can't pickle, which breaks your code. Plus, you can't share the data with any library written in another language.

Finally, the format is not human-readable (hard to debug) and quite verbose.

Very nice to share objects and tasks, not so nice for messages.
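A minimal standard-library sketch of both sides of that trade-off: a nested dict containing a set (which JSON cannot encode directly) round-trips through `pickle` unchanged, while an object holding OS-level state, here a `threading.Lock`, cannot be pickled at all. The `can_pickle` helper is hypothetical. The Python documentation also warns that unpickling untrusted data can execute arbitrary code, so `pickle.loads` should only see frames from peers you control.

```python
import pickle
import threading


def can_pickle(obj) -> bool:
    # Hypothetical helper: True if pickle.dumps succeeds for obj
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


message = {"task": "resize", "args": [800, 600], "tags": {"urgent", "images"}}

# Sender side: bytes ready to be put into a ZeroMQ frame
frame = pickle.dumps(message, protocol=pickle.HIGHEST_PROTOCOL)

# Receiver side: a faithful copy, set included (trusted peers only)
restored = pickle.loads(frame)
print(restored == message)                     # True
print(can_pickle(len))                         # True: builtins are pickled by reference
print(can_pickle({"lock": threading.Lock()}))  # False: locks cannot be pickled
```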

json

Reasonably fast, and easy to implement with simple to moderately complex data structures. It's flexible, human-readable, and the data can be shared across languages easily.

For complex data (custom classes, dates, sets), you'll have to write a bit of conversion code.

Unless you have a very specific need, this is probably the best balance between features and complexity, especially since the json module in the Python standard library now has a C implementation, so speed is OK.
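For the "bit of code" that complex data needs, the standard `json` module offers two hooks: `default=` on `json.dumps`, called for objects it cannot encode, and `object_hook=` on `json.loads`, called for every decoded JSON object. A minimal sketch, assuming sender and receiver agree on hypothetical `"__set__"` and `"__datetime__"` marker keys:

```python
import json
from datetime import datetime


def to_jsonable(obj):
    # Called by json.dumps only for objects it cannot serialize natively
    if isinstance(obj, set):
        return {"__set__": sorted(obj)}
    if isinstance(obj, datetime):
        return {"__datetime__": obj.isoformat()}
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


def from_jsonable(d):
    # Called by json.loads for every decoded dict, innermost first
    if "__set__" in d:
        return set(d["__set__"])
    if "__datetime__" in d:
        return datetime.fromisoformat(d["__datetime__"])
    return d


message = {"tags": {"b", "a"}, "sent": datetime(2012, 2, 25, 16, 45), "values": [1, 2.5, None]}

text = json.dumps(message, default=to_jsonable)
payload = text.encode("utf-8")  # bytes for the ZeroMQ frame
print(text)
# {"tags": {"__set__": ["a", "b"]}, "sent": {"__datetime__": "2012-02-25T16:45:00"}, "values": [1, 2.5, null]}

restored = json.loads(payload.decode("utf-8"), object_hook=from_jsonable)
print(restored == message)  # True
```

The marker-key convention is only a sketch: a genuine dict that happens to contain a `"__set__"` key would be misread, so a real protocol needs markers that cannot collide with user data.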

xml

Verbose, hard to create and a pain to maintain unless you have some heavy library that does all the work for you. Slow.

Unless it's a requirement, I would avoid it.
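To make "verbose" concrete, here is a minimal sketch that encodes the same flat dict with the standard library's `xml.etree.ElementTree` and with `json`. The element-per-key mapping is one hypothetical choice (XML has no built-in mapping for dicts), and it only works for string values, since every value comes back as text.

```python
import json
import xml.etree.ElementTree as ET

message = {"topic": "prices", "symbol": "ABC", "bid": "10.5", "ask": "10.7"}

# XML: one child element per key; every tag is written twice (open and close)
root = ET.Element("message")
for key, value in message.items():
    ET.SubElement(root, key).text = value
xml_frame = ET.tostring(root)  # bytes, no XML declaration by default

# Decode: rebuild the dict from the child elements (all values are strings)
decoded = {child.tag: child.text for child in ET.fromstring(xml_frame)}

json_frame = json.dumps(message).encode("utf-8")

print(decoded == message)               # True
print(len(xml_frame), len(json_frame))  # 90 66
```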

In the end

Now, as usual, speed and space efficiency are relative, and you must first answer these questions:

  • what efficiency do I need?
  • what am I ready to pay (money, time, energy) for that?
  • what solution fits in my current system?

That's all that matters.

That wonderful moment of philosophy passed, use JSON.

Favored answered 25/2, 2012 at 16:45 Comment(5)
Pickle is worse than you show, as it forces you to keep the code on both sides totally in sync or write pickle helpers all over the place. Not to mention the security aspects. - Brentwood
@schlenk: Actually, syncing applies to every type of serialization. Every time you change the format, both sides need to be in sync, unless the most complex thing you are sharing is a list of strings, and in that case unsynced code using pickle would not be a problem either. As for security, if you are the one sending and receiving the messages, it's not a problem. We are not talking about an API here. - Favored
Good discussion, but nothing is said about the filters. Since the filter (setsockopt) uses strings, how can you easily set a filter for an object? I hope the answer is not "see what the string looks like and use that". - Notum
Filters are usually applied to the routing key, not the data content. But if you do need to filter on the content, JSON is the easiest way, since it's so easy to implement decoding/encoding anywhere in the messaging chain. - Favored
Nice summary. It would be nice to see how msgpack fits in. - Molehill

JSON:

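# Client (socket is a connected pyzmq socket)
import json
socket.send(json.dumps(message).encode('utf-8'))  # on Python 3, send() needs bytes

# Server
import json
message = json.loads(socket.recv())  # json.loads() accepts bytes on Python 3.6+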

More info:

Statesman answered 2/9, 2014 at 8:59 Comment(0)

In ZeroMQ, a message is simply a binary blob. You can put anything you want in it. When you have an object that has multiple parts, you first need to serialize it into something that can be deserialized on the other end. The simplest way to do this is to use repr(obj), which produces a string that you can evaluate with eval() at the other end to recreate the object. But that is not the best way.
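A minimal sketch of the repr approach and its limits. It swaps `eval` for the standard library's `ast.literal_eval`, which only accepts Python literals and therefore will not run arbitrary code received from the network; that substitution is an assumption, not something the answer prescribes. Dicts and lists of literals round-trip, but anything whose repr is a constructor call, such as a `datetime.date`, is rejected.

```python
import ast
from datetime import date

message = {"topic": "prices", "values": [1, 2.5, None], "ok": True}

# Sender: the repr of a dict of literals is valid Python source
frame = repr(message).encode("utf-8")

# Receiver: literal_eval parses literals only, unlike eval()
restored = ast.literal_eval(frame.decode("utf-8"))
print(restored == message)  # True

# A date's repr is a constructor call, not a literal, so it is rejected
try:
    ast.literal_eval(repr(date(2012, 2, 25)))
except ValueError as exc:
    print("rejected:", exc)
```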

First of all, you should try to use a language-independent format, because sooner or later you will need to interact with applications written in other languages. A JSON object is a good choice for this because it is a single string that can be decoded by many languages. However, JSON might not be the most efficient representation if you are sending lots of messages across the network; in that case you might want to consider a format like MessagePack or Protocol Buffers.

If you need a topic identifier for PUB/SUB, simply tack it onto the beginning. Either use a fixed-length topic, or place a delimiter between the topic and the real message.
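A minimal sketch of the delimiter variant, using only the standard library so it runs without a broker. The `pack`/`unpack`/`matches` names and the space delimiter are hypothetical. `matches` only mimics what a SUB socket does (keep a message whose bytes start with a subscribed prefix); with pyzmq you would instead send the packed bytes on a PUB socket and call `setsockopt(zmq.SUBSCRIBE, b"prices ")` on the SUB socket. Putting the delimiter in the subscription matters, because a bare `b"prices"` prefix also matches a `prices_eu` topic.

```python
import json

DELIM = b" "  # hypothetical delimiter; topics must never contain it


def pack(topic: str, obj) -> bytes:
    # Topic first, so the SUB socket's prefix filter can see it
    return topic.encode("utf-8") + DELIM + json.dumps(obj).encode("utf-8")


def unpack(frame: bytes):
    # Split on the first delimiter only; the JSON body may contain spaces
    topic, _, body = frame.partition(DELIM)
    return topic.decode("utf-8"), json.loads(body)


def matches(frame: bytes, subscription: bytes) -> bool:
    # Stand-in for ZeroMQ's SUB filtering, which is a byte-prefix match
    return frame.startswith(subscription)


frame = pack("prices", {"ABC": [10.5, 10.7]})
print(frame)                                       # b'prices {"ABC": [10.5, 10.7]}'
print(unpack(frame))                               # ('prices', {'ABC': [10.5, 10.7]})
print(matches(frame, b"prices "))                  # True
print(matches(pack("prices_eu", {}), b"prices "))  # False: the delimiter blocks it
print(matches(pack("prices_eu", {}), b"prices"))   # True: prefix without delimiter
```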

Saransk answered 25/2, 2012 at 16:29 Comment(0)

Encode as JSON before sending, and decode as JSON after receiving.

Bolometer answered 25/2, 2012 at 15:37 Comment(3)
Could you elaborate a bit more? I'd be interested in the speed, advantages and disadvantages compared with just concatenating a string (say, comma-separated). - Shuttering
@DavoudTaghawi-Nejad: the main advantage is that you don't need to write a parser. - Syphilis
@DavoudTaghawi-Nejad - How would you go about representing your objects as strings? Would you cast it to the repr value of the object and then eval it back? That would be really limited and prone to problems depending on the objects in your list or dict. - Montreal

Also check out MessagePack

http://msgpack.org/

"It's like JSON. but fast and small"

Unemployed answered 1/2, 2013 at 23:11 Comment(0)

There are a few questions in that question, but as for the best/correct way to send objects/dicts, it obviously depends. In a lot of situations JSON is simple and familiar to most people. To get it to work, I had to use send_string and recv_string, e.g.:

# client.py
import json
socket.send_string(json.dumps({'data': ['a', 'b', 'c']}))

# server.py
import json
result = json.loads(socket.recv_string())

Discussion in the docs: https://pyzmq.readthedocs.io/en/latest/unicode.html

Boykins answered 22/6, 2020 at 15:32 Comment(1)
Thanks! Helped me solve a problem with unicode when using the send() and recv() methods. - Theresatherese

In case you are interested in seeing examples, I released a small package called pyRpc that shows how to do a simple Python RPC setup where you expose services between different apps. It uses pyzmq's built-in methods for sending and receiving Python objects (which I believe simply use cPickle).

http://pypi.python.org/pypi/pyRpc/0.1

https://github.com/justinfx/pyRpc

While my examples use the pyobj versions of the send and receive calls (send_pyobj/recv_pyobj), there are other versions available that you can use, like send_json and send_unicode. Unless you need some specific type of serialization, you can just use these convenience send/receive functions, which handle the serialization/deserialization on both ends for you.

http://zeromq.github.com/pyzmq/api/generated/zmq.core.socket.html

json is probably the fastest, and if you need something even faster than what is included in pyzmq, you could manually use cjson. If your focus is speed, this is a good option. But if you know you will only be communicating with other Python services, then the benefit of cPickle is that it is a native Python serialization format that gives you a lot of control. You can easily define how your classes serialize, and end up with native Python objects at the other end, as opposed to basic values. I'm sure you could also write your own object hook for json if you wanted.
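As a minimal sketch of defining how a class serializes, the standard `__getstate__`/`__setstate__` hooks let the class choose what goes into the pickle. The `Worker` class is hypothetical: it drops a `threading.Lock` (which cannot be pickled) before sending and creates a fresh one on the receiving side, and the receiver gets a real `Worker` instance rather than a plain dict.

```python
import pickle
import threading


class Worker:
    """Hypothetical class whose lock must not travel over the wire."""

    def __init__(self, name: str):
        self.name = name
        self.lock = threading.Lock()

    def __getstate__(self):
        # Copy the instance dict and drop the unpicklable lock
        state = self.__dict__.copy()
        del state["lock"]
        return state

    def __setstate__(self, state):
        # Restore the plain attributes, then give the copy a fresh lock
        self.__dict__.update(state)
        self.lock = threading.Lock()


frame = pickle.dumps(Worker("resizer"))
restored = pickle.loads(frame)
print(type(restored).__name__, restored.name)  # Worker resizer
```

The receiving process must be able to import the same `Worker` class, which is exactly the code-synchronization cost raised in the comments on the first answer.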

Montreal answered 25/2, 2012 at 16:54 Comment(0)
