How to transfer a float array (without serializing/deserializing) from Scala (JeroMQ) to C (ZMQ)?
Asked Answered
O

2

6

Currently, I am using a JSON library to serialize the data at the sender (JeroMQ), and deserialize at the receiver (C, ZMQ). But, while parsing, the JSON library starts to consume a lot of memory and the OS kills the process. So, I want to send the float array as it is, i.e. without using JSON.

The existing sender code is below (syn0 and syn1 are Double arrays). If syn0 and syn1 are around 100 MB each, the process is killed while parsing the received arrays, i.e. the last line of the snippet below:

import org.zeromq.ZMQ
import com.codahale.jerkson
socket.connect("tcp://localhost:5556")

socket.send(json.JSONObject(Map("syn0"->json.JSONArray(List.fromArray(syn0Global)))).toString())
println("SYN0 Request sent”)
val reply_syn0 = socket.recv(0)
println("Response received after syn0: " + new String(reply_syn0))
logInfo("Sending Syn1 request … , size : " + syn1Global.length )

socket.send(json.JSONObject(Map("syn1"->json.JSONArray(List.fromArray(syn1Global)))).toString())
println("SYN1 Request sent")
val reply_syn1 = socket.recv(0)

socket.send(json.JSONObject(Map("foldComplete"->"Done")).toString())
println("foldComplete sent")
//  Get the reply.
val reply_foldComplete = socket.recv(0)
val processedSynValuesJson = new String(reply_foldComplete)
val processedSynValues_jerkson =   jerkson.Json.parse[Map[String,List[Double]]](processedSynValuesJson)

Can these arrays be transferred without using JSON?

Here I am transferring a float array between two C programs:

//client.c
int main (void)
{
printf ("Connecting to hello world server…\n");
void *context = zmq_ctx_new ();
void *requester = zmq_socket (context, ZMQ_REQ);
zmq_connect (requester, "tcp://localhost:5555");

int request_nbr;
float send_buffer[10];
float recv_buffer[10];

for(int i = 0; i < 10; i++)
    send_buffer[i] = i;

for (request_nbr = 0; request_nbr != 10; request_nbr++) {
    //char buffer [10];
    printf ("Sending Hello %d…\n", request_nbr);
    zmq_send (requester, send_buffer, 10*sizeof(float), 0);
    zmq_recv (requester, recv_buffer, 10*sizeof(float), 0);
    printf ("Received World %.3f\n", recv_buffer[5]);
}
zmq_close (requester);
zmq_ctx_destroy (context);
return 0;
}

//server.c

int main (void)
{
//  Socket to talk to clients
void *context = zmq_ctx_new ();
void *responder = zmq_socket (context, ZMQ_REP);
int rc = zmq_bind (responder, "tcp://*:5555");
assert (rc == 0);
float recv_buffer[10];
float send_buffer[10];
while (1) {
    //char buffer [10];
    zmq_recv (responder, recv_buffer, 10*sizeof(float), 0);
    printf ("Received Hello\n");
    for(int i = 0; i < 10; i++)
            send_buffer[i] = recv_buffer[i]+5;
    zmq_send (responder, send_buffer, 10*sizeof(float), 0);
}
return 0;
}

Finally, my unsuccessful attempt at doing something similar using Scala (below is the client code):

def main(args: Array[String]) {
val context = ZMQ.context(1)
val socket = context.socket(ZMQ.REQ)

println("Connecting to hello world server…")
socket.connect ("tcp://localhost:5555")
val msg : Array[Float] = Array(1,2,3,4,5,6,7,8,9,10)
val bbuf = java.nio.ByteBuffer.allocate(4*msg.length)
bbuf.asFloatBuffer.put(java.nio.FloatBuffer.wrap(msg))


for (request_nbr <- 1 to 10)  {
    socket.sendByteBuffer(bbuf,0)

}
}
Orelie answered 23/3, 2016 at 18:49 Comment(2)
Which language are you using? You said JSON didn't work, where is you attempt at sending them in binary? You can send the data any way you like.Urinal
Scala. Please see the updated post.Orelie
B
4

SER/DES ? Size ?
No, an underlying transport-philosophy related constraint matters.

You have started with an 0.1 GB sizing for transport-payload and reported a JSON-library allocations to cause your O/S to kill the process.

Next, in other post, you have requested an 0.762 GB sizing for transport-payload.

But there is a bit more important issue in ZeroMQ transport orchestration than a choice of an external data-serialiser SER/DES policy.

No one may forbid you to try to send as big BLOB as possible, whereas a JSON-decorated string has already shown you the dark-side of such approaches, there are other reasons not to proceed this way ahead.

ZeroMQ is out of question a great and powerful toolbox. Still it takes some time for one to gain an insight necessary for indeed a smart and highly performant code-deployment, that makes maximum out of this powerful work-horse.

One of side-effects of the feature-rich internal ecosystem "under-the-hood" is a not very much known policy, hidden in a message delivery concept.

One may send any reasonable-sized message, while a delivery is not guaranteed. It is either completely delivered, or nothing gets out at all, as said above, nothing is guaranteed.

Ouch?!

Yes, not guaranteed.

Based on this core Zero-Guarrantee philosophy, one shall take due care to decide on steps and measures, the more if you plan to try to move Gigabyte BEASTs there and back.

In this very sense, it might become quantitatively supported by real SUT testing, that small-sized messages may transport ( if you indeed still need to move GBs ( refer to comment above, under the OP ) and have no other choice ) the whole volume of data segmented into smaller pieces, with error-prone re-assembly measures, which results in much faster and much safer end-to-end solution than trying to use dumb-force and instruct the code to dump about a GB of data onto whatever resources there actually are available ( Zero-Copy principle of ZeroMQ cannot and will not per-se save you in these efforts ).

For details on another hidden trap, related to not fully Zero-Copy implementation, read Martin SUSTRIK's, co-father of ZeroMQ, remarks on Zero-Copy "till-kernel-boundary-only" ( so, at least double the memory-space allocations to be expected... ).


Solution:

Redesign the architecture so as to propagate small-sized messages, if not keeping an original datastructure "mirrored" in remote process(es) instead of attempting to keep one-shot giga-transfers survivable.


The best next step?

While it does not solve your trouble with a few SLOC-s, the best thing, if you are serious about to invest your intellectual powers into distributed processing, is to read Pieter HINTJEN's lovely book "Code Connected, Vol.1"

Yes, it takes some time to generate one's own insight, but this will raise you in many aspects onto another level of professional code design. Worth time. Worth efforts.

Bilbrey answered 29/3, 2016 at 11:15 Comment(0)
P
3

You'll need to serialize the data in some form or fashion - ultimately you're taking a structure in memory on one side and instructing the other side on how to rebuild that structure (bonus points for using two separate languages where the structure in memory is likely different anyway). I'd suggest you use a new JSON library as that appears to be where the problem lies, but there are more efficient protocols you could be using. Protocol Buffers enjoy good support across many languages, that might be the place I'd start.

Palua answered 24/3, 2016 at 18:51 Comment(5)
Can you please explain why an array of floats cannot be simply transferred as a sequence of bytes. I wrote two C programs which use ZMQ to simply transfer a float array, and it works fine. Now, if I replace the C client with a Scala client, why would it not work. Please see the C code in the updated post, and my attempt to write the client in Scala.Orelie
OK, so looking at your updated question, the buffer is your ad-hoc "serialization" of the data. My assumption is that the implementation of the buffer is somehow different between C and Java - which would be entirely reasonable. The bytes just don't line up, and C doesn't know what to do with the data.Palua
This is where my confusion lies. Are you saying that even if I send a blob of bytes via ZMQ, there is some meta-data attached to it, which can only be interpreted by a client implemented using the same language? Can you please point to a reference?Orelie
That's not what I'm saying, but I'm digging into the code for both libzmq and jeromq so I can speak with a little more authority on what's going on, rather than speculating.Palua
Absent setting up my own environment for C and Scala (I develop in neither) I can't give you a definite answer, but my suggestion is that the byte stream representing the float array created by C is not identical to the byte stream representing the float array created by Scala. I can tell you that in C has no knowledge of the array being a float array - the first thing zmq_send() does is memcpy(), which according to this casts to an unsigned char array. Your best bet is to attempt to print out the byte representation and see yourselfPalua

© 2022 - 2024 — McMap. All rights reserved.