Receiving variable-size data over TCP sockets

I ran into a little issue with transferring data over (TCP) sockets. Small background on what I am doing:

I am sending data from side A to side B. The data can be of variable length, with an assumed maximum size of 1096 bytes.

A) send(clientFd, buffer, size, 0)

On B, since I don't know what size to expect, I always try to receive 1096 bytes:

B) int receivedBytes = receive(fd, msgBuff, 1096, 0)

However, when I did this, I realized that A was sending small chunks of data, say around 80-90 bytes each. After a few bursts of sending, B was clubbing them together until receivedBytes reached 1096. This obviously corrupted the data, and all hell broke loose.

To fix this, I broke my data in two parts: header and data.

struct IpcMsg
{
   long msgType;
   int devId;
   uint32_t senderId;
   uint16_t size; 
   uint8_t value[IPC_VALUES_SIZE]; 
};

On A side:

A) send(clientFd, buffer, size, 0)

On B, I first receive the header, determine the size of the payload from it, and then receive the rest of the payload.

B) int receivedBytes = receive(fd, msgBuff, sizeof(IpcMsg) - sizeof( ((IpcMsg*)0)->value ), 0);
int sizeToPoll = ((IpcMsg*)buffer)->size;
printf("Size to poll: %d\n", sizeToPoll);

if (sizeToPoll != 0)
{
        bytesRead = recv(clientFd, buffer + receivedBytes, sizeToPoll, 0); 
}

So, for every send that has a payload, I end up calling receive twice. This worked for me, but I was wondering if there is a better way of doing this?

Pal answered 12/9, 2014 at 7:56 Comment(7)
TCP is a streaming protocol: it sends data as a stream, which means that when you receive, you may or may not receive as much as you ask for. If you receive less than the expected message size, you have to buffer the received data and make multiple calls to recv to receive all of it. - Gehenna
@JoachimPileborg: I didn't understand the sentinel suggestion. As for the other suggestion: are you suggesting doing two sends per send request? (PS: I have a LOT of sends going through and I need to keep them to a minimum, since they create a lag/delay.) - Pal
As for how to send/receive variable-length data, you can either send the actual data size in a fixed-size "header" and then receive the actual data, or you can have a "sentinel", a value that can't occur in the data, to mark the end of the data. Whichever method you use, you have to make multiple calls to recv, but on a modern computer the performance penalty for multiple send/recv calls is negligible (and TCP will probably put the data from two consecutive send calls in a single packet anyway). - Gehenna
@JoachimPileborg: Can you please point me to an example of the 'sentinel' suggestion? Also, am I correct in understanding that to send the actual data size in a fixed-size header, I will have to make 2 send calls: a. the header (as you suggested) and b. the actual data? - Pal
You need to have your code "understand" when it's receiving one packet or another. If the actual data for a particular packet is "short", then do another recv call. You can either use a small header which holds "how long is the next part" and then receive until you've got it all, or put a "marker" in your data to indicate "the end of data" (-1, 0, 0xDEADDEAD or something else that is not valid data). Either way, your receiving code needs to understand what you are sending, and where it is in the sequence. - Rusel
Thanks to Nagle's algorithm, two consecutive send calls will usually not introduce any network "lag", as data from both calls will typically be sent as a single packet (if the amount of data is less than the MTU). And if the data is sent as a single packet, it doesn't matter how many recv calls you make; they will still read from that single packet. - Gehenna
@Pal You can send the header and the message at the same time via a 'gather write', with the sendmsg() function. - Nazler
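
The sendmsg() suggestion in the last comment could look roughly like the sketch below. The function name send_header_and_payload and its parameters are made up for illustration; only sendmsg(), struct msghdr and struct iovec are real POSIX names. On a stream socket the call may still accept fewer bytes than requested, so a caller would need to check the return value and resend the remainder.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: hand the header and the payload to the kernel in a
 * single sendmsg() call, without first copying them into one buffer.
 * Returns the number of bytes accepted (possibly fewer than
 * hdr_len + payload_len), or -1 on error. */
ssize_t send_header_and_payload(int fd,
                                const void *hdr, size_t hdr_len,
                                const void *payload, size_t payload_len)
{
    struct iovec iov[2];
    struct msghdr msg;

    /* iov_base is not const-qualified, hence the casts; sendmsg only reads. */
    iov[0].iov_base = (void *)hdr;
    iov[0].iov_len  = hdr_len;
    iov[1].iov_base = (void *)payload;
    iov[1].iov_len  = payload_len;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov    = iov;
    msg.msg_iovlen = 2;

    return sendmsg(fd, &msg, 0);
}
```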

You're on the right lines with the idea of sending a header that contains basic information about the following data, followed by the data itself. However, this won't always work:

int receivedBytes = receive(fd, msgBuff, sizeof(IpcMsg) - sizeof( ((IpcMsg*)0)->value ), 0);
int sizeToPoll = ((IpcMsg*)buffer)->size;

The reason is that TCP is free to fragment and send your header in as many chunks as it sees fit based on its own assessment of the underlying network conditions applied to what's called the congestion control strategy. On a LAN you'll pretty much always get your header in one packet but try it across the world through the internet and you may get a much smaller number of bytes at a time.

The answer is to not call TCP's 'receive' (usually recv) directly but abstract it away into a small utility function that takes the size you really must receive and a buffer to put it into. Go into a loop receiving and appending packets until all data has arrived or an error occurs.
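
A minimal sketch of such a utility function, assuming a blocking socket. The name recv_all and its return convention are illustrative choices, not part of any standard API:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: read exactly `len` bytes from `fd` into `buf`.
 * Returns 1 on success, 0 if the peer closed the connection before all
 * bytes arrived, and -1 on error (errno is left as set by recv). */
int recv_all(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    size_t got = 0;

    while (got < len) {
        ssize_t n = recv(fd, p + got, len - got, 0);
        if (n > 0) {
            got += (size_t)n;   /* partial read: keep going */
        } else if (n == 0) {
            return 0;           /* orderly shutdown by the peer */
        } else if (errno != EINTR) {
            return -1;          /* real error; EINTR simply retries */
        }
    }
    return 1;
}
```

With a helper like this, the receiver would call it once for the fixed-size header and once more for the number of payload bytes that the header announces.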

If you need to go asynchronous and serve multiple clients simultaneously, then the same principle applies, but you need to investigate the 'select' call, which allows you to be notified when data arrives.

Hendiadys answered 12/9, 2014 at 8:32 Comment(0)

TCP/IP is a "raw" interface for sending data. It guarantees that the bytes that are sent all arrive, and in the right order, but it makes no guarantees about chunking and knows nothing about the data you are sending.

Therefore, if sending a "packet" over TCP/IP that is to be processed as such, you must know when you have a full packet by one of the following techniques:

  • Fixed-sized packets. In your case 1096 bytes
  • First send / receive a known "header" that will tell you the size of the packet being sent.
  • Use some kind of "end of packet" symbol.

In either of the first two cases, you know the number of bytes you are expecting to receive, so you need to buffer anything you receive until you have the full message, then process that.

If you receive more than you expected, i.e. it spills over into the next packet, you split that, process the completed packet and leave the remainder buffered for processing subsequently.

In the last case, where you have an end-of-packet symbol, that symbol could be anywhere in the bytes you receive, so anything that follows it must be buffered for the next packet.
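
The buffering described above might be sketched as follows for the end-of-packet-symbol case. The struct stream_buf type, the 4096-byte capacity and both function names are hypothetical, chosen only for this example:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical accumulator for bytes received but not yet processed. */
struct stream_buf {
    unsigned char data[4096];   /* capacity is an arbitrary assumption */
    size_t used;
};

/* Append freshly received bytes. Returns 0 on success, -1 if they do not fit. */
int stream_buf_append(struct stream_buf *sb, const void *src, size_t n)
{
    if (n > sizeof sb->data - sb->used)
        return -1;
    memcpy(sb->data + sb->used, src, n);
    sb->used += n;
    return 0;
}

/* If a complete message ending in `delim` is buffered, copy it (without the
 * delimiter) into `out`, drop it from the buffer and return its length.
 * Returns -1 if no complete message is buffered yet, -2 if `out` is too small.
 * Bytes after the delimiter stay buffered for the next message. */
long stream_buf_pop(struct stream_buf *sb, unsigned char delim,
                    void *out, size_t out_cap)
{
    unsigned char *end = memchr(sb->data, delim, sb->used);
    size_t msg_len, consumed;

    if (end == NULL)
        return -1;
    msg_len = (size_t)(end - sb->data);
    if (msg_len > out_cap)
        return -2;

    memcpy(out, sb->data, msg_len);
    consumed = msg_len + 1;     /* the message plus its delimiter */
    memmove(sb->data, sb->data + consumed, sb->used - consumed);
    sb->used -= consumed;
    return (long)msg_len;
}
```

Each chunk returned by recv would be appended, and stream_buf_pop would then be called in a loop until it reports that no complete message is left.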

Bouchier answered 12/9, 2014 at 10:9 Comment(2)
How can the size of the real data be effectively known? For example, say I want to send an image of size 5 MB to the server: how can I guarantee that 5 MB (approx. 5000000) will fit directly into x bytes, so that the server can be instructed like: "Hey, the first 2 bytes in the received byte array always contain the length of the data"? - Cacophony
Most likely you would send a "header" message to specify that you are about to send 5 MB of data, and then you'd send the data. The receiver will receive chunks of any size but will know to allocate 5 MB for the image and will know, as the bytes come in, when the image has been fully sent (or if the connection failed during sending). - Bouchier
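
One possible shape for such a header is a 4-byte length prefix in network byte order; the field width and the function names below are assumptions for illustration only. Two bytes can express lengths only up to 65535, whereas four bytes cover payloads up to about 4 GiB, which is ample for a 5 MB image:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

enum { LEN_PREFIX_SIZE = 4 };   /* assumed width of the length field */

/* Write payload_len into out[] in network byte order (big-endian). */
void put_len_prefix(unsigned char out[LEN_PREFIX_SIZE], uint32_t payload_len)
{
    uint32_t be = htonl(payload_len);
    memcpy(out, &be, sizeof be);
}

/* Read back the length that put_len_prefix() stored. */
uint32_t get_len_prefix(const unsigned char in[LEN_PREFIX_SIZE])
{
    uint32_t be;
    memcpy(&be, in, sizeof be);
    return ntohl(be);
}
```

Converting with htonl and ntohl keeps the prefix readable when sender and receiver differ in endianness, and copying through memcpy avoids casting an arbitrary receive buffer to a struct pointer.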
