TCP/IP server using IOCP. Occasional data corruption in receive buffers

Asked 2/1, 2015 at 13:0 Answered 7/1, 2015 at 10:34

I’ve been working on an TCP/IP IOCP server application.

I’ve been testing performance (which seems in line with TCP throughput testing utilities) and now have been testing data integrity – this is where I am getting some “weirdness”.

As an initial test, I decided to have a test client send a 1MB block of data over and over where that block is just a sequence of integers incremented one after the other. The idea being that I can verify that each received buffer of data is consistent with no missing data in that buffer, independent from any other buffer received, meaning I don't need to worry what order threads handle the completed receives. (to verify, I extract the first integer in the buffer and scan forward, reset the expected value to 0 if I encounter the max value the client sends. I also have a check to make sure the data received in each is a multiple of 4 (as they are 4 byte integers)).

I seem to be getting random blocks of data missing occasionally from within a buffer, the values will go up 1 by 1, then there will be a bunch skipped. The code though seems quite simple and not much of it. I originally wrote the test in Delphi but after having these issues I rewrote a version in Visual Studio 2010 C++ and seem to be having the same issue (or at least very similar).

There is obviously more code in the real system, but I can boil it down to pretty much this in the worker thread, that just handles the completed receives, verifies the data in the buffer and then posts them again. After I initially accept the connection, I create two overlapped structures and allocate 1MB buffers to each and then call WSARecv for each of these. I've double checked I'm not accidentally sharing the same buffer between the two. Then, the following is pretty much what runs reusing these:

DWORD numberOfBytesTransferred = 0;
ULONG_PTR completionKey = NULL;
PMyOverlapped overlapped = nullptr;

while (true)
{
    auto queueResult = GetQueuedCompletionStatus(iocp, &numberOfBytesTransferred, &completionKey, (LPOVERLAPPED *)&overlapped, INFINITE);
    if (queueResult)
    {
        switch (overlapped->operation)
        {
            case tsoRecv:
            {
                verifyReceivedData(overlapped, numberOfBytesTransferred); // Checks the data is a sequence of incremented integers 1 after the other with no gabs
                overlapped->overlapped = OVERLAPPED(); // Reset the OVERLAPPED structure to defaults

                DWORD flags = 0;
                numberOfBytesTransferred = 0;
                auto returnCode = WSARecv(socket, &(overlapped->buffer), 1, &numberOfBytesTransferred, &flags, (LPWSAOVERLAPPED) overlapped, nullptr);
                break;
            }
            default:;
        }
    }
}

Maybe I am not handling some kind of error or additional information in my simple test above? I originally had an IOCP client sending data but wrote another extremely simple one in Delphi using Indy blocking sockets. It's basically a line of code once connected.

while true do
begin
    IdTCPClient.IOHandler.WriteDirect(TIdBytes(BigData), Length(BigData));
end;

I also wrote another server using a different asynchronous socket component and I haven't had it detect problems with the received data as my IOCP example above does, at least yet. I can post more code and possibly a version to compile but thought I would post the above in case I've missed something obvious. I think using one receive and one send per socket works ok, but it's my understanding that it's valid to post more than one to improve performance.

Omniscient answered 2/1, 2015 at 13:0 Comment(2)

Hmmm... 'I also have a check to make sure the data received in each is a multiple of 4 (as they are 4 byte integers'. How is it guaranteed that each buffer contains an exact multiple of 4 bytes in a TCP stream? I suggest that you take that check out. – Lactalbumin 2/1, 2015 at 19:20

You are right, it's not guaranteed at all which is why I check to ensure numberOfBytesTransferred is a multiple of 4 and abort the test if it is detected. However it just so happens that I am consistently delivered buffers that are multiples of 4 (I assume because I am testing locally and the client is sending continuous neat chunks of data). The only time the check has been triggered is when I've stopped at a breakline in the debugger and then resumed, sometimes the next buffer is an odd number. It's also clear in that case as the integers are far outside the valid range. – Omniscient 4/1, 2015 at 3:48

I believe this is solved - most of my assumptions and code were correct however it seems that for a particular socket, there cannot be simultaneous calls from multiple threads to WSASend or WSARead. There can be multiple outstanding calls for both sends and receives for a particular socket but the actual calls to initiate them need to serialized with a critical section (or similar). This was a slight misunderstanding of the MSDN documentation on my part, I was thinking it could be done but you wouldn't know which buffer would be filled first without some additional synchronization (and my test didn't care which got filled first). It appears it's simply not safe at all unless the calls are made one at a time and can result in corrupted data within the buffers.

The only code I have changed is to add a critical section per connection to protect calls to these and so far have had no issues. I think it might be possible to protect WSASend and WSARecv separately but have not tested that yet.

I had posted a far more in-depth question related to this here that has more code examples.

Omniscient answered 7/1, 2015 at 9:52 Comment(1)

It's usual to submit such calls inside a CS, yes. Similarly, WSARecv() requests often resort to a CS to add a sequence number to the buffer structs so that threads that eventually handle them can reserialize the buffers if they get out-of sequence. – Lactalbumin 8/1, 2015 at 19:11

-1

Take out the loop. You're already checking the received data and scheduling a new asynchronous read, which will, or should, re-enter this code when the read completes. The loop is entirely incorrect.

Parrnell answered 7/1, 2015 at 10:34 Comment(1)

The loop is inside a worker thread, it's absolutely necessary or the thread would exit. Maybe you are thinking of when a callback routine is supplied rather than IOCP? A typical IOCP worker will loop and wait on GetQueuedCompletionStatus and process the dequeued completed requests. I’m not sure if you’ve done any IOCP before but once a previous call to WSARead has completed, a new call needs to be made to WSARead to supply another read buffer to replace the one just processed. It may or may not be handled by the same worker thread that calls it again. Please refer to the answer I posted. – Omniscient 7/1, 2015 at 11:0

Recommended topics

Hot tags