Intermittently no data delivered through boost::asio / io completion port
Problem

I am using boost::asio for a project where two processes on the same machine communicate over TCP/IP. One generates data to be read by the other, but I am encountering a problem where, intermittently, no data is delivered through the connection. I've boiled this down to the very simple example below, based on the async TCP echo server example.

The processes (source code below) start out fine, delivering data at a fast rate from the sender to the receiver. Then all of a sudden, no data at all is delivered for about five seconds. Then data is delivered again until the next inexplicable pause. During these five seconds, the processes eat 0% CPU and no other processes seem to do anything in particular. The pause is always the same length - five seconds.

I am trying to figure out how to get rid of these stalls and what causes them.

CPU usage during an entire run:

[Image: CPU usage during a single run]

Notice how there are three dips of CPU usage in the middle of the run - a "run" is a single invocation of the server process and the client process. During these dips, no data was delivered. The number of dips and their timing differ between runs - sometimes there are no dips at all, sometimes many.

I am able to affect the "probability" of these stalls by changing the size of the read buffer - for instance, if I make the read buffer a multiple of the send chunk size (see the sketch below), the problem almost goes away, but not entirely.
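The only code change that involves is the read buffer constant in the session class below - a minimal sketch, with an illustrative multiple (2 × 8000 bytes), not necessarily the exact value tested:

// In class session: size the read buffer as a multiple of the
// 8000-byte send chunk. The specific multiple here is illustrative.
enum { max_length = 16000 };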

Source and test description

I've compiled the code below with Visual Studio 2005, using Boost 1.43 and Boost 1.45. I have tested on Windows Vista 64-bit (on a quad-core) and Windows 7 64-bit (on both a quad-core and a dual-core).

The server accepts a connection and then simply reads and discards the data. Whenever a read completes, a new read is issued.

The client connects to the server, then puts a bunch of packets into a send queue. After this it writes the packets one at a time. Whenever a write completes, the next packet in the queue is written. A separate thread monitors the queue size and prints it to stdout every second. During the I/O stalls, the queue size remains exactly the same.

I have tried using scatter/gather I/O (writing multiple packets in one system call; see the sketch below), but the result is the same. If I disable I/O completion ports in Boost by defining BOOST_ASIO_DISABLE_IOCP, the problem appears to go away, but at the price of significantly lower throughput.
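A minimal sketch of what the scatter/gather variant looks like - the batch size of 4 and the on_write_batch handler are illustrative, not the exact code from the test:

// Sketch only: gather-write up to four queued packets in one call.
// on_write_batch is a hypothetical handler that would pop 'count' packets.
void do_write_batch_nolock() {
    std::vector<boost::asio::const_buffer> bufs;
    std::list<packet_ptr>::const_iterator it = write_queue_.begin();
    for (std::size_t i = 0; i < 4 && it != write_queue_.end(); ++i, ++it)
        bufs.push_back(boost::asio::buffer(&(**it)[0], (*it)->size()));
    async_write(socket_, bufs,
        boost::bind(&client::on_write_batch, this, _1, bufs.size()));
}

Disabling IOCP just means defining the macro before the first asio include, which makes asio fall back to its select-based reactor on Windows:

#define BOOST_ASIO_DISABLE_IOCP // must precede any asio header
#include <boost/asio.hpp>

The full test program: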

// Example is adapted from async_tcp_echo_server.cpp which is
// Copyright (c) 2003-2010 Christopher M. Kohlhoff (chris at kohlhoff dot com)
//
// Start program with -s to start as the server
#ifndef _WIN32_WINNT
#define _WIN32_WINNT 0x0501
#endif

#include <iostream>
#include <list>
#include <memory>
#include <vector>
#include <tchar.h>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/thread.hpp>

#define PORT "1234"
using namespace boost::asio::ip;
using namespace boost::system;

class session {
public:
    session(boost::asio::io_service& io_service) : socket_(io_service) {}

    void do_read() {
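        // Read into the fixed buffer; whatever arrives is simply discarded.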
        socket_.async_read_some(boost::asio::buffer(data_, max_length),
            boost::bind(&session::handle_read, this, _1, _2));
    }

    boost::asio::ip::tcp::socket& socket() { return socket_; }
protected:
    void handle_read(const error_code& ec, size_t bytes_transferred) {
        if (!ec) {
            do_read();
        } else {
            delete this;
        }
    }

private:
    tcp::socket socket_;
    enum { max_length = 1024 };
    char data_[max_length];
};

class server {
public:
    explicit server(boost::asio::io_service& io_service)
        : io_service_(io_service)
        , acceptor_(io_service, tcp::endpoint(tcp::v4(), atoi(PORT)))
    {
        session* new_session = new session(io_service_);
        acceptor_.async_accept(new_session->socket(),
            boost::bind(&server::handle_accept, this, new_session, _1));
    }

    void handle_accept(session* new_session, const error_code& ec) {
        if (!ec) {
            new_session->do_read();
            new_session = new session(io_service_);
            acceptor_.async_accept(new_session->socket(),
                boost::bind(&server::handle_accept, this, new_session, _1));
        } else {
            delete new_session;
        }
    }

private:
    boost::asio::io_service& io_service_;
    boost::asio::ip::tcp::acceptor acceptor_;
};

class client {
public:
    explicit client(boost::asio::io_service &io_service)
        : io_service_(io_service)
        , socket_(io_service)
        , work_(new boost::asio::io_service::work(io_service))
    {
        io_service_.post(boost::bind(&client::do_init, this));
    }

    ~client() {
        packet_thread_.join(); 
    }

protected:

    void do_init() {
        // Connect to the server
        tcp::resolver resolver(io_service_);
        tcp::resolver::query query(tcp::v4(), "localhost", PORT);
        tcp::resolver::iterator iterator = resolver.resolve(query);
        socket_.connect(*iterator);

        // Start packet generation thread
        packet_thread_ = boost::thread(
                boost::bind(&client::generate_packets, this, 8000, 5000000));
    }

    typedef std::vector<unsigned char> packet_type;
    typedef boost::shared_ptr<packet_type> packet_ptr;

    void generate_packets(long packet_size, long num_packets) {
        // Add a single dummy packet multiple times, then start writing
        packet_ptr buf(new packet_type(packet_size, 0));
        write_queue_.insert(write_queue_.end(), num_packets, buf);
        queue_size = num_packets;
        do_write_nolock();

        // Wait until all packets are sent.
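        // (InterlockedExchangeAdd(&x, 0) performs an atomic read of x.)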
        while (long queued = InterlockedExchangeAdd(&queue_size, 0)) {
            std::cout << "Queue size: " << queued << std::endl;
            Sleep(1000);
        }

        // Exit from run(), ignoring socket shutdown
        work_.reset();
    }

    void do_write_nolock() {
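        // Only one write is ever outstanding; the next is issued from on_write.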
        const packet_ptr &p = write_queue_.front();
        async_write(socket_, boost::asio::buffer(&(*p)[0], p->size()),
            boost::bind(&client::on_write, this, _1));
    }

    void on_write(const error_code &ec) {
        if (ec) { throw system_error(ec); }

        write_queue_.pop_front();
        if (InterlockedDecrement(&queue_size)) {
            do_write_nolock();
        }
    }

private:
    boost::asio::io_service &io_service_;
    tcp::socket socket_;
    boost::shared_ptr<boost::asio::io_service::work> work_;
    long queue_size;
    std::list<packet_ptr> write_queue_;
    boost::thread packet_thread_;
};

int _tmain(int argc, _TCHAR* argv[]) {
    try {
        boost::asio::io_service io_svc;
        bool is_server = argc > 1 && 0 == _tcsicmp(argv[1], _T("-s"));
        std::auto_ptr<server> s(is_server ? new server(io_svc) : 0);
        std::auto_ptr<client> c(is_server ? 0 : new client(io_svc));
        io_svc.run();
    } catch (std::exception& e) {
        std::cerr << "Exception: " << e.what() << "\n";
    }
    return 0;
}

So my question is basically:

How do I get rid of these stalls?

What causes this to happen?

Update: Contrary to what I stated above, there does appear to be some correlation with disk activity: if I start a large directory copy on the disk while the test is running, the frequency of the I/O stalls increases. Could this be Windows I/O prioritization kicking in? Since the pauses are always the same length, it does sound somewhat like a timeout somewhere in the OS I/O code...

Grandiose answered 10/2, 2011 at 11:25
I can't find it now, but when I was looking through the ASIO documentation earlier, there was some kind of warning about using IOCP with ASIO. - Worthwhile
@diverscuba23: If there were any problems with using IOCP with asio, it probably would not be the default method. There is a comment in the source indicating that there has previously been a problem with GetQueuedCompletionStatus not receiving any completion packets when no timeout is used (on some versions of Windows); maybe this is what you mean? - Grandiose
That may be. As I said, I could not find it when I went back to look for it, as it was just a link at the bottom of the page. I never bothered to follow the link, since it was for Windows and I do my development for *NIX systems. - Worthwhile
I'd suggest running on another platform like Linux to see if it exhibits similar behavior. - Divvy
  • adjust boost::asio::socket_base::send_buffer_size and receive_buffer_size (see the sketch below)
  • adjust max_length to a larger number. Since TCP is stream-oriented, don't think of it as receiving single packets. This is most likely causing some sort of "gridlock" between the TCP send and receive windows.
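A minimal sketch of setting those options (socket here stands for any connected tcp::socket; the 64 KB value is illustrative only):

// Sketch: enlarge the kernel socket buffers on an open asio socket.
boost::asio::socket_base::send_buffer_size snd_size(65536);
boost::asio::socket_base::receive_buffer_size rcv_size(65536);
socket.set_option(snd_size);
socket.set_option(rcv_size);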
Isobel answered 10/2, 2011 at 16:33
Just blindly adjusting SO_RCVBUF and SO_SNDBUF does not really seem to solve the problem - or did you have something in particular in mind? Also, the receiver in this example has no notion of packets at all; it just throws away whatever data it receives, of whatever size. Please explain what you mean. - Grandiose
I was thinking that the send and receive windows were getting filled, causing the momentary gaps. When you do the async_read_some, you're reading 1024 bytes per call. Did you try increasing this to match the 8000-byte size you're sending? - Isobel
@PiNoYBoY82, yes, I've tested with a read buffer that is a multiple of the per-write-call size, and as indicated in the question, this almost makes the problem go away. But as you yourself pointed out, there isn't really a 1-to-1 correspondence between the write size on one end and the read size on the other. Also, I think you are mixing up your vocabulary: there is a difference between the TCP receive window size, the socket send/receive buffer sizes, and the read buffer size specified in the read call. - Grandiose
@PiNoYBoY82, during normal operation the sender completes around 30k writes per second, translating to around 230k reads per second. I believe the default buffer size is 8k (I might be wrong on that, though). So should the receive window or the send/receive buffers happen to fill up, they would be emptied very quickly - certainly a lot quicker than five seconds. - Grandiose
Hmm, have you tried throttling your async_writes to see if the problem still exhibits itself? - Isobel
Is it possible there is an ec in handle_read that could lead somewhere? - Isobel
@PiNoYBoY82, I'm not sure that I follow - if throttling the writes affected the behavior, which it very well might, what information would that give me? Also, if there were an error in the read handler, the connection would be destroyed, i.e. no more writes. - Grandiose
I think it indicates some sort of overrun that could require delving into the IOCP code in Boost. I'd be interested in the ratio of async_write to on_write calls, or outstanding calls vs. completed calls. Since each async_write is an outstanding call on IOCP, I imagine there is a certain number of queued-up outstanding calls that IOCP can handle, which may lead to this behavior in the way Boost handles the situation. - Isobel
I would also check Tcp1323Opts in the registry: technet.microsoft.com/en-us/library/cc938205.aspx - Isobel
@PiNoYBoY82, there is a one-to-one mapping of async_writes to callbacks. You cannot have multiple outstanding writes on a socket. If you look at the code in the question, you can see that each write is only performed in the callback, once the previous write has completed. I'm not sure if you've used asynchronous I/O with TCP before, but since TCP is stream-oriented, all writes have an inherent ordering, so you simply cannot have multiple writes in flight on the same socket. Please see boost.org/doc/libs/1_45_0/doc/html/boost_asio/reference/… - Grandiose
Ah yes, sorry - I was just recalling issues writing to IOCP. This is sounding like something external to your app interrupting I/O. Perhaps something in the BIOS or another process? Sorry for the wild tangents while troubleshooting :) - Isobel
@PiNoYBoY82, yes, it might very well be the case that this is external to the application - I've updated the question to indicate that I've seen a higher frequency of these I/O stalls when there is concurrent disk activity. Also, no worries about wild guesses; sometimes they do help, and all help is appreciated. You should probably use comments rather than an answer for this, though. - Grandiose

I recently encountered a very similar-sounding problem and have a solution that works for me. I have an asynchronous server/client written in asio that sends and receives video (and small request structures), and I was seeing frequent five-second stalls just as you describe.

Our fix was to increase the size of the socket buffers on each end and to disable the Nagle algorithm:

pSocket->set_option(boost::asio::ip::tcp::no_delay(true));
pSocket->set_option(boost::asio::socket_base::send_buffer_size(s_SocketBufferSize));
pSocket->set_option(boost::asio::socket_base::receive_buffer_size(s_SocketBufferSize));

It might be that only one of the above options is critical, but I've not investigated this further.

Eliott answered 3/3, 2011 at 12:49
