I've developed a mini HTTP server in C++ using boost::asio. I'm now load testing it with multiple clients, and I can't get close to saturating the CPU. I'm testing on an Amazon EC2 instance and seeing about 50% usage on one CPU, 20% on another, and the remaining two idle (according to htop).
Details:
- The server fires up one thread per core
- Requests are received, parsed, processed, and responses are written out
- The requests are for data, which is read out of memory (read-only for this test)
- I'm 'loading' the server using two machines, each running a Java application with 25 threads sending requests
- I'm seeing about 230 requests/sec throughput (these are application requests, each composed of many HTTP requests)
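As a rough sanity check on those numbers, here is a small sketch based on Little's law (concurrency = throughput × latency). It assumes each of the 50 client threads sends one application request at a time and blocks until the response arrives, which I haven't stated above; the function names are just for illustration.

```cpp
#include <cassert>

// Closed-loop estimate from Little's law: concurrency = throughput * latency.
// Assumption: every client thread has exactly one application request in
// flight and waits for the response before sending the next one.
double mean_latency_seconds(int client_threads, double requests_per_sec) {
    assert(client_threads > 0 && requests_per_sec > 0.0);
    return client_threads / requests_per_sec;
}

// Throughput the same number of blocking clients could reach if the mean
// latency of one application request were `latency_s` seconds.
double max_throughput(int client_threads, double latency_s) {
    assert(client_threads > 0 && latency_s > 0.0);
    return client_threads / latency_s;
}
```

Under that assumption, `mean_latency_seconds(50, 230.0)` comes out at roughly 0.217 s per application request, and `max_throughput(50, 0.0625)` shows that 50 blocking clients would need a mean latency of about 62.5 ms to reach 800 requests/sec. If that holds, the throughput may be limited by round-trip latency on the client side rather than by server CPU.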
So, what should I look at to improve this result? Given the CPU is mostly idle, I'd like to leverage that additional capacity to get a higher throughput, say 800 requests/sec or whatever.
Ideas I've had:
- The requests are very small and often fulfilled in a few ms. I could modify the client to compose and send bigger requests (perhaps using batching)
- I could modify the HTTP server to use the Select design pattern, is this appropriate here?
- I could do some profiling to try to understand where the bottlenecks are
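For the profiling idea, a minimal sketch of the kind of instrumentation I have in mind: timing each stage of request handling (receive, parse, process, write) with `std::chrono`. `StageTimer` and `ScopedStage` are hypothetical helpers, not part of my server or of boost::asio, and the mutex adds some contention of its own, so this is only a crude first look.

```cpp
#include <chrono>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical helper: accumulates wall-clock time and call counts per stage.
class StageTimer {
public:
    // Adds `elapsed` to the total for `stage`; safe to call from many threads.
    void add(const std::string& stage, std::chrono::nanoseconds elapsed) {
        std::lock_guard<std::mutex> lock(mutex_);
        totals_[stage] += elapsed;
        ++counts_[stage];
    }

    // Total time recorded for `stage`, or zero if it was never recorded.
    std::chrono::nanoseconds total(const std::string& stage) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = totals_.find(stage);
        return it == totals_.end() ? std::chrono::nanoseconds{0} : it->second;
    }

    // Number of times `stage` was recorded.
    long count(const std::string& stage) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = counts_.find(stage);
        return it == counts_.end() ? 0 : it->second;
    }

private:
    mutable std::mutex mutex_;
    std::map<std::string, std::chrono::nanoseconds> totals_;
    std::map<std::string, long> counts_;
};

// RAII guard: measures from construction to destruction and reports the
// elapsed time to the given StageTimer under the given stage name.
class ScopedStage {
public:
    ScopedStage(StageTimer& timer, std::string stage)
        : timer_(timer),
          stage_(std::move(stage)),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedStage() {
        auto elapsed = std::chrono::steady_clock::now() - start_;
        timer_.add(stage_,
                   std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed));
    }

    ScopedStage(const ScopedStage&) = delete;
    ScopedStage& operator=(const ScopedStage&) = delete;

private:
    StageTimer& timer_;
    std::string stage_;
    std::chrono::steady_clock::time_point start_;
};
```

Wrapping each stage in a block such as `{ ScopedStage s(timer, "parse"); /* parse the request */ }` and comparing `total(stage) / count(stage)` across stages should show whether the time goes into my own code or into waiting on the network.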