I'm using boost spsc_queue to move my stuff from one thread to another. It's one of the critical places in my software, so I want to do it as fast as possible. I wrote this test program:
#include <boost/lockfree/spsc_queue.hpp>
#include <stdint.h>
#include <stdio.h>   // printf
#include <unistd.h>  // usleep
#include <chrono>
#include <condition_variable>
#include <thread>

const int N_TESTS = 1000;
int results[N_TESTS];
boost::lockfree::spsc_queue<int64_t, boost::lockfree::capacity<1024>> testQueue;
using std::chrono::nanoseconds;
using std::chrono::duration_cast;
int totalQueueNano(0);
int totalQueueCount(0);

void Consumer() {
    int i = 0;
    int64_t scheduledAt;
    // Busy-poll until all N_TESTS items have been received.
    while (i < N_TESTS) {
        while (testQueue.pop(scheduledAt)) {
            int64_t dequeuedAt = (duration_cast<nanoseconds>(
                std::chrono::high_resolution_clock::now().time_since_epoch())).count();
            auto diff = dequeuedAt - scheduledAt;
            totalQueueNano += diff;
            ++totalQueueCount;
            results[i] = diff;
            ++i;
        }
    }
    for (int i = 0; i < N_TESTS; i++) {
        printf("%d ", results[i]);
    }
    printf("\nspsc_queue latency average nano = %d\n", totalQueueNano / totalQueueCount);
}

int main() {
    std::thread t(Consumer);
    usleep(1000000);  // give the consumer time to start polling
    for (int i = 0; i < N_TESTS; i++) {
        usleep(1000);
        // Timestamp taken just before the push; the consumer subtracts it on pop.
        int64_t scheduledAt = (duration_cast<nanoseconds>(
            std::chrono::high_resolution_clock::now().time_since_epoch())).count();
        testQueue.push(scheduledAt);
    }
    t.join();  // wait for the consumer to print; a joinable thread must not be destroyed
    return 0;
}
Compile flags:
g++ -std=c++0x -O3 -Wall -c -fmessage-length=0 -march=native -mtune=native -pthread -MMD -MP -MF"src/TestProject.d" -MT"src/TestProject.d" -o "src/TestProject.o" "../src/TestProject.cpp"
g++ -pthread -o "TestProject" ./src/TestProject.o -lpthread
On my machine (RHEL 7.1, gcc 4.8.3, Xeon E5-2690 v3) I get 290-300 nanoseconds.
- How good is my test application? Am I measuring spsc_queue latency correctly?
- What is the current industry-best time to pass data from one thread to another?
- Is boost spsc_queue a good choice for moving data from one thread to another?
- Can you recommend something faster than spsc_queue?
- Can you write code that does the same work significantly faster?
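One thing that may be worth checking for the first question: every measured difference also includes the cost of reading the clock itself. Below is a rough sketch for estimating that cost, assuming the same high_resolution_clock as in the test program. The helper name averageClockReadNanos is made up for illustration.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: estimates the average cost, in nanoseconds, of one
// high_resolution_clock::now() call by timing `samples` back-to-back reads.
inline double averageClockReadNanos(int samples) {
    typedef std::chrono::high_resolution_clock Clock;
    std::uint64_t sink = 0;  // accumulates the readings so they are actually used
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < samples; ++i) {
        sink ^= static_cast<std::uint64_t>(Clock::now().time_since_epoch().count());
    }
    const Clock::time_point stop = Clock::now();
    volatile std::uint64_t keep = sink;  // keep the accumulated value observable
    (void)keep;
    const double totalNanos = static_cast<double>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count());
    return totalNanos / samples;
}
```

If the value returned for, say, averageClockReadNanos(1000000) is a noticeable fraction of the 290-300 ns I see, then part of that number is clock overhead rather than queue latency.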
upd: a queue mechanism is required. If the first thread produces data every 1000 nanoseconds but the second thread spends 10,000 nanoseconds processing a single item, I need to "queue" several items for a short period of time. But my "queue" is never "too big"; a short fixed-size ring buffer should be enough.
upd2: So in short the question is: what is the fastest single-producer single-consumer queue (most likely based on a fixed-size ring buffer)? I'm using boost spsc_queue and I achieve ~300 ns latency. Can you suggest something faster?
upd3: In the Java world there is the Disruptor, which achieves 50 ns latency: https://code.google.com/p/disruptor/wiki/PerformanceResults Do we have something in C++ with the same 50 ns latency?
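To make it concrete what I mean by a fixed-size ring buffer for one producer and one consumer, here is a minimal sketch. It is not what boost does internally and I make no claim that it is faster. The class name SpscRing, the power-of-two capacity requirement and the 64-byte cache line alignment are all assumptions of this sketch.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical minimal single-producer single-consumer ring buffer.
// Assumes exactly one thread calls push() and exactly one thread calls pop().
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");
public:
    SpscRing() : head_(0), tail_(0) {}

    // Producer side: returns false if the ring is full.
    bool push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;       // full
        buffer_[head & (Capacity - 1)] = value;
        head_.store(head + 1, std::memory_order_release); // publish the slot
        return true;
    }

    // Consumer side: returns false if the ring is empty.
    bool pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;                   // empty
        out = buffer_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release); // free the slot
        return true;
    }

private:
    // Assumed 64-byte cache lines: keep the two indices on separate lines
    // so the producer and consumer do not contend on the same line.
    alignas(64) std::atomic<std::size_t> head_;  // next slot to write
    alignas(64) std::atomic<std::size_t> tail_;  // next slot to read
    alignas(64) T buffer_[Capacity];
};
```

In the test program this could stand in for testQueue as SpscRing<int64_t, 1024>, since push and pop take the same arguments there. Whether it actually beats spsc_queue is exactly what I would like to know.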
testQueue.push(i) copies the int into the spsc_queue; testQueue.pop(i) reads the value. It seems it returns a reference to internal storage – Destinee
push and pop copy some bits to get the value into and out of the queue object, but what I meant to say was that you're not copying from any thread to any other thread. All the stuff just resides in the free store, accessible to both threads (in principle) as long as proper synchronization is used. – Carangid