Until now I was using std::queue in my project. I measured the average time that a specific operation on this queue requires. The times were measured on two machines: my local Ubuntu VM and a remote server. Using std::queue, the average was almost the same on both machines: ~750 microseconds.
Then I "upgraded" the std::queue
to boost::lockfree::spsc_queue
, so I could get rid of the mutexes protecting the queue. On my local VM I could see a huge performance gain, the average is now on 200 microseconds. On the remote machine however, the average went up to 800 microseconds, which is slower than it was before.
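For illustration, here is a minimal sketch of how an average per-operation time could be measured with std::chrono. This is only a simplified stand-in, not the exact measurement code used for the numbers above, and it times a single-threaded push/pop round trip rather than the real operation:

// Simplified timing sketch: measures the average wall-clock time of one
// push/pop round trip on the spsc_queue. Illustrative only - the numbers
// quoted above come from a different, multi-threaded measurement.
#include <chrono>
#include <iostream>
#include <boost/lockfree/spsc_queue.hpp>

int main()
{
    boost::lockfree::spsc_queue<int, boost::lockfree::capacity<1024>> q;
    constexpr int iterations = 100000;

    auto start = std::chrono::steady_clock::now();
    for(int i = 0; i < iterations; ++i)
    {
        q.push(i);
        int value;
        q.pop(value);
    }
    auto end = std::chrono::steady_clock::now();

    auto total_us = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
    std::cout << "average: " << static_cast<double>(total_us) / iterations << " us" << std::endl;
}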
At first I thought this might be because the remote machine does not support the lock-free implementation:
From the Boost.Lockfree page:
Not all hardware supports the same set of atomic instructions. If it is not available in hardware, it can be emulated in software using guards. However this has the obvious drawback of losing the lock-free property.
To find out if these instructions are supported, boost::lockfree::queue has a method called bool is_lock_free(void) const;.
However, boost::lockfree::spsc_queue does not have a function like this, which, to me, implies that it does not rely on the hardware and that it is always lock-free - on any machine.
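One thing that can be checked directly is whether the relevant atomic types are lock-free on a given machine. A small sketch (the std::atomic<std::size_t> probe reflects an assumption about what spsc_queue uses internally for its read/write indices; that is an implementation detail, not a documented guarantee):

// Probe whether the atomics involved are lock-free on this machine.
// boost::lockfree::queue::is_lock_free() is the documented check; the
// std::atomic<std::size_t> line assumes spsc_queue keeps its indices in
// such an atomic (implementation detail, not guaranteed).
#include <atomic>
#include <cstddef>
#include <iostream>
#include <boost/lockfree/queue.hpp>

int main()
{
    std::atomic<std::size_t> index{0};
    boost::lockfree::queue<int> q(128);

    std::cout << std::boolalpha
              << "std::atomic<std::size_t> is lock-free: " << index.is_lock_free() << std::endl
              << "boost::lockfree::queue<int> is lock-free: " << q.is_lock_free() << std::endl;
}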
What could be the reason for the performance loss?
Example code (Producer/Consumer)
// c++11 compiler and boost library required
#include <iostream>
#include <cstdlib>
#include <chrono>
#include <future> // std::async
#include <thread>
/* Using blocking queue:
* #include <mutex>
* #include <queue>
*/
#include <boost/lockfree/spsc_queue.hpp>
boost::lockfree::spsc_queue<int, boost::lockfree::capacity<1024>> queue;
/* Using blocking queue:
* std::queue<int> queue;
* std::mutex mutex;
*/
int main()
{
    auto producer = std::async(std::launch::async, []() // queue (and mutex) are globals, so they must not be captured
    {
        // Producing data in a random interval
        while(true)
        {
            /* Using the blocking queue, the mutex must be locked here.
             * mutex.lock();
             */
            // Push random int (0-9999)
            queue.push(std::rand() % 10000);
            /* Using the blocking queue, the mutex must be unlocked here.
             * mutex.unlock();
             */
            // Sleep for random duration (0-999 microseconds)
            std::this_thread::sleep_for(std::chrono::microseconds(rand() % 1000));
        }
    });

    auto consumer = std::async(std::launch::async, []()
    {
        // Example operation on the queue.
        // Checks if 1234 was generated by the producer, returns if found.
        while(true)
        {
            /* Using the blocking queue, the mutex must be locked here.
             * mutex.lock();
             */
            int value;
            while(queue.pop(value))
            {
                if(value == 1234)
                    return;
            }
            /* Using the blocking queue, the mutex must be unlocked here.
             * mutex.unlock();
             */
            // Sleep for 100 microseconds
            std::this_thread::sleep_for(std::chrono::microseconds(100));
        }
    });

    consumer.get();
    std::cout << "1234 was generated!" << std::endl;
    // Note: the producer task never finishes, so the std::future returned by
    // std::async for it blocks in its destructor and the program does not
    // actually exit here.
    return 0;
}
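For comparison, here is a compilable sketch of the blocking std::queue variant that the comments in the example describe, written with std::lock_guard instead of manual lock()/unlock(). The global names blocking_queue and queue_mutex are only illustrative:

// Sketch of the std::queue + std::mutex variant referenced in the comments
// of the example above. std::lock_guard releases the mutex even on the
// early return in the consumer.
#include <cstdlib>
#include <chrono>
#include <future>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

std::queue<int> blocking_queue;
std::mutex queue_mutex;

int main()
{
    auto producer = std::async(std::launch::async, []()
    {
        while(true)
        {
            {
                std::lock_guard<std::mutex> lock(queue_mutex);
                blocking_queue.push(std::rand() % 10000);
            }
            std::this_thread::sleep_for(std::chrono::microseconds(std::rand() % 1000));
        }
    });

    auto consumer = std::async(std::launch::async, []()
    {
        while(true)
        {
            {
                std::lock_guard<std::mutex> lock(queue_mutex);
                while(!blocking_queue.empty())
                {
                    int value = blocking_queue.front();
                    blocking_queue.pop();
                    if(value == 1234)
                        return;
                }
            }
            std::this_thread::sleep_for(std::chrono::microseconds(100));
        }
    });

    consumer.get();
    std::cout << "1234 was generated!" << std::endl;
    return 0;
}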
Comments:

rand should not be used in a multithreaded environment. Use C++11 <random> or rand_r instead, see also. How do you time the code and how does it correspond to your performance observations? – Narrows
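A minimal sketch of the <random> suggestion from the comment above; the helper name random_int and the thread_local engine are illustrative, not part of the original code:

// Thread-safe replacement for rand(): each thread owns its own engine via
// thread_local, so no shared state is touched.
#include <random>

int random_int(int min, int max)
{
    thread_local std::mt19937 engine{std::random_device{}()};
    std::uniform_int_distribution<int> dist(min, max);
    return dist(engine);
}

// e.g. in the producer: queue.push(random_int(0, 9999));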
Actually, the producer receives updates from a websocket server (that's what I was trying to model with rand()). I will rewrite the example this evening. Do you think it would be better for a minimal example to include the "real" code, i.e. websocket connections etc., or should I try to model it like I did before? – Vermin
Assuming that the missing is_lock_free implies anything is incorrect. In the implementation we see they all use atomic operations, which all directly depend on CPU support. – Vladi