I have an async API which wraps an IO library. The library uses C-style callbacks and the API is C++, so the natural choice (IMHO) was to build the API on std::future/std::promise, something like std::future<void> Read(uint64_t addr, byte* buff, uint64_t buffSize). However, when testing the implementation I saw that the bottleneck is the future/promise, or more precisely the futex used to implement promise/future. Since the futex is, AFAIK, a user-space mechanism and the fastest one I know for syncing two threads, I switched to raw futexes, which improved the situation somewhat but not drastically. Performance hovers around 200k futex WAKEs per second. Then I stumbled upon this article, Futex Scaling for Multi-core Systems, which closely matches the effect I observe with futexes. My question is: since the futex is too slow for me, what is the fastest mechanism on Linux I can use to wake the waiting side? I don't need anything more sophisticated than a binary semaphore, just to signal IO operation completion. Since the IO operations are very fast (tens of microseconds), switching to kernel mode is not an option. Busy waiting is not an option either, since CPU time is precious in my case.
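For context, here is a minimal sketch of the kind of wrapper described above: a C-style completion callback fulfilling a std::promise. The names fake_io_read and io_callback are hypothetical stand-ins for the real IO library (which is not shown here), and unsigned char is used in place of byte so the example compiles on its own.

```cpp
#include <cassert>
#include <cstdint>
#include <future>
#include <memory>
#include <thread>

// Hypothetical C-style IO entry point (a stand-in, not a real library API):
// it fills the buffer on another thread, then invokes cb(ctx).
using io_callback = void (*)(void* ctx);

void fake_io_read(std::uint64_t /*addr*/, unsigned char* buff, std::uint64_t size,
                  io_callback cb, void* ctx) {
    std::thread([=] {
        for (std::uint64_t i = 0; i < size; ++i) buff[i] = 0xAB;  // pretend data arrived
        cb(ctx);                                                  // completion callback
    }).detach();
}

// Async API: the promise lives on the heap until the callback fires.
std::future<void> Read(std::uint64_t addr, unsigned char* buff, std::uint64_t buffSize) {
    assert(buff != nullptr);
    auto* promise = new std::promise<void>();
    std::future<void> result = promise->get_future();
    fake_io_read(addr, buff, buffSize,
                 [](void* ctx) {
                     // Take ownership back so the promise is freed after set_value().
                     std::unique_ptr<std::promise<void>> p(
                         static_cast<std::promise<void>*>(ctx));
                     p->set_value();  // wakes a thread blocked in future::get()
                 },
                 promise);
    return result;
}
```

The caller blocks in get() on the returned future; that wait/wake pair is where the futex traffic in my measurements comes from.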
Bottom line: user space, a simple synchronization primitive, shared between two threads only; only one thread sets the completion, and only one thread waits for it.
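To make the requirement concrete, here is a rough sketch of a one-waiter, one-signaler completion event on top of a raw futex. The class name CompletionEvent and its three-state scheme are my own illustration, not taken from any library. It assumes Linux, that std::atomic<std::uint32_t> has the same layout as a plain 32-bit word, and that operations are issued one at a time. The only trick is that Signal() skips the FUTEX_WAKE syscall unless the waiter has actually announced that it is parked.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical single-waiter / single-signaler event (illustrative names).
class CompletionEvent {
public:
    // Called by the one completing thread.
    void Signal() {
        std::uint32_t prev = state_.exchange(kSignaled, std::memory_order_acq_rel);
        if (prev == kWaiting) Futex(FUTEX_WAKE_PRIVATE, 1);  // syscall only if someone is parked
    }

    // Called by the one waiting thread.
    void Wait() {
        std::uint32_t expected = kIdle;
        // Announce the waiter; if the CAS fails, the event was already signaled.
        if (state_.compare_exchange_strong(expected, kWaiting,
                                           std::memory_order_acq_rel,
                                           std::memory_order_acquire)) {
            // Loop to absorb spurious wakeups and EINTR.
            while (state_.load(std::memory_order_acquire) == kWaiting) {
                Futex(FUTEX_WAIT_PRIVATE, kWaiting);  // sleeps only while state_ == kWaiting
            }
        }
        state_.store(kIdle, std::memory_order_relaxed);  // reset for the next operation
    }

private:
    enum : std::uint32_t { kIdle = 0, kWaiting = 1, kSignaled = 2 };

    long Futex(int op, std::uint32_t val) {
        // Assumption: the atomic is layout-compatible with a uint32_t futex word.
        return syscall(SYS_futex, reinterpret_cast<std::uint32_t*>(&state_), op, val,
                       nullptr, nullptr, 0);
    }

    static_assert(sizeof(std::atomic<std::uint32_t>) == sizeof(std::uint32_t),
                  "futex word must be 32 bits");
    std::atomic<std::uint32_t> state_{0};
};

// Usage sketch: one thread completes the "IO", the other waits for it.
int DemoRoundTrip() {
    CompletionEvent ev;
    int payload = 0;
    std::thread io([&] { payload = 42; ev.Signal(); });
    ev.Wait();
    int seen = payload;  // visible thanks to the release/acquire pairing on state_
    io.join();
    return seen;
}
```

When the completion arrives before the waiter parks, neither side enters the kernel; otherwise this still pays for one FUTEX_WAIT and one FUTEX_WAKE, so it narrows the problem rather than removing it.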
EDIT001: What if... Previously I said no spinning in a busy wait. But the futex already spins in a busy wait, right? Its implementation, however, covers a more general case, which requires a global hash table to hold the futexes, queues for all subscribers, etc. Is it a good idea to mimic the same behavior on some simple entity (like an int), with no locks, no atomics, and no global data structures, and busy wait on it like the futex already does?
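Roughly what I have in mind for the busy-wait variant, as a sketch only. One deliberate deviation from "no atomics": it uses std::atomic<bool> with release/acquire ordering, because spinning on a plain int shared between threads is a data race in C++. On x86-64 these operations typically compile down to ordinary loads and stores. The name SpinFlag and the spin_limit parameter are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical single-producer / single-consumer completion flag.
struct SpinFlag {
    std::atomic<bool> done{false};

    // Called by the completing thread.
    void Set() { done.store(true, std::memory_order_release); }

    // Called by the waiting thread: spin for a bounded number of iterations,
    // then keep polling but yield the CPU between checks.
    void Wait(int spin_limit = 4096) {
        int spins = 0;
        while (!done.load(std::memory_order_acquire)) {
            if (++spins >= spin_limit) {
                std::this_thread::yield();  // give other runnable threads a turn
            }
        }
    }
};

// Usage sketch: the waiter observes the payload written before Set().
int SpinDemo() {
    SpinFlag flag;
    int payload = 0;
    std::thread io([&] { payload = 7; flag.Set(); });
    flag.Wait();
    int seen = payload;
    io.join();
    return seen;
}
```

The obvious cost is that the waiting thread burns CPU for the whole duration of the IO, which is exactly what I was trying to avoid, so this is only interesting if completions really land within the spin budget.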
[...] qspinlock sounds more suitable, however I didn't find anything useful about it. – Zenobiazeolite
[...] libuv. An event loop should never block. Not on a mutex, not on a futex, not on IO. You can use uv_async_t to share data between loops (i.e. threads). Therefore, you're better off with standard C-style callbacks. – Groark
[...] promise/future) and set_value on promise, the caller side with future gets notified if get is called, no blocking and everyone is happy, right? – Zenobiazeolite
[...] boost::fibers the synchronization is much cheaper. Coroutines will do too. However, in some cases (like ours) it will require massive code refactoring. – Zenobiazeolite