I have two versions of spinlock below. The first uses the default which is memory_order_cst while the latter uses memory_order_acquire/memory_order_release. Since the latter is more relaxed, I expect it to have better performance. However it doesn't seem to be case.
class SimpleSpinLock
{
public:
inline SimpleSpinLock(): mFlag(ATOMIC_FLAG_INIT) {}
inline void lock()
{
int backoff = 0;
while (mFlag.test_and_set()) { DoWaitBackoff(backoff); }
}
inline void unlock()
{
mFlag.clear();
}
private:
std::atomic_flag mFlag = ATOMIC_FLAG_INIT;
};
class SimpleSpinLock2
{
public:
inline SimpleSpinLock2(): mFlag(ATOMIC_FLAG_INIT) {}
inline void lock()
{
int backoff = 0;
while (mFlag.test_and_set(std::memory_order_acquire)) { DoWaitBackoff(backoff); }
}
inline void unlock()
{
mFlag.clear(std::memory_order_release);
}
private:
std::atomic_flag mFlag = ATOMIC_FLAG_INIT;
};
const int NUM_THREADS = 8;
const int NUM_ITERS = 5000000;
const int EXPECTED_VAL = NUM_THREADS * NUM_ITERS;
int val = 0;
long j = 0;
SimpleSpinLock spinLock;
void ThreadBody()
{
for (int i = 0; i < NUM_ITERS; ++i)
{
spinLock.lock();
++val;
j = i * 3.5 + val;
spinLock.unlock();
}
}
int main()
{
vector<thread> threads;
for (int i = 0; i < NUM_THREADS; ++i)
{
cout << "Creating thread " << i << endl;
threads.push_back(std::move(std::thread(ThreadBody)));
}
for (thread& thr: threads)
{
thr.join();
}
cout << "Final value: " << val << "\t" << j << endl;
assert(val == EXPECTED_VAL);
return 1;
}
I am running on Ubuntu 12.04 with gcc 4.8.2 running optimization O3.
-- Spinlock with memory_order_cst:
Run 1:
real 0m1.588s
user 0m4.548s
sys 0m0.052s
Run 2:
real 0m1.577s
user 0m4.580s
sys 0m0.032s
Run 3:
real 0m1.560s
user 0m4.436s
sys 0m0.032s
-- Spinlock with memory_order_acquire/release:
Run 1:
real 0m1.797s
user 0m4.608s
sys 0m0.100s
Run 2:
real 0m1.853s
user 0m4.692s
sys 0m0.164s
Run 3:
real 0m1.784s
user 0m4.552s
sys 0m0.124s
Run 4:
real 0m1.475s
user 0m3.596s
sys 0m0.120s
With the more relaxed model, I see a lot more variability. Sometimes it's better. Often times it is worse, does anyone have an explanation for this?
ThreadBody
? – Duley