Peterson's algorithm is an example of something that requires sequential consistency.
In the days before mutexes, the algoritm was used to give a single thread access to a protected area.
The algoritm only works with 2 threads, each managing a flag that expresses the intention to access the protected area.
If both set the flag at (about) the same time, both will backoff and try again.
The real algorithm is more advanced since it uses a 'turn' flag to manage fair access, but to show the difference between seq/cst and acq/rel this is not necessary.
Below is a ready-to-compile, simplified version of the Peterson algorithm that actually shows that the algoritms is broken if something weaker than sequential consistency is used.
Interestingly enough, this is broken even on X86 since that platform allows store-loads to be reordered.
The problem with store-load reordering is that both threads may express their intention to access the protected area by setting their me
flag to true
, while both are reading false
from the him
flag (since the value was not propagated yet to both threads) and enter the protected area. This is not possible with sequential consistency.
With gcc
, I had to compile with -O3
optimizations to have the assert
fire, with clang
that was not necessary.
Both compilers use a different approach to implement sequential consistency.
#include <thread>
#include <atomic>
#include <assert.h>
std::atomic<bool> flag1{false};
std::atomic<bool> flag2{false};
std::atomic<int> counter{0};
// Change these to memory_order_seq_cst to fix the algorithm
static const auto store_ordering = std::memory_order_release;
static const auto load_ordering = std::memory_order_acquire;
void busy(int n)
{
auto &me = (n==1) ? flag1 : flag2;
auto &him = (n==1) ? flag2 : flag1;
for (;;)
{
for (;;)
{
me.store(true, store_ordering);
if (him.load(load_ordering) == false)
{
// got the 'lock'
break;
}
// retention, no wait period -> busy loop
me.store(false, store_ordering);
}
int tmp = counter.fetch_add(1, std::memory_order_relaxed);
assert(tmp == 0);
/*
* critical area
*/
tmp = counter.fetch_sub(1, std::memory_order_relaxed);
assert(tmp == 1);
me.store(false, store_ordering);
}
}
int main()
{
std::thread t1{busy, 1};
std::thread t2{busy, 2};
t1.join();
t2.join();
}