You'll need a non-x86 CPU, typically one that uses load-linked / store-conditional like ARM or AArch64 (before ARMv8.1). x86 lock cmpxchg
implements CAS_strong, and there's nothing you can do to weaken it and make it possibly fail when the value in memory does match.
CAS_weak can compile to a single LL/SC attempt; CAS_strong has to compile to an LL/SC retry loop to only fail if the compare was false, not merely from competition for the cache line from other threads. (As on Godbolt, for ARMv8 ldxr
/stxr
vs. ARMv8.1 cas
)
Have one thread use CAS_weak to do 100 million increments of a std::atomic variable. (https://preshing.com/20150402/you-can-do-any-kind-of-atomic-read-modify-write-operation/ explains how to use a CAS retry loop to implement any arbitrary atomic RMW operation on one variable, but since this is the only thread writing, CAS_strong will always succeed and we don't need a retry loop. My Godbolt example shows doing a single increment attempt using CAS_weak vs. CAS_strong.)
To cause spurious failures for CAS_weak but not CAS_strong, have another thread reading the same variable, or writing something else in the same cache line.
Or maybe doing something like var.fetch_or(0, std::memory_order_relaxed)
to truly get ownership of the cache line, but in a way that wouldn't affect the value. That will definitely cause spurious LL/SC failures.
When you're done, 100M CAS_strong attempts will have incremented the variable from 0 to 100000000. But 100M CAS_weak attempts will have incremented to some lower value, with the difference being the number of failures.
false
once out of every 1000 calls, or something like that. – Egmont