Can the thread that acquires spin-lock later happen first on the timeline?

Asked 17/10, 2024 at 13:41 Answered 17/10, 2024 at 15:3

Solved c++multithreading language-lawyer atomic

Consider this example:

#include <iostream>
#include <atomic>
#include <thread>

struct SpinLock{
   std::atomic<bool>  state;
   void lock(){
      bool expected = false;
      while(!state.compare_exchange_strong(expected,true,std::memory_order::acquire,std::memory_order::relaxed)){
        expected = false;
      }
   }
   void unlock(){
      state.store(false,std::memory_order::release);
   }
};
int main(){
    auto spin_lock = SpinLock{false};
    int i = 0;
    std::thread t1([&](){
        std::this_thread::sleep_for(std::chrono::seconds(1));
        spin_lock.lock();
        auto time_stamp = std::time(nullptr);
        std::cout<<"t1 " <<time_stamp <<"   "<< i<<"\n"; // #1
        spin_lock.unlock();
    });
    std::thread t2([&](){
        spin_lock.lock();
        i = 1;
        auto time_stamp = std::time(nullptr);
        std::cout<<"t2 " <<time_stamp<<"   "<< i<<"\n"; // #2
        spin_lock.unlock();
    });
    t1.join();
    t2.join();
}

From the perspective of C++ standard, Is it possible that #1 prints i==0 and a timestamp value representing a later time while #2 prints i==1 and a timestamp value representing an earlier time?

If the value of i #1 reads is 0, that is, the store operations to state in t1 is earlier in the modification order than that of t2(i.e. the CAS operation in t1 wins the race so it firstly acquires the lock), otherwise the read value must be 1 since the lock in t1 would synchronize with the unlock in t2. The modification order is irrelevant to the order in the timeline, IIUC, the outcome is

t1 1729172229 0
t2 1729172228 1

this outcome is unintuitive, however, from the perspective of C++ standard, is it a possible outcome?

Tricksy answered 17/10, 2024 at 13:41 Comment(7)

The problem is with function std::time it is an archaic function and might behave dubiously. – Sanguineous 17/10, 2024 at 14:38

@Sanguineous It's the POSIX time() function so it won't actually overflow until 2038 even on implementations where time_t is a 32-bit integer. (man7.org/linux/man-pages/man3/time.3p.html). It is a pretty coarse time, though, only 1-second resolution, bad for noticing reordering effects unless they happen to overlap a transition. – Bunni 17/10, 2024 at 14:56

I'd assume that ISO C++ allows this. There are real-world ISAs where reading a timestamp from the CPU only involves registers, not memory, so out-of-order exec can let it reorder without memory reordering. – Bunni 17/10, 2024 at 15:5

Unrelated note: compare_exchange_weak would be enough in that context where we have a loop anyway. – Goodly 17/10, 2024 at 15:5

@PeterCordes: But it doesn't, at least not for steady_clock or any other clock that is steady. So if the ISA can implement steady_clock by reading directly from system registers, etc, then it needs to insert appropriate synchronization barriers to ensure correctness. – Showdown 17/10, 2024 at 15:10

@NateEldredge: I meant to write that clock reads like rdtsc can reorder with memory operations even if the memory operations have acquire or release, in ways that a load couldn't. Such a clock could still be steady in practice, although the OP didn't specify that requirement. On x86 for example, rdtsc and rdtscp have no inputs so their clock-reading uops can execute as soon as there's a free execution port. All mainstream CPUs schedule uops oldest-ready first, so if there's only one port that can run the actual clock-read uop, the older one will exec first. – Bunni 17/10, 2024 at 17:3

@NateEldredge: Oh, just saw your answer that not only is steady_clock monotonic, it respects happens-before across threads. – Bunni 17/10, 2024 at 17:10

The C++ standard doesn't forbid this for std::time. In fact it doesn't say much of anything about how std::time behaves, but defers to the C standard, which doesn't address this issue; so we have to presume it is allowed.

However, it does forbid it for std::chrono::steady_clock or any other clock type C for which C::is_steady == true. See time.clock.req p2:

In Table 103 C1 and C2 denote clock types. t1 and t2 are values returned by C1::now() where the call returning t1 happens before ([intro.multithread]) the call returning t2 and both of these calls occur before C1::time_point::max(). [...]

[In Table 103] C1::is_steady | const bool | true if t1 <= t2 is always true and the time between clock ticks is constant, otherwise false.

Your spin lock has proper acquire-release ordering, such that unlock synchronizes with lock, and one critical section happens before the other. So if you replaced std::time() with std::chrono::steady_clock::now(), then if t1 observes i == 0, we can conclude that the critical section in t1 happens before the one in t2, and thus the time printed in t1 must be no greater than the time printed in t2. (They could of course be equal.)

Showdown answered 17/10, 2024 at 15:3 Comment(5)

Since std::time has an underspecified semantic, look at this example, I make a bit of change, is it possible that the outcome is t1 1 0 and t2 0 1? – Tricksy 18/10, 2024 at 1:21

Re-check the above example, there is some wrong here, in this example the output of t1 cannot be t1 1 0 since reading i as 0 means that the unlock in t1 happens before the lock in t2, hence the load operation of time happens before the store operation of time in t2, so it cannot be 1, IIUC. The asked example should be this. – Tricksy 18/10, 2024 at 14:53

Your new example has nothing to do with time, and doesn't query a real-time clock at any point. You just have an atomic<int> counter, which you're converting to a time_point but that doesn't mean anything. And the counter increment is outside the locked critical section, so there's clearly no connection between how the increments are ordered, and how the critical sections are ordered. – Showdown 19/10, 2024 at 1:56

@xmh0511: You can get t1 1 0 and t2 0 1 just by interleaving the operations, i.e. even assuming sequential consistency. Suppose t2 runs first and does its fetch_add (loading 0). It then gets scheduled out for a while. t1 then gets scheduled in and runs to completion (loading 1 from its fetch_add and 0 from i). Finally t2 gets scheduled again and completes the locked section, printing the value 1 for i (it can't print anything else). – Showdown 19/10, 2024 at 2:1

@Tricksy And you can make that happen reliably if you just insert a delay in t2 after the fetch_add. godbolt.org/z/bo4TGfGva – Showdown 19/10, 2024 at 2:2

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags