Why is atomic.StoreUint32 preferred over a normal assignment in sync.Once?

While reading the Go source code, I have a question about the code in src/sync/once.go:

func (o *Once) Do(f func()) {
    // Note: Here is an incorrect implementation of Do:
    //
    //  if atomic.CompareAndSwapUint32(&o.done, 0, 1) {
    //      f()
    //  }
    //
    // Do guarantees that when it returns, f has finished.
    // This implementation would not implement that guarantee:
    // given two simultaneous calls, the winner of the cas would
    // call f, and the second would return immediately, without
    // waiting for the first's call to f to complete.
    // This is why the slow path falls back to a mutex, and why
    // the atomic.StoreUint32 must be delayed until after f returns.

    if atomic.LoadUint32(&o.done) == 0 {
        // Outlined slow-path to allow inlining of the fast-path.
        o.doSlow(f)
    }
}

func (o *Once) doSlow(f func()) {
    o.m.Lock()
    defer o.m.Unlock()
    if o.done == 0 {
        defer atomic.StoreUint32(&o.done, 1)
        f()
    }
}

Why is atomic.StoreUint32 used rather than, say, o.done = 1? Are the two not equivalent? What are the differences?

Must we use the atomic operation (atomic.StoreUint32) to make sure that other goroutines can observe the effect of f() before o.done is set to 1 on a machine with a weak memory model?

Yvor answered 28/1, 2021 at 10:18 Comment(9)
My intuition is that a non-atomic write would not necessarily be visible to the LoadUint32. Although the write is done under a lock, the read isn't. There's an open bug filed against the go memory model to detail cases like these, so it's hard to be definitive as to whether this is right or not.Delicacy
@PaulHankin, thanks Paul. I am really puzzled about the use of "atomic.StoreUint32" here: although pairing atomic operations is good practice in programming, on machines with a strong memory model it seems that a simple assignment would be enough.Yvor
@Yvor it's a mistake to think that way. Good code is correct according to the language specification and not whether it happens to work on a particular machine. Because in principle (although this is not typically the "go way"), the compiler can make optimizations that break programs that are contrary to the language specification.Delicacy
A very similar question has been asked before (but without answer).Percaline
I figured the previous question went without answer because it was obvious once the OP's misconceptions were clarified in the comments. We can put the question to rest here and close the other one.Brightness
Go values compatibility very highly. Just because a regular assignment happens to work on your current machine on your current Go version doesn't mean we can put that code into the world in good conscience and have people rely on it for critical applications.Wally
@HymnsForDisco: regular assignment does not work on any system with any Go version, because it violates the memory model. If the code is executed concurrently, the race detector will show a race between atomic.LoadUint32 and o.done = 1Brightness
@kingwah001: note that machines that don't have a strong memory model exist. Here, a plain memory read or write might just use the CPU-side cache, for instance: you must issue a special instruction (load-locked and store-conditional, for instance, or memory barrier or cache flush instructions) to have the CPU actually consult any shared memory where some other CPU may also be reading and/or writing. PowerPC and SPARC (V9) use these kinds of operations, for instance.Kedron
@torek, thanks. One possible scenario under a weak memory model: 1) suppose f() contains the line a := 1; 2) after goroutine A executes o.done = 1, another goroutine B observes that o.done is 1 via atomic.LoadUint32(&o.done), but B may still not observe that a is 1, because the plain assignment o.done = 1 does not guarantee that other CPUs' caches see the write to a before o.m.Unlock() is executed. atomic.StoreUint32(&o.done, 1) makes sure that a := 1 is visible to all CPUs before o.done reads as 1.Yvor
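To make the scenario in the comment above concrete, here is a minimal sketch; the variables a and done and the writer/reader functions are hypothetical stand-ins, not part of sync.Once:

package main

import (
    "fmt"
    "sync/atomic"
)

var (
    a    int    // stands in for work done inside f()
    done uint32 // stands in for o.done
)

func writer() {
    a = 1
    done = 1 // plain store: races with the atomic load in reader
    // atomic.StoreUint32(&done, 1) would create the needed happens-before edge
}

func reader() {
    if atomic.LoadUint32(&done) == 1 {
        // Without a synchronising (atomic) store in writer, this read is not
        // guaranteed to observe a == 1.
        fmt.Println(a)
    }
}

func main() {
    go writer()
    reader()
}

Running a program like this under the race detector (go run -race) reports a race between the plain store done = 1 and atomic.LoadUint32, as one of the comments above already points out.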

Remember: unless you are writing the assembly by hand, you are not programming to your machine's memory model but to Go's memory model. This means that even if primitive assignments happen to be atomic on your architecture, Go requires the use of the atomic package to ensure correctness across all supported architectures.

Access to the done flag outside of the mutex only needs to be safe, not strictly ordered, so atomic operations can be used instead of always obtaining a lock with a mutex. This is an optimization to make the fast path as efficient as possible, allowing sync.Once to be used in hot paths.
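For contrast, here is a sketch of a hypothetical mutex-only variant (not the real implementation). It is correct, but every call pays for a Lock/Unlock even long after f has run, which is exactly the cost the atomic fast path avoids:

package main

import "sync"

// slowOnce is a hypothetical mutex-only variant of Once.
type slowOnce struct {
    m    sync.Mutex
    done bool
}

// Do is correct but acquires the mutex on every call; the real sync.Once
// skips that cost with its atomic.LoadUint32 fast path.
func (o *slowOnce) Do(f func()) {
    o.m.Lock()
    defer o.m.Unlock()
    if !o.done {
        f()
        o.done = true
    }
}

func main() {
    var o slowOnce
    o.Do(func() { println("runs once") })
    o.Do(func() { println("never runs") })
}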

The mutex used for doSlow is for mutual exclusion within that function alone, to ensure that only one caller ever makes it to f() before the done flag is set. The flag is written using atomic.StoreUint32, because it may happen concurrently with atomic.LoadUint32 outside of the critical section protected by the mutex.

Reading the done field non-atomically while it is written concurrently, even by an atomic write, is a data race. Likewise, just because the field is read atomically does not mean you can write it with a normal assignment: both sides of the concurrent access must use the atomic package. Hence the flag is checked with atomic.LoadUint32 and written with atomic.StoreUint32.

The direct read of done within doSlow is safe, because it is protected from concurrent writes by the mutex. Reading the value concurrently with atomic.LoadUint32 is safe because both are read operations.

Brightness answered 28/1, 2021 at 11:34 Comment(10)
Excuse me. Do you mean that accessing done directly outside of the mutex may not be safe on some architectures? @BrightnessUnicuspid
@JasonPan, reading done directly outside of the mutex would be a data race (though the definition of what the result is has been further refined in the new memory model limiting how "unsafe" it would be). But the done value is never directly read outside of the mutex without atomic, which is why I didn't word it that way.Brightness
Could you give us some references about data races in the Go memory model, and how the new memory model limits the "unsafe" behaviour? What specifically would go wrong if the LoadUint32 were changed to if o.done == 0 and the StoreUint32 to o.done = 1?Unicuspid
Oh, suddenly I get your point. Do you mean that if two goroutines run concurrently and one of them changes a variable such as o.done, the other goroutine may see an old value of it? So that if o.done is set to 1 by a plain assignment, another goroutine may still run into doSlow and take the lock, and worse, may still see o.done as 0?Unicuspid
And the memory model is described here go.dev/ref/memUnicuspid
I think by "atomic" you mean "sequentially consistent", since plain accesses on uint32 provide atomicity already.Daugavpils
@QuânAnhMai, this was written before the memory model added the assurance that native word-size access must not cause broken reads or writes. However the spec still refers to "atomic" operations using the atomic package, and it's still required for a correct program and visibility of the value, so I think the use of the term here is still warranted.Brightness
@QuânAnhMai, also note that while the memory model has defined the possible results of a data race on word-sized values, it does not imply that the results of data races are valid. You still must use atomic functions to get effectively atomic operations, as any implementation is allowed to summarily halt execution upon encountering a data race without synchronized access.Brightness
I don't know what you mean by "it does not imply that the results of data races are valid": if the operation produces a result, then it must be valid; there is no invalid result, because the opposite of a valid result here is no result at all. Furthermore, while it is perfectly legal for a Go program to "report the race and halt execution of the program", personally I would argue that a sane implementation would not halt unless explicitly told to do so (e.g. via the -race flag).Daugavpils
@QuânAnhMai, I mean "invalid" in that the runtime is allowed to detect races and abort execution. Whether you think that is sane or not is beside the point, it is allowed to do so. Regardless, there's no reason to change the semantics here just because additional safeguards on data races have been defined, if the atomic functions (which are basically intrinsics) are not actually needed for the underlying architecture, they don't compile to any additional instructions.Brightness

Must we use the atomic operation (atomic.StoreUint32) to make sure that other goroutines can observe the effect of f() before o.done is set to 1 on a machine with weak memory model?

Yes, you are thinking in the right direction, but note that even if the target machine has a strong memory model, the Go compiler can and will reorder instructions as long as the result adheres to the Go memory model. Conversely, even if the machine's memory model is weaker than the language's, the compiler has to emit additional barriers so that the final code complies with the language specification.

Let's consider an implementation of sync.Once without sync/atomic, with modifications to make it easier to explain:

func (o *Once) Do(f func()) {
    if o.done == 0 { // (1)
        o.m.Lock() // (2)
        defer o.m.Unlock() // (3)
        if o.done == 0 { // (4)
            f() // (5)
            o.done = 1 // (6)
        }
    }
}

If a goroutine observes that o.done != 0, it returns immediately; as a result, the function must ensure that f() happens before any read that observes 1 from o.done.

  • If the read is at (4), it is protected by the mutex, so it is guaranteed to happen after the previous acquisition of the mutex, which executed f and set o.done to 1.
  • If the read is at (1), we do not have the protection of the mutex, so we must construct a synchronises-with relationship between the write (6) in the writing goroutine and the read (1) in the current goroutine. Then, since (5) is sequenced before (6), a read of 1 at (1) is guaranteed to happen after the execution of (5), by the transitivity of the happens-before relation.

As a result, the write (6) must have release semantics, and the read (1) must have acquire semantics. Since Go does not expose acquire loads and release stores, we must resort to the stronger ordering, sequential consistency, which is what atomic.LoadUint32 and atomic.StoreUint32 provide.
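For illustration, here is the same modified implementation with the two atomic operations put back in, annotated with the labels used above. This is a sketch only; the real sync.Once additionally outlines the slow path and defers the store:

func (o *Once) Do(f func()) {
    if atomic.LoadUint32(&o.done) == 0 { // (1) acquire side: observing 1 here synchronises with (6)
        o.m.Lock()
        defer o.m.Unlock()
        if o.done == 0 { // (4) plain read is fine here: protected by the mutex
            f()                            // (5) sequenced before (6)
            atomic.StoreUint32(&o.done, 1) // (6) release side: publishes the effects of (5)
        }
    }
}

With (6) as the synchronising store and (1) as the synchronising load, any goroutine that reads 1 at (1) is guaranteed to observe everything f() did at (5).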

Final note: since accesses to memory locations no larger than a machine word are guaranteed to be atomic, this usage of atomic here has nothing to do with atomicity and everything to do with synchronisation.

Rudderhead answered 10/12, 2022 at 16:36 Comment(0)
func (o *Once) Do(f func()) {
    if atomic.LoadUint32(&o.done) == 0 {       // #1
        // Outlined slow-path to allow inlining of the fast-path.
        o.doSlow(f)
    }
}

func (o *Once) doSlow(f func()) {
    o.m.Lock()
    defer o.m.Unlock()
    if o.done == 0 {                            // #2
        defer atomic.StoreUint32(&o.done, 1)    // #3
        f()
    }
}
  • #1 and #3: #1 is a read and #3 is a write, and the mutex does not make them mutually exclusive (the read at #1 happens outside the lock), so both must use the atomic package to be safe.
  • #2 and #3: both are inside the critical section protected by the mutex, so they are safe.
Endurance answered 21/3, 2022 at 14:29 Comment(0)

Atomic operations can be used to synchronize the execution of different goroutines.

Without synchronization, even if a goroutine observes o.done == 1, there is no guarantee that it will observe the effect of f().
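A small sketch of the guarantee being described, using sync.Once itself; the config variable is hypothetical:

package main

import (
    "fmt"
    "sync"
)

var (
    once   sync.Once
    config int
)

// get returns the initialised value. Because Do synchronises the completion
// of f with every return from Do, every caller that returns from get is
// guaranteed to observe config == 42, even if another goroutine ran f.
func get() int {
    once.Do(func() { config = 42 })
    return config
}

func main() {
    var wg sync.WaitGroup
    for i := 0; i < 4; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            fmt.Println(get())
        }()
    }
    wg.Wait()
}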

Hijacker answered 14/7, 2022 at 9:24 Comment(1)
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.Velvety
