Can I declare dispatch_once_t predicate as a member variable instead of static?

I want to run a block of code only once per instance.

Can I declare dispatch_once_t predicate as a member variable instead of static variable?

From GCD Reference, it is not clear to me.

The predicate must point to a variable stored in global or static scope. The result of using a predicate with automatic or dynamic storage is undefined.

I know I can use dispatch_semaphore_t and a boolean flag to do the same thing. I'm just curious.
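The semaphore-plus-flag idea mentioned above can be sketched portably. This is a hypothetical C11 analogue, not Apple API: a `pthread_mutex_t` stands in for a `dispatch_semaphore_t` created with count 1, and a plain `bool` stands in for the `BOOL` flag; all names are illustrative.

```c
#include <pthread.h>
#include <stdbool.h>

/* Portable sketch of "semaphore + flag" once-per-instance behaviour.
 * The mutex plays the role of dispatch_semaphore_create(1); the bool
 * plays the role of the BOOL flag. */
typedef struct {
    pthread_mutex_t lock;  /* binary semaphore analogue */
    bool done;             /* "has the block run yet?" flag */
    int value;             /* once-initialised per-instance state */
} Instance;

static void instance_init(Instance *obj) {
    pthread_mutex_init(&obj->lock, NULL);
    obj->done = false;
    obj->value = 0;
}

/* Runs expensive_init at most once per Instance, even under contention. */
static void run_once_per_instance(Instance *obj,
                                  void (*expensive_init)(Instance *)) {
    pthread_mutex_lock(&obj->lock);    /* dispatch_semaphore_wait(...)   */
    if (!obj->done) {
        expensive_init(obj);
        obj->done = true;
    }
    pthread_mutex_unlock(&obj->lock);  /* dispatch_semaphore_signal(...) */
}
```

Unlike `dispatch_once`, every caller pays for the lock acquisition, but correctness does not depend on the predicate's storage class.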

Bangup answered 13/12, 2012 at 8:45 Comment(1)
just curious: could you cite me a sample in code how that can be achieved using dispatch_semaphore_t and a BOOL flag to do the same Thing? Thanks in advance.Zielsdorf

dispatch_once_t must not be an instance variable.

The implementation of dispatch_once() requires that the dispatch_once_t is zero, and has never been non-zero. The previously-not-zero case would need additional memory barriers to work correctly, but dispatch_once() omits those barriers for performance reasons.

Instance variables are initialized to zero, but their memory may have previously stored another value. This makes them unsafe for dispatch_once() use.
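A small, purely illustrative C sketch of this distinction (not Apple code): a freshly zeroed allocation, like an Objective-C instance after `+alloc`, reads as zero even though the heap addresses it occupies may have held non-zero bytes earlier in the process's life.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 1 if a fresh zeroed allocation reads as all zero bytes. */
static int zeroed_allocation_reads_zero(void) {
    /* Dirty some heap memory, then release it back to the allocator. */
    unsigned char *old = malloc(64);
    if (!old) return 0;
    memset(old, 0xFF, 64);              /* these bytes are now non-zero */
    free(old);

    /* A later zeroed allocation (like an ivar after +alloc)... */
    unsigned char *ivar = calloc(64, 1);
    if (!ivar) return 0;
    int all_zero = 1;
    for (int i = 0; i < 64; i++)
        all_zero &= (ivar[i] == 0);     /* ...reads as zero, yet its
                                           storage was previously non-zero,
                                           the property dispatch_once()
                                           cannot tolerate */
    free(ivar);
    return all_zero;
}
```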

Documentary answered 7/11, 2013 at 19:43 Comment(5)
Could one work around these limitations by adding an OSMemoryBarrier() at the end of the object's init method?Reducer
Only pointer instance variables (and probably only object pointer instance variables), and even then only under ARC, are guaranteed to be initialized to zero. dispatch_once_t is a synonym for long, and hence is not guaranteed to be initialized to zero.Drillstock
Not true. All Objective-C instance variables are guaranteed to be zero (except for ivars of C++ types which may be initialized using their zero-arg in-place constructor).Documentary
ARC adds zero initialization of local variables of ARC-managed object pointer types.Documentary
In any case, none of the above initializations are sufficient for dispatch_once use, because what is now the variable's storage may have been non-zero earlier in the process's life.Documentary
A
20

Update November 2016

This question was originally answered back in 2012 with an "amusement"; it didn't claim to provide a definitive answer and carried a caveat to that effect. In hindsight such an amusement should probably have stayed private, though some enjoyed it.

In August 2016 this Q&A was brought to my attention and I provided a proper answer, in which I wrote:

I am going to seemingly disagree with Greg Parker, but probably not really...

Well it does seem that Greg and I disagree over whether we disagree, or the answer, or something ;-) So I am updating my Aug 2016 answer with a more detailed basis for the answer, why it might be wrong, and if so how to fix it (so the answer to the original question is still "yes"). Hopefully Greg & I will either agree, or I will learn something - either outcome is good!

So first the Aug 16 answer as it was, then an explanation of the basis for the answer. The original amusement has been removed to avoid any confusion, students of history can view the edit trail.


Answer: Aug 2016

I am going to seemingly disagree with Greg Parker, but probably not really...

The original question:

Can I declare dispatch_once_t predicate as a member variable instead of static variable?

Short Answer: The answer is yes PROVIDED there is a memory barrier between the initial creation of the object and any use of dispatch_once.

Quick Explanation: The requirement on the dispatch_once_t variable for dispatch_once is that it must be initially zero. The difficulty comes from memory-reordering operations on modern multiprocessors. While it may appear that a store to a location has been performed according to the program text (at the high-level-language or assembler level), the actual store may be reordered and occur after a subsequent read of the same location. To address this, memory barriers can be used, which force all memory operations occurring before them to complete before those following them. Apple provides OSMemoryBarrier() to do this.
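The store/load pairing described above can be sketched in portable C11, where `atomic_thread_fence()` plays the role of Apple's `OSMemoryBarrier()`. This is a minimal illustration, not Apple's code: `publish()` would run on one thread and `consume()` on another, and the fence pair guarantees that a thread which sees the flag also sees the payload.

```c
#include <stdatomic.h>

static int payload;           /* plain data written before publication */
static atomic_int published;  /* flag other threads poll */

static void publish(void) {
    payload = 42;                                    /* store the data...      */
    atomic_thread_fence(memory_order_release);       /* ...force it to settle  */
    atomic_store_explicit(&published, 1,
                          memory_order_relaxed);     /* ...then raise the flag */
}

static int consume(void) {
    while (atomic_load_explicit(&published, memory_order_relaxed) == 0)
        ;                                            /* spin until flagged    */
    atomic_thread_fence(memory_order_acquire);       /* pair with the release */
    return payload;                                  /* guaranteed to be 42   */
}
```

Without the release/acquire fence pair, a weakly ordered CPU may let another thread observe the flag before the payload store becomes visible.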

With dispatch_once Apple is stating that zero-initialised global variables are guaranteed to be zero, but that zero-initialised instance variables (and zero-initialising is the Objective-C default here) are not guaranteed to be zero before a dispatch_once is executed.

The solution is to insert a memory barrier; on the assumption that the dispatch_once occurs in some member method of an instance the obvious place to put this memory barrier is in the init method as (1) it will only be executed once (per instance) and (2) init must have returned before any other member method can be called.

So yes, with an appropriate memory barrier, dispatch_once can be used with an instance variable.


Nov 2016

Preamble: Notes on dispatch_once

These notes are based on Apple's code and comments for dispatch_once.

Usage of dispatch_once follows the standard pattern:

id cachedValue;
dispatch_once_t predicate = 0;
...
dispatch_once(&predicate, ^{ cachedValue = expensiveComputation(); });
... use cachedValue ...

and the last two lines are expanded inline (dispatch_once is a macro) to something like:

if (predicate != ~0) // (all 1's, indicates the block has been executed)  [A]
{
    dispatch_once_internal(&predicate, block);                         // [B]
}
... use cachedValue ...                                                // [C]

Notes:

  • Apple's source states that predicate must be initialised to zero and notes that global and static variables default to zero initialisation.

  • Note that at line [A] there is no memory barrier. On a processor with speculative read-ahead and branch prediction, the read of cachedValue in line [C] could occur before the read of predicate in line [A], which could lead to wrong results (a wrong value for cachedValue).

  • A barrier could be used to prevent this, however that is slow and Apple want this to be fast in the common case that the once block has already been performed, so...

  • dispatch_once_internal, line [B], which does use barriers and atomic operations internally, uses a special barrier, dispatch_atomic_maximally_synchronizing_barrier() to defeat the speculative read-ahead and so allow line [A] to be barrier free and hence fast.

  • Any processor reaching line [A] before dispatch_once_internal() has been executed and mutated predicate needs to read 0 from predicate. Using a global or static initialised to zero for predicate will guarantee this.

The important take away for our current purposes is that dispatch_once_internal mutates predicate in such a way that line [A] works without any barrier.
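The fast-path/slow-path shape described above can be sketched in portable C11. This is not Apple's implementation: the fast path here uses an acquire load instead of relying on `dispatch_atomic_maximally_synchronizing_barrier()`, trading a little fast-path speed for portability. The predicate states are illustrative: 0 = never run, 1 = running, 2 = done.

```c
#include <stdatomic.h>

typedef atomic_int my_once_t;  /* zero-initialised, like a static predicate */

static void my_once(my_once_t *predicate, void (*block)(void)) {
    /* [A] fast path: the acquire load pairs with the release store below,
       so the block's writes are visible to anyone who sees "done" (2). */
    if (atomic_load_explicit(predicate, memory_order_acquire) == 2)
        return;

    /* [B] slow path: exactly one caller moves 0 -> 1 and runs the block. */
    int expected = 0;
    if (atomic_compare_exchange_strong(predicate, &expected, 1)) {
        block();
        atomic_store_explicit(predicate, 2, memory_order_release);
    } else {
        /* another caller is running the block; wait until it finishes */
        while (atomic_load_explicit(predicate, memory_order_acquire) != 2)
            ;
    }
}
```

As in the Apple version, this only behaves correctly if the predicate starts at zero before any thread can reach line [A].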

Long Explanation of Aug 16 Answer:

So we know using a global or static initialised to zero meets the requirements of dispatch_once()'s barrier-free fastpath. We also know that mutations made by dispatch_once_internal() to predicate are correctly handled.

What we need to determine is whether we can use an instance variable for predicate and initialise it in such a way that line [A] above can never read its pre-initialised value - as if it could things would break.

My Aug 16 answer says this is possible. To understand the basis for this we need to consider the program and data flow in a multi-processor environment with speculative read-ahead.

The outline of the Aug 16 answer's execution and data flow is:

Processor 1                              Processor 2
0. Call alloc
1. Zero instance var used for predicate
2. Return object ref from alloc
3. Call init passing object ref
4. Perform barrier
5. Return object ref from init
6. Store or send object ref somewhere
                           ...
                                         7. Obtain object ref
                                         8. Call instance method passing obj ref
                                         9. In called instance method dispatch_once
                                            tests predicate. This read is dependent
                                            on passed obj ref.

To be able to use an instance variable as the predicate then it must be impossible to execute step 9 in such a way that it reads the value in the memory before step 1 has zeroed it.

If step 4 is omitted, i.e. no appropriate barrier is inserted in init, then, though Processor 2 must obtain the correct value for the object reference generated by Processor 1 before it can execute step 9, it is (theoretically) possible that Processor 1's zero writes in step 1 have not yet been written to global memory, so Processor 2 will not see them.

So we insert step 4 and perform a barrier.

However we now have to consider speculative read-ahead, just as dispatch_once() has to. Could Processor 2 perform the read of step 9 before the barrier of step 4 has ensured the memory is zero?

Consider:

  • Processor 2 cannot perform, speculatively or otherwise, the read of step 9 until it has the object reference obtained in step 7 - and to do so speculatively requires the processor to determine that the method call in step 8, whose destination in Objective-C is dynamically determined, will end up at method containing step 9, which is quite advanced (but not impossible) speculation;

  • Step 7 cannot obtain the object reference until step 6 has stored/passed it;

  • Step 6 hasn't got it to store/pass until step 5 has returned it; and

  • Step 5 is after the barrier at step 4...

TL;DR: How can step 9 have the object reference required to perform the read before step 4, which contains the barrier, has executed? (And given the long execution path, with multiple branches, some conditional (e.g. inside method dispatch), is speculative read-ahead an issue at all?)

So I argue that the barrier at step 4 is sufficient, even in the presence of speculative read-ahead affecting step 9.

Consideration of Greg's comments:

Greg strengthened Apple's source code comment regarding the predicate from "must be initialised to zero" to "must never have been non-zero", meaning since load time, and this is only true for global and static variables initialised to zero. The argument rests on the need to defeat the speculative read-ahead of modern processors, which the barrier-free dispatch_once() fast path requires.

Instance variables are initialised to zero at object creation time, and the memory they occupy could have been non-zero before then. However as has been argued above a suitable barrier can be used to ensure that dispatch_once() does not read a pre-initialisation value. I think Greg disagrees with my argument, if I follow his comments correctly, and argues that the barrier at step 4 is insufficient to handle speculative read-ahead.

Let's assume Greg is right (which is not at all improbable!). Then we are in a situation Apple has already dealt with in dispatch_once(): we need to defeat the read-ahead. Apple does that by using the dispatch_atomic_maximally_synchronizing_barrier() barrier. We can use that same barrier at step 4 to prevent the following code from executing until all possible speculative read-ahead by Processor 2 has been defeated; and as the following code, steps 5 & 6, must execute before Processor 2 even has an object reference it can use to speculatively perform step 9, everything works.

So if I understand Greg's concerns then using dispatch_atomic_maximally_synchronizing_barrier() will address them, and using it instead of a standard barrier will not cause an issue even if it is not actually required. So though I'm not convinced it is necessary, it is at worst harmless. My conclusion therefore remains as before (emphasis added):

So yes, with an appropriate memory barrier, dispatch_once can be used with an instance variable.

I'm sure Greg or some other reader will let me know if I have erred in my logic. I stand ready to facepalm!

Of course you have to decide whether the cost of the appropriate barrier in init is worth the benefit you gain from using dispatch_once() to obtain once-per-instance behaviour or whether you should address your requirements another way – and such alternatives are outside the scope of this answer!

Code for dispatch_atomic_maximally_synchronizing_barrier():

A definition of dispatch_atomic_maximally_synchronizing_barrier(), adapted from Apple's source, that you can use in your own code is:

#if defined(__x86_64__) || defined(__i386__)
   #define dispatch_atomic_maximally_synchronizing_barrier() \
      ({ unsigned long _clbr; __asm__ __volatile__( "cpuid" : "=a" (_clbr) : "0" (0) : "ebx", "ecx", "edx", "cc", "memory"); })
#else
   /* Apple's source uses its own shim name for the seq_cst ordering;
      __ATOMIC_SEQ_CST is the equivalent compiler builtin constant. */
   #define dispatch_atomic_maximally_synchronizing_barrier() \
      ({ __c11_atomic_thread_fence(__ATOMIC_SEQ_CST); })
#endif

If you want to know how this works read Apple's source code.

Ammunition answered 13/12, 2012 at 11:12 Comment(29)
Thank you for the tests. I vote up your answer. It seems that I have to ask in Apple Forums for sure answer.Bangup
@Bangup - the comment on the definition of dispatch_once_t is: /* @typedef dispatch_once_t @abstract A predicate for use with dispatch_once(). It must be initialized to zero. Note: static and global variables default to zero. */; which pretty much answers the question. I maybe should have looked up that comment earlier, but figuring it out was fun...Ammunition
Some might think it's more interesting to look at the implementation of dispatch_once: trunk/src/once.cWilkey
@robmayoff - I hadn't looked at whether Apple had released the source, good call. I could suggest looking at the source is "cheating" ;-), but a specification is often better than a particular implementation of that specification. In this case it appears the implementation doesn't state any initialisation or other requirement on the predicate; all we have is what is in the header.Ammunition
This is incorrect. The requirement is not that the dispatch_once_t is initially zero, but that the dispatch_once_t has never been non-zero. Breaking that assumption may work most of the time, but if you are unlucky then the block may execute more than once or not at all.Documentary
@GregParker - Interesting. So the statement in the header ("It must be initialized to zero.") is inaccurate/imprecise? Given that the memory location will have been non-zero at some point in its history (it's not brand-new RAM), when does time start? I could venture an answer, but you will know :-) Just curious.Ammunition
The issue is that there isn't a deterministic ordering between the write of the zero, and the dispatch_once.Volt
@Volt - The issue is whether there is a deterministic ordering between memory writes in alloc/init and reads in other methods; also whether any such ordering is dependent on types (cf. ARC) , see discussion on this question.Ammunition
Re your recent edits: You write "The answer is yes PROVIDED there is a memory barrier between the initial creation of the object and any use of dispatch_once". How does that compare to what Greg writes: "The implementation of dispatch_once() requires that the dispatch_once_t is zero, and has never been non-zero. ... dispatch_once() omits those barriers for performance reasons" ?Twopenny
@MartinR - Greg answered "no" because there is no memory barrier, I answered a conditional "yes" because you can add the barrier. Greg's "never been non-zero" is the condition which must hold if there is no barrier.Ammunition
Adding a barrier in -init is not sufficient. On some architectures you also need a barrier on the read side of dispatch_once(), and it doesn't have it.Documentary
@GregParker - I'm intrigued. For comparison let's consider (a) loading a dynamic library into a running app vs. (b) object init. (a) memory used was previously used by some process, at some time - no memory has never been non-zero. The global in the lib is zero after loading (any barriers involved?), code in the lib calls dispatch_once, no read barrier required... (b) alloc returns zeroed mem, init has a barrier, subsequent code calls dispatch_once, read barrier may be required? What doesn't happen in (b) to maybe require this?Ammunition
The particular requirement referred to here is that the memory in question has never been non-zero during the lifetime of the current process. This works for globals/statics because they're mapped directly as pre-filled pages by the kernel per their presence in a Mach-O binary; no processor has ever executed instructions in a process' set of active page tables at the time the data pages are loaded, therefore the memory has "never" been non-zero. (Disclaimer: This is my understanding. There are more details; @Volt or GregParker could (and probably will :) say if I missed something)Effectuate
Oh, and (again, according to my understanding), a barrier may be required on the read side because the attempt to read can take place between the store to the predicate and the issuing of the barrier in -init. If you assume "arbitrary" reordering of reads and writes, you must also require barriers on both sides of a store/load pair.Effectuate
To be pedantic (which is required in these things!) I understand the requirement is that the zeroing of the mem must occur before, in all possible memory orderings, the reading in dispatch_once. This is of course where barriers come in. GCD omits barriers by relying on a stronger condition ("never been non-zero"), but there must be a way to insert the barriers GCD has omitted. So it is a question of where, not if; and of course if any perf hit of doing so is acceptable in the context, but that is somewhat orthogonal. TL;DR: If barriers cannot solve this the architecture is borked ;-)Ammunition
You might be able put appropriate barriers before the call to dispatch_once() and at the end of the block executed by dispatch_once(). But this is sub-optimal for both performance and correctness. It is slower than the right implementation: on some architectures the barrier is faster when co-located with the memory access. It is more fragile than the right implementation: it's easy to omit or misplace one of the barriers.Documentary
@GregParker - what you describe doesn't seem to tally with the source of dispatch_once itself; as surely during its first execution it must arrange that, in all possible memory orderings, the conditions required for the second and subsequent calls hold, these are essentially the same as for the first excepting the value stored in the dispatch_once_t variable, and the source does not appear to implement the algorithm you describe. Moving on, you mention the "right implementation", could you add that as an answer? How should the OP solve the "only once per instance" problem? TIAAmmunition
Look at the definition of _dispatch_once. The execution path for a thread that does not run the once block clearly has no memory barriers. (dispatch_compiler_barrier doesn't count; it does not affect CPU behavior.) Without a memory barrier an weakly-ordered CPU is free to allow the thread to continue past dispatch_once but present memory contents to that thread as if the once block had not yet finished. That would be bad. The only reason it works is that dispatch_once assumes the dispatch_once_t is in static storage.Documentary
@GregParker - "The execution path for a thread that does not run the once block clearly has no memory barriers". So you are talking about the second and subsequent passes, presumably the full barrier added to init addressing the first pass. Now dispatch_once does have a barrier (of some kind) following its execution of the user block, so you appear to be saying this is not now sufficient – when it normally is?? And even if it wasn't could not a full barrier be added to the user block (any efficiency or error-prone issues aside)? (Maybe you should move any reply to chat.)Ammunition
Memory barriers are fundamentally two-sided. You need a barrier in the thread performing the operations and a barrier in the thread observing the operations. dispatch_once omits the latter: the "second" thread's execution path has no memory barrier. This is not sufficient in the general case. Without the second barrier, the second thread could see memory changing out of the desired memory order (i.e. it could see "dispatch is done" be true before the dispatched initializers are complete). Omitting one of the barriers only works for dispatch_once because of the assumption of static storage.Documentary
@GregParker - Reading your last comment made sense until you failed to mention that dispatch_once() handles the scenario you describe - so you don't see "dispatch is done" before the dispatched initialisers are complete. I thought you were going to say I needed a more appropriate barrier to do what dispatch_once() clearly achieves. Maybe we're just misunderstanding each other, or I'm missing something obvious. I've updated my answer adding a longer explanation of my logic. If this doesn't address your concerns I'd be pleased to hear why, and am ready to facepalm if I've missed the obvious!Ammunition
This should be accepted answer. The true condition for the (current) implementation of dispatch_once to work on the (on the fast-path read-side) is that that (a) there is no store-store reordering between the stores in block() and the subsequent store to predicate (b) there is no load-load reordering between the fast-path read of predicate and subsequent loads (of values written by block()) and (c) that any thread that accesses a given dispatch_once_t object is guaranteed to see it either in it's initial zero state or ...Viperous
in one of the subsequent states written by a call to dispatch_once on the same dispatch_once_t. The fast and slow path mechanics of dispatch_once are trying to satisfy (a) and (b), while keeping (a) as cheap as possible (and frankly they are on thin ground here: the idea that you can prevent load-load reordering by something slow on another writing thread isn't supported in the formal memory models). It doesn't really address (c) however. If you simply require that the dispatch_once_t object appears in static memory it is trivially satisfied since that memory has always been zero.Viperous
On the other hand, if your dispatch_once_t object appears in an object instance, (c) is already satisfied for the same thread that creates the object (obvious the single-threaded semantics have to be obeyed as seen that same thread in any implementation), so the question is basically "can threads other than the creating thread see the dispatch_once_t in an unitialized (non-zero) state? That boils down to how the other threads get access to the object: it is created on one thread, and then somehow published for use by other threads...Viperous
... (e.g., by writing a pointer to a global variable, by sharing the a pointer through a shared atomic collection, etc). Normally you want to ensure that such publishing is safe, hence the concept of safe publication (which doesn't seem to have gotten much traction as a term in Objective-C, but the concept is fairly universal). Safe publication means that any other thread which is able to see the new object (e.g., gets a pointer to it) sees in a fully constructed state. The usual requirement is that the publishing write has release semantics, and that all subsequent reads ...Viperous
... on other threads have acquire semantics. Note that there isn't a strong relationship between the underlying "thread-safety" of the object and safe publishing: they are mostly independent. It is possible to safely publish an object that is not thread-safe (and this may be safe if such an object is never mutated), and it may be possible to unsafely publish an object that is otherwise thread-safe (and this may be unsafe - for example, many mutex or semaphore implementations are not safe to access from multiple threads if not safely published).Viperous
Wrapping it up, in the context of the OP's question: you have some object instance which has a dispatch_once_t and the OP evidently wants to share these objects across threads. Naturally the object probably has other mutable and/or immutable state. The object probably has to safely published to these other threads for the sharing to be safe at all, independent of the use dispatch_once_t: only in very special cases is it safe to unsafely share objects. If the object is safely published, dispatch_oncein the instance will be just as safe as it is in the "static data" case.Viperous
@Ammunition is basically coming to the same conclusion here: the memory barrier during object creation is kind of trying to get at safe publication, by building it into the constructor (i.e., essentially putting a release barrier after all the initialization stores). You also need acquire semantics for the consuming threads, but this is often "free" (well needs only a compiler barrier) since after Alpha there haven't been any platforms that speculate data-dependent loads and almost every sharing method creates that data dependency.Viperous
So I don't think you need the barrier in the constructor, per se - just ensure your sharing is safe, which is something you probably needed anyway. FWIW - it's no different for other common synchronization primitives such as all the pthreads stuff, the Windows primitives and so on: they would all be subject to the same caveat and would "break" if their initial state were shared unsafely. Of course, you don't see such a caveat at all, and it would make them quite a bit less useful.Viperous

The reference you quote seems pretty clear: the predicate has to be in global or static scope. If you use it as a member variable it will have dynamic storage, so the result will be undefined. So no, you can't. dispatch_once() is not what you're looking for anyway: the reference also says it "Executes a block object once and only once for the lifetime of an application", which is not what you want, since you want the block to execute once for each instance.

Nalepka answered 13/12, 2012 at 9:13 Comment(1)
I didn't notice the phrase 'for the lifetime of an application'. However, 'once per instance' can still mean once per instance for the lifetime of the application.Bangup

As long as you can guarantee that your dispatch_once_t is zero, it is safe; yet that is not guaranteed automatically for dynamic memory.

The system will initialize all instance variables of an object to zero, but that does not happen in a thread-safe manner, as objects are not themselves thread-safe. Even if you manually write zero to the dispatch_once_t, that write is not guaranteed to happen at once and be immediately visible to other threads.

The only way this would be safe is to use memory barriers, e.g. the stdatomic ones of C11:

#include <stdatomic.h>

@implementation SomeObject
{
    dispatch_once_t _once;
}

- (instancetype)init
{
    self = [super init];
    _once = 0; // This is optional; the system has already done that write
    atomic_thread_fence(memory_order_release);
    return self;
}

- (void)someMethod
{
    atomic_thread_fence(memory_order_acquire);
    dispatch_once(&_once, ^{ });
}

@end

That is safe because acquire means that, before any subsequent read operation (like the access of _once), all pending write operations that happened before the release (like assigning zero to _once) must have completed and be visible to all threads.

There can only be a problem if [super init] somehow leads to a call to -someMethod before returning, but that would be pretty unsafe to begin with: calling a method on an object before its initialization has finished is unsafe, as the object is not yet in a determined state. That's why Swift doesn't even allow it anymore (you cannot call methods on yourself from within init in Swift unless you have initialized all instance variables that require initialization and super init was called and has completed).

Performance-wise it would be way better to use atomic read/write operations on just the ivar, e.g. in init you'd use

atomic_store_explicit(&_once, 0, memory_order_release);

and in someMethod you use:

atomic_load_explicit(&_once, memory_order_acquire);

since then the fence only protects the variable (or, on most systems, the memory page the variable is located in) rather than the entire memory of the current process. But that cannot be done, as you can only use these atomic functions with atomic data types (like atomic_int), and dispatch_once_t is not an atomic data type (on my system it is a typedef of intptr_t).
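For illustration, the per-variable store/load pairing the answer sketches is possible in plain C11 by declaring the field `_Atomic`. This is a hypothetical sketch only; an `_Atomic intptr_t` is not interchangeable with `dispatch_once()`'s predicate, and all names here are invented.

```c
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    _Atomic intptr_t once;  /* the would-be predicate, now an atomic type */
} Object;

static void object_init(Object *obj) {
    /* the init-side write, published with release semantics */
    atomic_store_explicit(&obj->once, 0, memory_order_release);
}

static intptr_t object_read_once(Object *obj) {
    /* the someMethod-side read, with pairing acquire semantics */
    return atomic_load_explicit(&obj->once, memory_order_acquire);
}
```

The release store and acquire load synchronize only through this one variable, which is the cheaper alternative to a full `atomic_thread_fence` the answer describes.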

So while you can use dispatch_once that way, the question is if that would buy you anything compared to just using that:

@implementation SomeObject
{
    NSArray *_values;
    NSLock *_lock;
}

- (instancetype)init
{
    self = [super init];
    _lock = [NSLock new];
    return self;
}

- (void)someMethod
{
    [_lock lock];
    if (!_values) {
        // Init _values somehow
    }
    [_lock unlock];
    // Use _values
}

@end
Tradespeople answered 17/1, 2023 at 18:52 Comment(0)
