Update November 16
This question was originally answered back in 2012 with an "amusement"; it didn't claim to provide a definitive answer and carried a caveat to that effect. In hindsight such an amusement should probably have stayed private, though some enjoyed it.
In August 2016 this Q&A was brought to my attention and I provided a proper answer. In that I wrote:
I am going to seemingly disagree with Greg Parker, but probably not really...
Well it does seem that Greg and I disagree over whether we disagree, or the answer, or something ;-) So I am updating my Aug 2016 answer with a more detailed basis for the answer, why it might be wrong, and if so how to fix it (so the answer to the original question is still "yes"). Hopefully Greg & I will either agree, or I will learn something - either outcome is good!
So first the Aug 16 answer as it was, then an explanation of the basis for the answer. The original amusement has been removed to avoid any confusion; students of history can view the edit trail.
Answer: Aug 2016
I am going to seemingly disagree with Greg Parker, but probably not really...
The original question:
Can I declare dispatch_once_t predicate as a member variable instead of static variable?
Short Answer: The answer is yes PROVIDED there is a memory barrier between the initial creation of the object and any use of dispatch_once.
Quick Explanation: The requirement on the dispatch_once_t variable for dispatch_once is that it must be initially zero. The difficulty comes from memory re-ordering operations on modern multiprocessors. While it may appear that a store to a location has been performed according to the program text (high level language or assembler level), the actual store may be reordered and occur after a subsequent read of the same location. To address this, memory barriers can be used, which force all memory operations occurring before them to complete before those following them. Apple provides OSMemoryBarrier() to do this.
With dispatch_once Apple is stating that zero-initialised global variables are guaranteed to be zero, but that the zero-initialised instance variables (and zero initialising is the Objective-C default here) are not guaranteed to be zero before a dispatch_once is executed.
The solution is to insert a memory barrier; on the assumption that the dispatch_once occurs in some member method of an instance, the obvious place to put this memory barrier is in the init method, as (1) it will only be executed once (per instance) and (2) init must have returned before any other member method can be called.
So yes, with an appropriate memory barrier, dispatch_once can be used with an instance variable.
Nov 2016
Preamble: Notes on dispatch_once
These notes are based on Apple's code and comments for dispatch_once.
Usage of dispatch_once follows the standard pattern:
id cachedValue;
dispatch_once_t predicate = 0;
...
dispatch_once(&predicate, ^{ cachedValue = expensiveComputation(); });
... use cachedValue ...
and the last two lines are expanded inline (dispatch_once is a macro) to something like:
if (predicate != ~0) // (all 1's, indicates the block has been executed) [A]
{
    dispatch_once_internal(&predicate, block); // [B]
}
... use cachedValue ... // [C]
Notes:
Apple's source states that predicate must be initialised to zero and notes that global and static variables default to zero initialisation.
Note that at line [A] there is no memory barrier. On a processor with speculative read-ahead and branch prediction the read of cachedValue in line [C] could occur before the read of predicate in line [A], which could lead to wrong results (a wrong value for cachedValue).
A barrier could be used to prevent this, however that is slow and Apple want this to be fast in the common case that the once block has already been performed, so...
dispatch_once_internal, line [B], which does use barriers and atomic operations internally, uses a special barrier, dispatch_atomic_maximally_synchronizing_barrier(), to defeat the speculative read-ahead and so allow line [A] to be barrier free and hence fast.
Any processor reaching line [A] before dispatch_once_internal() has been executed and mutated predicate needs to read 0 from predicate. Using a global or static initialised to zero for predicate will guarantee this.
The important take away for our current purposes is that dispatch_once_internal mutates predicate in such a way that line [A] works without any barrier.
Long Explanation of Aug 16 Answer:
So we know using a global or static initialised to zero meets the requirements of dispatch_once()'s barrier-free fastpath. We also know that mutations made by dispatch_once_internal() to predicate are correctly handled.
What we need to determine is whether we can use an instance variable for predicate and initialise it in such a way that line [A] above can never read its pre-initialised value, because if it could, things would break.
My Aug 16 answer says this is possible. To understand the basis for this we need to consider the program and data flow in a multi-processor environment with speculative read-ahead.
The outline of the Aug 16 answer's execution and data flow is:
Processor 1                                Processor 2
0. Call alloc
1. Zero instance var used for predicate
2. Return object ref from alloc
3. Call init passing object ref
4. Perform barrier
5. Return object ref from init
6. Store or send object ref somewhere
                                           ...
                                           7. Obtain object ref
                                           8. Call instance method passing obj ref
                                           9. In called instance method dispatch_once
                                              tests predicate. This read is dependent
                                              on passed obj ref.
To be able to use an instance variable as the predicate, it must be impossible to execute step 9 in such a way that it reads the value in the memory before step 1 has zeroed it.
If step 4 is omitted, i.e. no appropriate barrier is inserted in init, then, though Processor 2 must obtain the correct value for the object reference generated by Processor 1 before it can execute step 9, it is (theoretically) possible that Processor 1's zero writes in step 1 have not yet been performed/written to global memory and Processor 2 will not see them.
So we insert step 4 and perform a barrier.
However we now have to consider speculative read-ahead, just as dispatch_once() has to. Could Processor 2 perform the read of step 9 before the barrier of step 4 has ensured the memory is zero?
Consider:
Processor 2 cannot perform, speculatively or otherwise, the read of step 9 until it has the object reference obtained in step 7 - and to do so speculatively requires the processor to determine that the method call in step 8, whose destination in Objective-C is dynamically determined, will end up at the method containing step 9, which is quite advanced (but not impossible) speculation;
Step 7 cannot obtain the object reference until step 6 has stored/passed it;
Step 6 hasn't got it to store/pass until step 5 has returned it; and
Step 5 is after the barrier at step 4...
TL;DR: How can step 9 have the object reference required to perform the read until after step 4 containing the barrier? (And given the long execution path, with multiple branches, some conditional (e.g. inside method dispatch), is speculative read-ahead an issue at all?)
So I argue that the barrier at step 4 is sufficient, even in the presence of speculative read-ahead affecting step 9.
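The flow of steps 0 to 9 can be sketched in portable C11 with two threads standing in for the two processors. This is an illustration under assumptions, not a proof: object_t, shared_ref, processor1, processor2 and run_handoff are hypothetical names, malloc plus memset stands in for alloc handing back recycled memory, and a plain sequentially consistent fence stands in for the barrier at step 4. Note also that the reader uses an acquire load at step 7, because portable C11 requires it for the later read to be well defined; my argument above relies instead on the hardware's address dependency between steps 7 and 9.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object with an instance variable used as the predicate. */
typedef struct {
    intptr_t predicate;
    int payload;
} object_t;

/* The "somewhere" of steps 6 and 7 where the reference is handed over. */
static _Atomic(object_t *) shared_ref = NULL;

/* Processor 1: steps 0 to 6. */
static void *processor1(void *unused)
{
    (void)unused;
    object_t *obj = malloc(sizeof *obj);            /* 0. alloc */
    assert(obj != NULL);
    memset(obj, 0xFF, sizeof *obj);  /* simulate recycled, non-zero memory */
    obj->predicate = 0;                             /* 1. zero the ivar */
    obj->payload = 0;
    /* 2-3. alloc returns, init is called */
    atomic_thread_fence(memory_order_seq_cst);      /* 4. barrier in init */
    /* 5. init returns */
    atomic_store_explicit(&shared_ref, obj,
                          memory_order_relaxed);    /* 6. store the ref */
    return NULL;
}

/* Processor 2: steps 7 to 9. */
static void *processor2(void *out)
{
    object_t *obj;
    /* 7. obtain the reference. ASSUMPTION: acquire is used so that the
     * read at step 9 is well defined in C11. */
    while ((obj = atomic_load_explicit(&shared_ref,
                                       memory_order_acquire)) == NULL) {
        sched_yield();
    }
    /* 8-9. the "method" reads the predicate through the reference */
    *(intptr_t *)out = obj->predicate;
    return NULL;
}

/* Runs both sides and returns the predicate value Processor 2 observed. */
static intptr_t run_handoff(void)
{
    pthread_t p1, p2;
    intptr_t seen = -1;
    atomic_store(&shared_ref, NULL);
    int rc = pthread_create(&p2, NULL, processor2, &seen);
    assert(rc == 0);
    rc = pthread_create(&p1, NULL, processor1, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(p1, NULL);
    pthread_join(p2, NULL);
    free(atomic_load(&shared_ref));
    return seen;
}
```

run_handoff() should return 0, the value Processor 2 must see for dispatch_once to behave. A passing run cannot demonstrate the absence of reordering on real hardware; the sketch only shows where the barrier sits relative to the hand-off of the reference.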
Consideration of Greg's comments:
Greg strengthened Apple's source code comment regarding the predicate from "must be initialised to zero" to "must never have been non-zero", which means since load time, and this is only true for global and static variables initialised to zero. The argument is based on defeating speculative read-ahead by modern processors required for the barrier-free dispatch_once() fast path.
Instance variables are initialised to zero at object creation time, and the memory they occupy could have been non-zero before then. However, as has been argued above, a suitable barrier can be used to ensure that dispatch_once() does not read a pre-initialisation value. I think Greg disagrees with my argument, if I follow his comments correctly, and argues that the barrier at step 4 is insufficient to handle speculative read-ahead.
Let's assume Greg is right (which is not at all improbable!); then we are in a situation Apple has already dealt with in dispatch_once(): we need to defeat the read-ahead. Apple does that by using the dispatch_atomic_maximally_synchronizing_barrier() barrier. We can use this same barrier at step 4 and prevent the following code from executing until all possible speculative read-ahead by Processor 2 has been defeated; and as the following code, steps 5 & 6, must execute before Processor 2 even has an object reference it can use to speculatively perform step 9, everything works.
So if I understand Greg's concerns then using dispatch_atomic_maximally_synchronizing_barrier() will address them, and using it instead of a standard barrier will not cause an issue even if it is not actually required. So though I'm not convinced it is necessary, it is at worst harmless to do so. My conclusion therefore remains as before (emphasis added):
So yes, with an appropriate memory barrier, dispatch_once can be used with an instance variable.
I'm sure Greg or some other reader will let me know if I have erred in my logic. I stand ready to facepalm!
Of course you have to decide whether the cost of the appropriate barrier in init is worth the benefit you gain from using dispatch_once() to obtain once-per-instance behaviour or whether you should address your requirements another way – and such alternatives are outside the scope of this answer!
Code for dispatch_atomic_maximally_synchronizing_barrier():
A definition of dispatch_atomic_maximally_synchronizing_barrier(), adapted from Apple's source, that you can use in your own code is:
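#if defined(__x86_64__) || defined(__i386__)
// x86: cpuid is a serialising instruction; the clobber list covers the registers it overwrites
#define dispatch_atomic_maximally_synchronizing_barrier() \
({ unsigned long _clbr; __asm__ __volatile__( "cpuid" : "=a" (_clbr) : "0" (0) : "ebx", "ecx", "edx", "cc", "memory"); })
#else
// Other architectures: a sequentially consistent fence (clang builtin).
// __ATOMIC_SEQ_CST is used here because the libdispatch-internal constant
// dispatch_atomic_memory_order_seq_cst is not defined outside Apple's source.
#define dispatch_atomic_maximally_synchronizing_barrier() \
({ __c11_atomic_thread_fence(__ATOMIC_SEQ_CST); })
#endif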
If you want to know how this works read Apple's source code.
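As a rough illustration of where such a barrier would go, here is a plain C sketch in which a creation function plays the role of alloc followed by init. Everything named here is hypothetical: max_sync_barrier is my own function-style rewrite of the macro (using the standard atomic_thread_fence on non-x86 targets so that it also builds with compilers other than clang), and widget_t and widget_create are invented stand-ins for an Objective-C class and its initialiser.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical function form of the barrier. ASSUMPTION: on x86 the cpuid
 * instruction is used, as in the macro; elsewhere a seq_cst fence. */
static inline void max_sync_barrier(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned long eax = 0; /* cpuid leaf 0; the result is discarded */
    __asm__ __volatile__("cpuid"
                         : "+a"(eax)
                         :
                         : "ebx", "ecx", "edx", "cc", "memory");
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* Hypothetical object whose once_predicate is used once per instance. */
typedef struct {
    intptr_t once_predicate; /* must read as 0 before its first use */
    int cached;
} widget_t;

/* Stand-in for alloc + init: zero the storage, then perform the barrier
 * before the reference is returned to anyone who could pass it on. */
static widget_t *widget_create(void)
{
    widget_t *w = calloc(1, sizeof *w); /* zeroed, like alloc */
    if (w == NULL) {
        return NULL;
    }
    max_sync_barrier();                 /* the barrier of step 4 */
    return w;                           /* step 5: only now hand it out */
}
```

The design point is simply ordering: the barrier executes after the zeroing and before the reference escapes, so every later user of the object obtains its reference after the barrier has completed.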