You ask about copy-on-write (CoW) behaviors and the thread-safety of `Array`. Let us separate these two issues.
First, let’s discuss CoW (and, more specifically, value-type semantics within the context of closures).
In short, value types can achieve thread-safety only if one ensures that each thread gets its own copy. And if one does provide copies, CoW is an optimization whereby the copy is actually performed if, and only if, it is needed (i.e., only when one of the copies mutates on a “write”).
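To see CoW in action, here is a minimal sketch (not from your code; the base-address comparison is purely illustrative and not something to rely on in production):

```swift
import Foundation

var original = [1, 2, 3]
var copy = original   // no buffer copy yet; the two arrays share storage

// Compare the base addresses of the two arrays' backing buffers
// (illustrative only; this is an implementation detail).
func sharesStorage(_ a: [Int], _ b: [Int]) -> Bool {
    a.withUnsafeBufferPointer { pa in
        b.withUnsafeBufferPointer { pb in pa.baseAddress == pb.baseAddress }
    }
}

let before = sharesStorage(original, copy)
copy[0] = 99                                // the “write” triggers the actual copy
let after = sharesStorage(original, copy)

print(before, after, original)              // true false [1, 2, 3]
```

Until the write, both arrays point at the same buffer; the mutation of `copy` is what forces the duplication, leaving `original` untouched.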
But if we mutate an array from a closure, we are not automatically getting value-type semantics (nor CoW). If we give the closure a “capture list”, we will be making a copy. If we do not provide a capture list (and we really cannot in this case, anyway), we will be mutating the original array (no copies, no CoW).
Consider the following (non-thread-safe) example:
```swift
let queue = DispatchQueue.global()
var array = [0, 0, 0]

queue.asyncAfter(deadline: .now() + 2) { [array] in  // capture list makes copy
    print("copy of original array:", array)          // [0, 0, 0]
}

queue.asyncAfter(deadline: .now() + 1) {             // no capture list means we're going to see any changes to the original array
    print("original array mutated:", array)          // [1, 2, 0]
}

queue.async {                                        // no capture list
    array[0] = 1                                     // this is changing the original array!
}

array[1] = 2                                         // obviously this also changes the original array
```
With the `[array]` capture list, a copy is made. But in the absence of that, no copy is made. We do not really want to use capture lists here, anyway, but I include this discussion so that we understand when copies (and CoW) are relevant, and when they are not.
All of this having been said, we will often explicitly use value-type semantics and give each thread its own copy (avoiding data sharing and eliminating data races). But that simply is not viable when dealing with a parallel algorithm working on a pixel buffer. First, it would be horribly inefficient to make copies of a large buffer. Second, it would be very hard to integrate all the results: if one thread updates the first pixel and another thread updates the second pixel (and so forth), you really do not want multiple copies of that array, one with the original contents, one with the first pixel changed, another with the second pixel changed, etc. In parallel algorithms, we really want to be working against a shared pixel buffer. In short, we do not want multiple copies, and with no copies, CoW is a moot issue.
So, having dispensed with value semantics and CoW, let us turn to the fundamental lack of thread-safety in `Array`. As discussed above, when you reference the array in `concurrentPerform` closures, you are not dealing with copies of the array. So, if the multiple iterations of `concurrentPerform` are mutating the same base array, is this thread-safe? No. In small examples it might look OK, but there actually is a data race, easily illustrated with large arrays.
This can be diagnosed for us by temporarily turning on the Thread Sanitizer, a.k.a. TSAN (press command+< or choose “Product” » “Scheme”, and then choose “Run” » “Diagnostics” » “Thread Sanitizer”). It will show that it is not thread-safe to mutate values in an `Array` from multiple threads at the same time. If you only have a few items in your array, TSAN might not catch the issue, and with TSAN turned off, you might not see the problem. But there is a race, and the bigger your array, the more likely you are to manifest issues and Thread Sanitizer warnings. With TSAN turned on, you may see an error reporting the data race:
```none
WARNING: ThreadSanitizer: Swift access race (pid=61280)
  Modifying access of Swift variable at 0x7b080002d3f0 by thread T3:
    #0 MyApp.Foo.pixels.modify : Swift.Array<Swift.Int> <null> (MyApp:x86_64+0x10000519c)
    #1 closure #1 (Swift.Int) -> () in closure #1 () -> () in MyApp.Foo.experiment(completion: () -> ()) -> () <null> (MyApp:x86_64+0x100005a46)
    #2 partial apply forwarder for closure #1 (Swift.Int) -> () in closure #1 () -> () in MyApp.Foo.experiment(completion: () -> ()) -> () <null> (MyApp:x86_64+0x100006a95)
    #3 partial apply forwarder for reabstraction thunk helper from @callee_guaranteed (@unowned Swift.Int) -> () to @escaping @callee_guaranteed (@unowned Swift.Int) -> () <null> (libswiftDispatch.dylib:x86_64+0x52e0)
    #4 _dispatch_client_callout2 <null> (libdispatch.dylib:x86_64+0x615e)

  Previous modifying access of Swift variable at 0x7b080002d3f0 by thread T1:
    #0 MyApp.Foo.pixels.modify : Swift.Array<Swift.Int> <null> (MyApp:x86_64+0x10000519c)
    #1 closure #1 (Swift.Int) -> () in closure #1 () -> () in MyApp.Foo.experiment(completion: () -> ()) -> () <null> (MyApp:x86_64+0x100005a46)
    #2 partial apply forwarder for closure #1 (Swift.Int) -> () in closure #1 () -> () in MyApp.Foo.experiment(completion: () -> ()) -> () <null> (MyApp:x86_64+0x100006a95)
    #3 partial apply forwarder for reabstraction thunk helper from @callee_guaranteed (@unowned Swift.Int) -> () to @escaping @callee_guaranteed (@unowned Swift.Int) -> () <null> (libswiftDispatch.dylib:x86_64+0x52e0)
    #4 _dispatch_client_callout2 <null> (libdispatch.dylib:x86_64+0x615e)

  Location is heap block of size 24 at 0x7b080002d3e0 allocated by main thread:
    #0 __sanitizer_mz_malloc <null> (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x5562c)
    #1 _malloc_zone_malloc_instrumented_or_legacy <null> (libsystem_malloc.dylib:x86_64+0x1f04d)
    #2 MyApp.ContentView.init() -> MyApp.ContentView <null> (MyApp:x86_64+0x100004cd7)
    #3 closure #1 () -> MyApp.ContentView in MyApp.MyAppApp.body.getter : some <null> (MyApp:x86_64+0x100006e70)
    #4 __swift_memcpy226_8 <null> (SwiftUI:x86_64+0x112010c)
    #5 protocol witness for SwiftUI.App.body.getter : A.Body in conformance MyApp.MyAppApp : SwiftUI.App in MyApp <null> (MyApp:x86_64+0x1000070bd)
    #6 __swift_memcpy16_4 <null> (SwiftUI:x86_64+0x6fae99)
    #7 main <null> (MyApp:x86_64+0x100007135)

  Thread T3 (tid=4368620, running) is a GCD worker thread
  Thread T1 (tid=4368616, running) is a GCD worker thread

SUMMARY: ThreadSanitizer: Swift access race (/…/DerivedData/…/Build/Products/Debug/MyApp.app/Contents/MacOS/MyApp:x86_64+0x10000519c) in MyApp.Foo.pixels.modify : Swift.Array<Swift.Int>+0x4c
==================
ThreadSanitizer report breakpoint hit. Use 'thread info -s' to get extended information about the report.
```
For more information on TSAN, see Diagnosing memory, thread, and crash issues early.
What we frequently do when dealing with large buffers is use “unsafe pointers”, which circumvent a lot of the `Array` mechanisms. For example, one can use `withUnsafeMutableBufferPointer`:
```swift
class Foo {
    var pixels: [Int] = .init(repeating: 0, count: 10_000_000)

    func experiment(completion: @escaping () -> Void) {
        DispatchQueue.global(qos: .utility).async { [self] in
            let n = pixels.count
            pixels.withUnsafeMutableBufferPointer { buffer in
                DispatchQueue.concurrentPerform(iterations: n) { i in
                    buffer[i] = …
                }
            }
            DispatchQueue.main.async {
                completion()
            }
        }
    }
}
```
Once you have that `UnsafeMutableBufferPointer<Int>`, you can update that memory directly. Because you are now working with a mutable buffer pointer, you avoid the `Array` implementation issues.
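For instance, here is a runnable variant of that pattern, with the elided pixel computation replaced by a hypothetical stand-in (each pixel is just set to twice its index):

```swift
import Dispatch

var pixels = [Int](repeating: 0, count: 100_000)
let n = pixels.count

pixels.withUnsafeMutableBufferPointer { buffer in
    // concurrentPerform is synchronous, so using `buffer` inside it is fine;
    // each iteration writes to a distinct index, so there is no race
    DispatchQueue.concurrentPerform(iterations: n) { i in
        buffer[i] = i * 2
    }
}

print(pixels[10], pixels[n - 1])   // 20 199998
```

The key point is that every iteration touches its own index; the parallelism is safe because no two threads ever write to the same location.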
Frankly, when dealing with pixel buffers, we frequently deal with image “data providers” or a graphics context (e.g., see Change color of certain pixels in a UIImage). That provides raw access to the buffers for an image. So, if you are trying to process the pixels of an image, that might be another approach. But just like `withUnsafeMutableBytes`, we deal with raw pixel buffers.
Or, sometimes when dealing with a buffer pointer, we just `allocate` a buffer directly (but remember to `deallocate` when you are done), and use the buffer pointer directly, avoiding `Array` entirely. E.g.:
```swift
let pixels = UnsafeMutablePointer<Int>.allocate(capacity: 10_000_000)

…

// when all done with the buffer, explicitly deallocate it
pixels.deallocate()
```
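A fleshed-out sketch of that allocate/deallocate pattern might look like this (the pixel computation is again a hypothetical stand-in; each slot just stores its own index):

```swift
import Dispatch

let count = 100_000
let pixels = UnsafeMutablePointer<Int>.allocate(capacity: count)
pixels.initialize(repeating: 0, count: count)   // memory must be initialized before use

DispatchQueue.concurrentPerform(iterations: count) { i in
    pixels[i] = i   // distinct index per iteration: no data race
}

let spotCheck = pixels[123]
print(spotCheck)   // 123

// when all done with the buffer, explicitly clean up
pixels.deinitialize(count: count)
pixels.deallocate()
```

Note the `initialize`/`deinitialize` pairing: with manually allocated memory, you take on all of the lifecycle responsibilities that `Array` would otherwise handle for you.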
A few minor observations about your code snippet:
I notice that you are creating your own private concurrent queue. That is not necessary, and whether the queue is concurrent or serial is immaterial: `concurrentPerform` is only grabbing the queue priority. Personally, as shown above, I just use a global queue.
You are synchronously dispatching to your queue. You really do not want to block the calling thread. I would suggest dispatching asynchronously, and then calling a completion handler when you are done. I generally call the completion handler closure on the main queue (because I invariably want to update the UI when it is done), but you can do whatever you want. Either way, I would advise against `sync` (except for very quick synchronization tasks).
Some final performance observations:
You might want to stride through your values to ensure that each iteration of `concurrentPerform` is doing enough work to offset the modest overhead introduced by parallel processing. You may also want to avoid having separate threads update adjacent memory locations (avoiding “false sharing” problems).
We generally “stride” through the memory so that each thread is working on a separate set of pixels. In Long cycle blocks application, you will see that I have each iteration handle a separate row of the image. You might even have each iteration handle multiple rows of the image.
But the idea is to make sure that separate iterations are interacting with different portions of the pixel buffer. And, using my example of a pixel buffer with 10,000,000 items: rather than doing 10,000,000 iterations in `concurrentPerform`, maybe do 1,000 iterations, each processing 10,000 pixels. You will have to experiment with different striding factors, but it can have a significant impact on performance. If one does too many iterations with very little work on each thread (e.g., the degenerate example of 10m iterations, each updating a single pixel), you might actually find that it is slower than the serial rendition.
E.g., to stride every 10,000 values, you can do something like:
```swift
func experiment(completion: @escaping () -> Void) {
    DispatchQueue.global(qos: .utility).async { [self] in
        let n = pixels.count
        let stride = 10_000
        let (quotient, remainder) = n.quotientAndRemainder(dividingBy: stride)
        let iterations = remainder == 0 ? quotient : quotient + 1

        pixels.withUnsafeMutableBufferPointer { buffer in
            DispatchQueue.concurrentPerform(iterations: iterations) { iteration in
                let start = iteration * stride
                let end = min(n, start + stride)
                for i in start ..< end {
                    buffer[i] = …
                }
            }
        }

        DispatchQueue.main.async {
            completion()
        }
    }
}
```
Probably needless to say, when testing this, test on a physical device with a release build. Debug builds are not optimized and will be much slower.
Finally, depending upon what you are doing with your pixel buffers, you might consider Metal compute shaders (using the GPU) or the Accelerate framework. Sometimes these can be much faster than home-spun CPU-bound calculations (even parallelized renditions).