Writing to different Swift array indexes from different threads
I see frequent mention that Swift arrays, due to copy-on-write, are not thread-safe, but I have found that the following works, as it updates different and unique elements of an array from different threads simultaneously:

//pixels is [(UInt8, UInt8, UInt8)]

let q = DispatchQueue(label: "processImage", attributes: .concurrent)
q.sync {
    DispatchQueue.concurrentPerform(iterations: n) { i in
        ... do work ...
        pixels[i] = ... store result ...
    }
}

(simplified version of this function)

If threads never write to the same indexes, does copy-on-write still interfere with this? I'm wondering if this is safe since the array itself is not changing length or memory usage. But it does seem that copy-on-write would prevent the array from staying consistent in such a scenario.

If this is not safe, what is the best idiom, given that operating on images (pixel arrays) or other large data stores is such a common requirement in parallel computation? Is it better for each thread to have its own array, with the results combined after all threads complete? That seems like additional overhead, and the memory juggling from creating and destroying all those arrays doesn't feel right.

Composition asked 15/2, 2023 at 12:1 Comment(1)
I just found this which offers a lot of useful additional context to this question: mjtsai.com/blog/2021/04/15/…Composition
7
Updated answer

Is it safe to write directly to different indices of an array from different threads?

No. If we run the following code, the thread sanitizer (see this guide) reports a race condition when writing to values[idx]. However, in my testing it does work in practice almost every time: I ran it in a loop, thousands upon thousands of times, and had exactly one crash. But this is clearly not what we're meant to do.

let NUM = 1_000_000

func a() {
    var values = [Int](repeating: 0, count: NUM)
    
    DispatchQueue.concurrentPerform(iterations: NUM) { idx in
        values[idx] = idx // <- not thread safe
    }
}

However, it does not seem to be related to any copy-on-write mechanics. Since the threads access the array via closure capture, they are all in fact accessing the same array. We can also put the array in a reference type Box, and we still have problems with race conditions. To me, this indicates that it's not the copy-on-write behaviour of arrays that is the root of the problem.

class Box {
    var values: [Int]
    init(values: [Int]) {
        self.values = values
    }
    
    func update(at index: Int, value: Int) {
        values[index] = value // <- not thread safe
    }
}

func b() {
    let box = Box(values: [Int](repeating: 0, count: NUM))
    
    DispatchQueue.concurrentPerform(iterations: NUM) { idx in
        box.update(at: idx, value: idx)
    }
}

b()

If we access the underlying buffer directly via withUnsafeMutableBufferPointer, however, it does appear to work correctly; at least, the thread sanitizer doesn't complain.

func c() {
    var values = [Int](repeating: 0, count: NUM)
    
    values.withUnsafeMutableBufferPointer { buffer in
        DispatchQueue.concurrentPerform(iterations: NUM) { idx in
            buffer[idx] = idx // <- *is* thread safe
        }
    }
}

The copy-on-write behaviour of arrays in Swift should itself be thread-safe. The reason the code above isn't safe is that COW isn't being triggered in the first place.

The array value references an underlying memory buffer. Before mutating the buffer, the array checks whether the buffer's reference count is greater than 1. If it is, it copies the buffer to a new memory location and uses that new buffer instead. This is thread-safe since the original buffer is left untouched.

In the example above, the reference count of the underlying buffer remains at 1, since we don't actually have multiple arrays all referencing the same buffer; we have multiple closures referencing the same single array, which references the buffer only once. Since there is only the one array value, no copy-on-write will happen.
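
As a small illustration (my own sketch, not part of the answer's original code), you can watch the buffer copy happen by comparing base addresses before and after mutating a second array value that shares the same storage. The pointers are only compared, never dereferenced, and only for demonstration purposes:

let original = [Int](repeating: 0, count: 4)
var copy = original                                       // no copy yet: both values share one buffer

let before = original.withUnsafeBufferPointer { $0.baseAddress }
let shared = copy.withUnsafeBufferPointer { $0.baseAddress }
print(before == shared)                                   // true: same underlying buffer

copy[0] = 42                                              // first mutation: the buffer is shared, so COW copies it
let after = copy.withUnsafeBufferPointer { $0.baseAddress }
print(before == after)                                    // false: `copy` now owns its own buffer
print(original[0], copy[0])                               // 0 42 — the original is untouched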

Chirurgeon answered 15/2, 2023 at 12:54 Comment(15)
I noticed you don't actually create a queue marked as .concurrent as in my example, and then call .sync on it. Maybe not necessary, but the docs for concurrentPerform say "If the target queue is a concurrent queue, the blocks run in parallel ..." so how does it know your queue is concurrent? Or, better question, why do the docs suggest it is necessary to explicitly have a concurrent queue before calling concurrentPerform? Confusing!Composition
To be perfectly frank, I have no idea. I have never written parallel Swift code before and I didn't really bother reading the documentation. I guess because I'm not targeting a specific queue, it goes on a ... default one? One of the threads doing the work is the main thread.Chirurgeon
I updated the answer. Please read through the update and see if it helps!Chirurgeon
Thanks! A couple of points worth considering. Value types, such as arrays, are not reference counted; only classes/objects are. Also, inout parameters in Swift are copied, changed, then written back to the source like an assignment; they do not directly touch the underlying memory (see the Swift book). Despite the & that looks like a C address-of operator, it isn't one. Thus, if a thread writes to part of an array and returns it, I would expect the unchanged part to overwrite what another thread might have changed in an inout situation. I think my example works by accident.Composition
Arrays aren’t ref counted, but their underlying memory buffer is.Chirurgeon
I'm not convinced that your statement about inout parameters is accurate. I just checked the output assembly and in simple cases at least it actually just grabs the address of the value. Does more or less the same thing as if you write "equivalent" code in C (taking a pointer to int instead of an int).Chirurgeon
I tried some stuff with manually allocated UnsafeMutableBuffers to see if it made a difference.Chirurgeon
That sounds like an implementation detail that you can't necessarily depend on. The Swift spec seems to be pretty clear about how inout parameters behave and are treated in the function. Perhaps the compiler can optimize this if it doesn't see any conflicts. docs.swift.org/swift-book/documentation/… It's called "copy-in copy-out" for a reason.Composition
The plot thickens. I had a random crash due to bad thread access with the above code.Chirurgeon
Exactly. I don't think arrays can be used this way.Composition
Yeah. It took a few tries. I'm running the program in a loop from the command line.Chirurgeon
So even if you are accessing the underlying buffer from different threads, it is crashing? Or is it crashing by directly accessing the array itself?Composition
I'm not sure. I accidentally ran it again before I had time to react to the crash, beyond seeing the typical "bad thread access" message. Also: the thread sanitizer does not like using arrays this way. It looks like it's the sort of thing that most likely will work, but you shouldn't depend on it. I will make a final update to the answer.Chirurgeon
found a related discussion here: https://mcmap.net/q/1922115/-filling-an-array-concurrently in the commentsComposition
Added a (final) update based on our discussion and further findings.Chirurgeon
9

You ask about copy-on-write (CoW) behaviors and thread-safety of Array. Let us separate these two issues.

First, let’s discuss CoW (and, more specifically, value-type semantics within the context of closures).

In short, value types can achieve thread-safety only if one ensures that each thread gets a copy. And if one does provide a copy, CoW is an optimization whereby it will actually copy only if, and when, it is needed (i.e., if one of the copies actually mutates on a "write").

But if we mutate an array from a closure, we are not automatically participating in value-type semantics (nor CoW, either). If we give the closure a "capture list", we will be making a copy. If we don't provide a capture list (and we really cannot in this case, anyway), we will be mutating the original array (no copies, no CoW).

Consider the following (non-thread-safe) example:

let queue = DispatchQueue.global()
var array = [0, 0, 0]

queue.asyncAfter(deadline: .now() + 2) { [array] in  // capture list makes copy
    print("copy of original array:", array)          // [0, 0, 0]
}

queue.asyncAfter(deadline: .now() + 1) {             // no capture list means we're going to see any changes to original array
    print("original array mutated:", array)          // [1, 2, 0]
}

queue.async {                                        // no capture list
    array[0] = 1                                     // this is changing the original array!
}

array[1] = 2                                         // obviously this also changes the original array

With the [array] capture list, a copy is made; in the absence of that, no copy is made. We do not really want to use capture lists here, anyway, but I include this discussion so that we understand where copies (and CoW) are relevant, and where they are not.

All of this having been said, we often do explicitly use value-type semantics and give each thread its own copy (avoiding data sharing and eliminating data races). But that simply is not viable when dealing with a parallel algorithm working on a pixel buffer. First, it would be horribly inefficient to make copies of a large buffer. Second, it would be very hard to integrate all the results: if one thread updates the first pixel and another thread updates the second pixel (and so forth), you really do not want multiple divergent copies of that array, one with the first pixel changed, another with the second pixel changed, and so on. In parallel algorithms, we really want to be working against a shared pixel buffer. In short, we do not want multiple copies, and with no copies, CoW is a moot issue.

So, having dispensed with value semantics and CoW, let us turn to the fundamental lack of thread-safety of Array. As discussed above, when you reference the array from concurrentPerform closures, you are not dealing with copies of the array. So, if the multiple iterations of concurrentPerform are mutating the same base array, is this thread-safe? No. In small examples it might look OK, but there actually is a data race, easily illustrated with large arrays.
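
For concreteness, the code that triggers the report below has roughly the following shape (my reconstruction, not the exact test harness; the essential point is simply that every iteration mutates the same shared pixels array):

class Foo {
    var pixels: [Int] = .init(repeating: 0, count: 10_000_000)

    func experiment(completion: @escaping () -> Void) {
        DispatchQueue.global(qos: .utility).async { [self] in
            DispatchQueue.concurrentPerform(iterations: pixels.count) { i in
                pixels[i] = i                             // <- racy: every thread mutates the same Array value
            }

            DispatchQueue.main.async {
                completion()
            }
        }
    }
}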

This can be diagnosed for us by temporarily turning on the Thread Sanitizer, a.k.a. TSAN (press command+< or choose "Product" » "Scheme", and then choose "Run" » "Diagnostics" » "Thread sanitizer"). You will see that it is not thread-safe to mutate values in an Array from multiple threads at the same time. If you only have a few items in your array, TSAN might not catch the issue, and with TSAN turned off you might not see the problem, but there is a race. The bigger your array is, the more likely you are to manifest issues and thread sanitizer warnings. With TSAN turned on, you may see an error reporting the data race:

WARNING: ThreadSanitizer: Swift access race (pid=61280)
  Modifying access of Swift variable at 0x7b080002d3f0 by thread T3:
    #0 MyApp.Foo.pixels.modify : Swift.Array<Swift.Int> <null> (MyApp:x86_64+0x10000519c)
    #1 closure #1 (Swift.Int) -> () in closure #1 () -> () in MyApp.Foo.experiment(completion: () -> ()) -> () <null> (MyApp:x86_64+0x100005a46)
    #2 partial apply forwarder for closure #1 (Swift.Int) -> () in closure #1 () -> () in MyApp.Foo.experiment(completion: () -> ()) -> () <null> (MyApp:x86_64+0x100006a95)
    #3 partial apply forwarder for reabstraction thunk helper from @callee_guaranteed (@unowned Swift.Int) -> () to @escaping @callee_guaranteed (@unowned Swift.Int) -> () <null> (libswiftDispatch.dylib:x86_64+0x52e0)
    #4 _dispatch_client_callout2 <null> (libdispatch.dylib:x86_64+0x615e)

  Previous modifying access of Swift variable at 0x7b080002d3f0 by thread T1:
    #0 MyApp.Foo.pixels.modify : Swift.Array<Swift.Int> <null> (MyApp:x86_64+0x10000519c)
    #1 closure #1 (Swift.Int) -> () in closure #1 () -> () in MyApp.Foo.experiment(completion: () -> ()) -> () <null> (MyApp:x86_64+0x100005a46)
    #2 partial apply forwarder for closure #1 (Swift.Int) -> () in closure #1 () -> () in MyApp.Foo.experiment(completion: () -> ()) -> () <null> (MyApp:x86_64+0x100006a95)
    #3 partial apply forwarder for reabstraction thunk helper from @callee_guaranteed (@unowned Swift.Int) -> () to @escaping @callee_guaranteed (@unowned Swift.Int) -> () <null> (libswiftDispatch.dylib:x86_64+0x52e0)
    #4 _dispatch_client_callout2 <null> (libdispatch.dylib:x86_64+0x615e)

  Location is heap block of size 24 at 0x7b080002d3e0 allocated by main thread:
    #0 __sanitizer_mz_malloc <null> (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x5562c)
    #1 _malloc_zone_malloc_instrumented_or_legacy <null> (libsystem_malloc.dylib:x86_64+0x1f04d)
    #2 MyApp.ContentView.init() -> MyApp.ContentView <null> (MyApp:x86_64+0x100004cd7)
    #3 closure #1 () -> MyApp.ContentView in MyApp.MyAppApp.body.getter : some <null> (MyApp:x86_64+0x100006e70)
    #4 __swift_memcpy226_8 <null> (SwiftUI:x86_64+0x112010c)
    #5 protocol witness for SwiftUI.App.body.getter : A.Body in conformance MyApp.MyAppApp : SwiftUI.App in MyApp <null> (MyApp:x86_64+0x1000070bd)
    #6 __swift_memcpy16_4 <null> (SwiftUI:x86_64+0x6fae99)
    #7 main <null> (MyApp:x86_64+0x100007135)

  Thread T3 (tid=4368620, running) is a GCD worker thread

  Thread T1 (tid=4368616, running) is a GCD worker thread

SUMMARY: ThreadSanitizer: Swift access race (/…/DerivedData/…/Build/Products/Debug/MyApp.app/Contents/MacOS/MyApp:x86_64+0x10000519c) in MyApp.Foo.pixels.modify : Swift.Array<Swift.Int>+0x4c
==================
ThreadSanitizer report breakpoint hit. Use 'thread info -s' to get extended information about the report.

For more information on TSAN, see Diagnosing memory, thread, and crash issues early.

What we frequently do when dealing with large buffers is to use “unsafe pointers”, which circumvents a lot of the Array mechanisms. For example, one can use withUnsafeMutableBufferPointer:

class Foo {
    var pixels: [Int] = .init(repeating: 0, count: 10_000_000)

    func experiment(completion: @escaping () -> Void) {
        DispatchQueue.global(qos: .utility).async { [self] in
            let n = pixels.count

            pixels.withUnsafeMutableBufferPointer { buffer in
                DispatchQueue.concurrentPerform(iterations: n) { i in
                    buffer[i] = …
                }
            }

            DispatchQueue.main.async {
                completion()
            }
        }
    }
}

Once you have that UnsafeMutableBufferPointer<Int>, you can update that memory directly. Because you are now working with a mutable buffer pointer rather than the Array itself, you avoid the Array implementation issues.

Frankly, when dealing with pixel buffers, we frequently use an image's "data provider" or a graphics context (e.g., see Change color of certain pixels in a UIImage), which provides raw access to an image's underlying buffer. So, if you are trying to process the pixels of an image, that might be another approach. But, just as with withUnsafeMutableBytes, we are still dealing with raw pixel buffers.
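
As a rough illustration of that route (my own sketch, not from the linked answer, with a simple channel inversion standing in for whatever per-pixel work you actually need), drawing a CGImage into your own bitmap context gives you a raw RGBA buffer that you can mutate in place:

import CoreGraphics
import Dispatch

func invertedCopy(of inputImage: CGImage) -> CGImage? {
    let width = inputImage.width
    let height = inputImage.height

    guard let context = CGContext(
        data: nil,                                        // let CoreGraphics own and size the buffer
        width: width,
        height: height,
        bitsPerComponent: 8,
        bytesPerRow: 0,                                   // 0 lets CoreGraphics choose a suitable row stride
        space: CGColorSpaceCreateDeviceRGB(),
        bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue
    ) else { return nil }

    // Render the source image into the context's backing buffer.
    context.draw(inputImage, in: CGRect(x: 0, y: 0, width: width, height: height))

    guard let base = context.data else { return nil }
    let bytesPerRow = context.bytesPerRow
    let pixels = base.bindMemory(to: UInt8.self, capacity: height * bytesPerRow)

    // One iteration per row keeps each thread on its own region of memory.
    DispatchQueue.concurrentPerform(iterations: height) { row in
        let rowStart = row * bytesPerRow
        for column in 0 ..< width {
            let offset = rowStart + column * 4
            pixels[offset]     = 255 &- pixels[offset]     // invert R (example operation)
            pixels[offset + 1] = 255 &- pixels[offset + 1] // invert G
            pixels[offset + 2] = 255 &- pixels[offset + 2] // invert B
            // pixels[offset + 3] is alpha; left as-is
        }
    }

    return context.makeImage()
}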

Or, sometimes, rather than using an Array at all, we just allocate a buffer directly (remembering to deallocate it when we are done) and use that pointer directly. E.g.,

let pixels = UnsafeMutablePointer<Int>.allocate(capacity: 10_000_000)

…

// when all done with the buffer, explicitly deallocate it

pixels.deallocate()
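
A small expansion of that idea (my own sketch): allocate(capacity:) hands back uninitialized memory, so initialize it before reading and tear it down when finished. The striding advice below still applies; this only shows the buffer's lifecycle.

import Dispatch

let count = 10_000_000
let pixels = UnsafeMutablePointer<Int>.allocate(capacity: count)
pixels.initialize(repeating: 0, count: count)             // allocate(capacity:) does not zero the memory

DispatchQueue.concurrentPerform(iterations: count) { i in
    pixels[i] = i                                         // each iteration writes a distinct element
}

pixels.deinitialize(count: count)                         // trivial for Int, but good hygiene
pixels.deallocate()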

A few minor observations about your code snippet:

  1. I notice that you are creating your own private concurrent queue. That is not necessary; whether the queue is concurrent or serial is immaterial, as concurrentPerform only picks up the queue's priority. Personally, as shown above, I just use a global queue.

  2. You are synchronously dispatching to your queue. You really do not want to block the calling thread. I would suggest dispatching it asynchronously, and then calling a completion handler when you are done. I generally call the completion handler closure on the main queue (because I invariably want to update the UI when it is done), but you can do whatever you want. But I would advise against using sync (except for very quick synchronization tasks).


Some final performance observations:

  1. You might want to stride through your values, to ensure that each iteration of concurrentPerform is doing enough work to offset the modest overhead introduced by parallel processing. You may also want to avoid having separate threads updating adjacent memory locations (avoiding “false sharing” problems).

    We generally "stride" through the memory so that each thread is working on a separate set of pixels. In the "Long cycle blocks application" example, you will see that I have each iteration handle a separate row of the image. You might even have each iteration handle multiple rows of the image.

    But the idea is to make sure that separate iterations are interacting with different portions of the pixel buffer. And, using my example of a pixel buffer with 10,000,000 items, rather than doing 10,000,000 iterations in concurrent perform, maybe do 100 iterations, each processing 10,000 pixels. You will have to experiment with different striding factors, but it can have a significant impact on performance. But if one does too many iterations with very little work on each thread (e.g., the degenerate example of 10m iterations, each updating a single pixel), you might actually find that it is slower than the serial rendition.

    E.g., to stride every 10,000 values, you can do something like:

    func experiment(completion: @escaping () -> Void) {
        DispatchQueue.global(qos: .utility).async { [self] in
            let n = pixels.count
            let stride = 10_000
            let (quotient, remainder) = n.quotientAndRemainder(dividingBy: stride)
            let iterations = remainder == 0 ? quotient : quotient + 1
    
            pixels.withUnsafeMutableBufferPointer { buffer in
                DispatchQueue.concurrentPerform(iterations: iterations) { iteration in
                    let start = iteration * stride
                    let end = min(n, start + stride) // clamp the last chunk to the end of the buffer
    
                    for i in start ..< end {
                        buffer[i] = …
                    }
                }
            }
    
            DispatchQueue.main.async {
                completion()
            }
        }
    }
    
  2. Probably needless to say, when testing this, test on a physical device with a release build. Debug builds are not optimized and will be much slower.

  3. Finally, depending upon what you are doing in your pixel buffers, you might consider using Metal compute (using your GPU) or Accelerate libraries. Sometimes these can be much faster than home-spun CPU-bound calculations (even parallelized renditions).
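
    To give a flavour of the Accelerate route (a minimal sketch of my own, using vDSP conveniences that require a relatively recent SDK; the gain and bias values are arbitrary), vDSP can apply an operation to an entire channel at once without any explicit threading:

    import Accelerate

    // One million sample values, stored as Float for vDSP.
    let channel: [Float] = (0 ..< 1_000_000).map { Float($0 % 256) }

    let scaled = vDSP.multiply(1.2, channel)   // gain: multiply every element by 1.2
    let adjusted = vDSP.add(10, scaled)        // bias: add 10 to every element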

Dziggetai answered 21/2, 2023 at 20:51 Comment(2)
Thanks! I'm curious what the reason would be for this advice specifically: avoid having separate threads updating adjacent memory locations. My example is too trivial but I do stride normally. However I'm still interested to know what the issue would be in writing to separate but adjacent areas in memory from different threads.Composition
The concern is "false sharing" (a type of "cache sloshing"). See this video. Now, that video is within the context of a completely different parallel programming library, OpenMP (and I would advise staying with a native API, such as GCD), but it is an articulate discussion of how CPU architectures can have a surprising impact on parallel algorithms. To be honest, the effect within GCD is negligible, and issues like simply ensuring that you have enough work on each thread have a far greater impact.Dziggetai
-1

What about an NSLock-based solution? Is NSLock's lock/unlock faster?

private static let locker = NSLock()
.
.
.
let width            = inputCGImage.width
let height           = inputCGImage.height
let all = width * height
var pixels = [PixelationPixel]()
pixels.reserveCapacity(all)
DispatchQueue.concurrentPerform(iterations: height) { row in
    DispatchQueue.concurrentPerform(iterations: width) { column in
        let offset = row * width + column
        let c = pixelBuffer[offset].convertToUIColor()
        locker.withLock {
            pixels.append(PixelationPixel(
                color: c,
                offset: offset
            ))
        }
    }
}
Rentier answered 6/9 at 3:23 Comment(1)
OK, why the downvote?Rentier
