Maximum number of threads with async-await task groups
My intent is to understand the “cooperative thread pool” used by Swift 5.5’s async-await, and how task groups automatically constrain the degree of concurrency. Consider the following task group code, which performs 32 calculations in parallel:

func launchTasks() async {
    await withTaskGroup(of: Void.self) { group in
        for i in 0 ..< 32 {
            group.addTask { [self] in
                let value = doSomething(with: i)
                // do something with `value`
            }
        }
    }
}

While it does constrain the degree of concurrency, as advertised, I am only getting two (!) concurrent tasks at a time, which is far more constrained than I would have expected:

[Instruments screenshot: only two tasks running concurrently]
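As a side note, the degree of concurrency Instruments shows here can be approximated in code. The following is my own sketch (not part of the original question): an actor tracks how many child tasks are in flight at once, with a spinning loop standing in for the question’s `doSomething(with:)`.

```swift
import Foundation

// Tracks the number of concurrently running tasks and the peak observed.
actor ConcurrencyMeter {
    private var current = 0
    private(set) var peak = 0

    func enter() { current += 1; peak = max(peak, current) }
    func exit() { current -= 1 }
}

// Runs `iterations` CPU-bound child tasks in a task group and reports the
// peak number that ran simultaneously (bounded by the cooperative pool).
func measurePeakConcurrency(iterations: Int) async -> Int {
    let meter = ConcurrencyMeter()
    await withTaskGroup(of: Void.self) { group in
        for _ in 0 ..< iterations {
            group.addTask {
                await meter.enter()
                // Simulate CPU-bound work, like the spinning in the question.
                var x = 0.0
                for i in 1 ... 1_000_000 { x += Double(i).squareRoot() }
                _ = x
                await meter.exit()
            }
        }
    }
    return await meter.peak
}
```

On the simulator described in the question, this would report a peak of 2; on a physical device, a peak near the core count.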

If I use the old GCD concurrentPerform ...

func launchTasks2() {
    DispatchQueue.global().async {
        DispatchQueue.concurrentPerform(iterations: 32) { [self] i in
            let value = doSomething(with: i)
            // do something with `value`
        }
    }
}

... I get twelve at a time, taking full advantage of the device (iOS 15 simulator on my 6-core i9 MacBook Pro) while avoiding thread-explosion:

[Instruments screenshot: twelve tasks running concurrently with concurrentPerform]

(FWIW, both of these were profiled in Xcode 13.0 beta 1 (13A5154h) running on Big Sur. And please disregard the minor differences in the individual “jobs” in these two runs, as the function in question is just spinning for a random duration; the key observation is that the degree of concurrency is what we would have expected.)

It is excellent that this new async-await (and task groups) automatically limits the degree of parallelism, but its cooperative thread pool is far more constrained than I would have expected. And I see no way to adjust the parameters of that pool. How can we better take advantage of our hardware while still avoiding thread explosion (without resorting to old techniques like non-zero semaphores or operation queues)?

Kamerman answered 14/6, 2021 at 22:50 Comment(0)

This limitation on the cooperative thread pool on the simulator has been removed in Xcode 14.3 (with no mention of this change in the release notes).


It looks like this curious behavior is a limitation of the simulator in Xcode 14.2 and earlier. If I run it on my physical iPhone 12 Pro Max, the async-await task group approach results in 6 concurrent tasks ...

[Instruments screenshot: six concurrent tasks on the physical device]

... which is essentially the same as the concurrentPerform behavior:

[Instruments screenshot: concurrentPerform on the physical device]

The behavior, including the degree of concurrency, is essentially the same on the physical device.

One is left to infer that the simulator is configured to constrain async-await more than is achievable with direct GCD calls. But on actual physical devices, the async-await task group behavior is as one would expect.


For what it is worth, the above was produced by Xcode 13 on a MacBook Pro. I have repeated this on two different Macs in Xcode 14.2 and got different results. Specifically, on my Intel 2018 MacBook Pro, the simulator’s cooperative thread pool had two threads. On my 2022 Mac Studio’s simulator, though, it was constrained to three threads:

[Instruments screenshot: three cooperative threads on the Mac Studio’s simulator]

It would appear that the size of the simulator’s cooperative thread pool is affected by the Mac hardware you use. But the point remains that the cooperative thread pool is artificially constrained on the simulator.


For comparison, here is a comparable “Points of Interest” run on a physical iPhone 12 Pro Max in Xcode 14.2:

[Instruments “Points of Interest” screenshot: run on a physical iPhone 12 Pro Max]

Kamerman answered 14/6, 2021 at 22:50 Comment(9)
If I understand correctly, the new cooperative thread pool, unlike the previous one, does not overcommit the system (no more than one thread per core is running). This allows us to get rid of idle threads and unnecessary context switches. But from your screenshot of the concurrentPerform solution, it looks like GCD is optimized in exactly the same way. Is that so? – Linked
Yes, both concurrentPerform and the new cooperative thread pool constrain the number of threads, avoiding thread explosion. The main benefit, IMHO, is that this avoids deadlocks that can arise when thread explosion exhausts GCD’s very limited pool of worker threads. – Kamerman
@Kamerman How does the number of threads used compare to the number of available [simulated] cores reported via processorCount/activeProcessorCount? That may differ between the two systems. – Pergolesi
The number of concurrent threads with concurrentPerform matches the processorCount/activeProcessorCount result, but, on the simulator, the cooperative pool’s does not. On a physical device, it does. – Kamerman
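For reference, the two core counts mentioned in the comment above can be read from `ProcessInfo`; this trivial snippet is my own, not from the thread:

```swift
import Foundation

// Read the core counts discussed above. activeProcessorCount can be lower
// than processorCount (e.g., under thermal or energy constraints).
let total = ProcessInfo.processInfo.processorCount
let active = ProcessInfo.processInfo.activeProcessorCount
print("processorCount: \(total), activeProcessorCount: \(active)")
```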
Given the 2-thread limit on simulators, does this imply that things can be assumed to be slower on the simulator than they will actually be on newer devices? – Onida
There are a ton of different variables that contribute to speed differences in the two environments, so I would hesitate to make such a broad statement, but, yes, code that makes full use of the cooperative thread pool will enjoy a higher degree of concurrency on physical devices than on the simulator. When benchmarking performance, there is no replacement for testing on a physical device. – Kamerman
As an aside, the temporary use of LIBDISPATCH_COOPERATIVE_POOL_STRICT=1 can also adversely affect concurrency. This environment variable can be useful for diagnostic purposes, but understand that it may affect/change parallel execution. – Kamerman
Interesting findings Rob, thanks for asking/answering this! From these findings, it looks like task groups only run as many tasks as you have cores. For an I/O-bound task (e.g., downloading lots of small network requests), is there a way to increase the concurrency beyond the core count? Or should I stick to DispatchQueue.concurrentPerform for that? – Contrapuntist
@Contrapuntist - Yes, the Swift concurrency cooperative thread pool is constrained to the number of cores, as is the old GCD concurrentPerform. There is really no benefit to going back to that old GCD API, and there are lots of reasons not to. That having been said, actors are reentrant, and network requests are not constrained by threads at all. Frankly, we often have the opposite problem: we deliberately want to constrain network requests to 4-8 at a time, to reduce peak memory usage and minimize server impact. Doing more has negligible performance benefit but introduces problems. – Kamerman
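The “constrain to 4-8 at a time” approach mentioned in the comment above can be done within a task group itself. The following is my own sketch (with hypothetical names, and squaring as a stand-in for real work): seed the group with at most `width` child tasks, then add the next task only as each one finishes.

```swift
import Foundation

// Processes `items` with at most `width` child tasks in flight at any time.
func process(items: [Int], width: Int) async -> [Int] {
    await withTaskGroup(of: Int.self) { group in
        var results: [Int] = []
        var iterator = items.makeIterator()

        // Seed the group with up to `width` tasks.
        for _ in 0 ..< width {
            if let item = iterator.next() {
                group.addTask { item * item }   // stand-in for real work
            }
        }

        // Each time a task finishes, start the next one, so no more than
        // `width` tasks ever run concurrently.
        while let result = await group.next() {
            results.append(result)
            if let item = iterator.next() {
                group.addTask { item * item }
            }
        }
        return results
    }
}
```

Note that results arrive in completion order, not submission order; sort or key them by index if order matters.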
