In the Intel Threading Building Blocks (TBB) framework, how does one ensure that the threads do not all end up busy-waiting for other threads to complete?
Consider, for example, the following code:
#include <tbb/tbb.h>
#include <vector>
#include <cstdlib>
#include <functional>
#include <future>
#include <iostream>
#include <memory>

// Wraps func(b) in a packaged_task, hands it to a task_group, and returns its future.
std::future<bool> run_something(std::function<bool(bool)> func, bool b) {
    auto task = std::make_shared<std::packaged_task<bool()>>(std::bind(func, b));
    std::future<bool> res = task->get_future();
    tbb::task_group g;
    g.run([task]() { (*task)(); });
    return res;
}

int main() {
    // Task group shared by all loop iterations.
    tbb::task_group g;
    tbb::parallel_for(0, 100, 1, [&g](int) {
        g.run([]() {
            std::cout << "A" << std::endl;
            // Blocks the calling worker thread until the nested task has run.
            run_something([](bool b) { return b; }, true).get();
        });
    });
    g.wait();
    return EXIT_SUCCESS;
}
Here the main function spawns as many tasks as there are threads in the thread pool used by the TBB library. Then, when the second call to spawn more tasks happens in the run_something function, the TBB scheduler sees that no threads are available and simply deadlocks. That is, I see the print statement go through exactly 4 times on a machine with 4 hyperthreads and 8 times on a machine with 8 hyperthreads.
How do I avoid this scenario? In particular, is there a way to ensure that two task_group, task_arena, or parallel_for constructs use two completely disjoint sets of threads?