I have a simple question: can C++11 thread_local be used with other parallel models? For example, can I use it within a function while using OpenMP or Intel TBB to parallelize the tasks?
Most such parallel programming models hide hardware threads behind a higher-level API. My instinct is that they all have to map their task schedulers onto hardware threads. Can I expect that C++11 thread_local will have the expected effect?
A simple example:
void func ()
{
    static thread_local int some_var = init_val;  // int used only as a placeholder type

    #pragma omp parallel for [... clauses ...]
    for (int i = 0; i < N; ++i) {
        // access some_var somewhere within the loop
    }
}
Can I expect that each OpenMP thread will access its own copy of some_var?
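To make the question concrete, here is a minimal test I have in mind (assuming GCC or Clang with -std=c++11 -fopenmp; the int type and the loop bound are just placeholders):

#include <cstdio>
#include <omp.h>

void func ()
{
    // if thread_local maps onto OpenMP threads, each thread should see its own copy
    static thread_local int some_var = 0;

    #pragma omp parallel for
    for (int i = 0; i < 100; ++i) {
        ++some_var;
        std::printf("OpenMP thread %d: some_var = %d (at %p)\n",
                    omp_get_thread_num(), some_var, (void *) &some_var);
    }
}

int main ()
{
    func();
    return 0;
}

My expectation is that each OpenMP thread prints a distinct address, but I do not know whether that behaviour is actually guaranteed.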
I know that most parallel programming models have their own constructs for thread-local storage. However, being able to use C++11 thread_local (or a compiler-specific keyword) directly would be nice. For example, consider the following situation:
// in practice func may be implemented as a class with operator()
void func ()
{
    static thread_local int some_var;  // int used only as a placeholder type
    // a quite complex function body
}
void func_omp (int N)
{
    #pragma omp parallel for [... clauses ...]
    for (int i = 0; i < N; ++i)
        func();
}
void func_tbb (int N)
{
    tbb::parallel_for(tbb::blocked_range<int>(0, N),
                      [](const tbb::blocked_range<int> &r) {
                          for (int i = r.begin(); i != r.end(); ++i)
                              func();
                      });
}
void func_select (int N)
{
    // At runtime or at compile time, based on which programming model is available,
    // select func_omp or func_tbb
}
The basic idea here is that func may be quite complex, and I want to support multiple parallel programming models. If I use the thread-local constructs specific to each programming model, then I have to implement different versions of func, or at least parts of it. However, if I can freely use C++11 thread_local, then in addition to func I only need to implement a few very simple wrapper functions. For a larger project, things could be simplified further by using templates to write more generic versions of func_omp and func_tbb. However, I am not quite sure it is safe to do so.
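For comparison, this is roughly what I am trying to avoid: duplicating func just to switch the thread-local mechanism, using each model's own construct (OpenMP threadprivate and tbb::enumerable_thread_specific; the function and variable names below are only illustrative):

#include <tbb/enumerable_thread_specific.h>

// OpenMP flavour: threadprivate needs a variable with static storage duration
static int some_var_omp;
#pragma omp threadprivate(some_var_omp)

void func_omp_impl ()
{
    // ... the same complex body, written against some_var_omp ...
}

// TBB flavour: per-thread state via enumerable_thread_specific
static tbb::enumerable_thread_specific<int> some_var_tbb;

void func_tbb_impl ()
{
    int &some_var = some_var_tbb.local();
    // ... the same complex body again, written against some_var ...
}

With C++11 thread_local, that body would exist only once, inside func.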