What does the thread_local mean in C++11?

Asked 16/8, 2012 at 9:5 Answered 16/8, 2012 at 10:41

Solved c++ multithreading c++11 thread-local thread-local-storage

190

I am confused by the description of thread_local in C++11. My understanding is that each thread has its own copy of the local variables in a function. Global/static variables can be accessed by all threads (possibly with access synchronized using locks). And thread_local variables are visible to all threads but can only be modified by the thread for which they are defined? Is that correct?

Chopstick answered 16/8, 2012 at 9:5 Comment(1)

Related question: #22794882 – Daria 25/4, 2023 at 19:43

231

Thread-local storage duration is a term used to refer to data that seemingly has global or static storage duration (from the viewpoint of the functions using it) but, in actual fact, has one copy per thread.

It adds to the current options:

automatic (exists during a block or function);
static (exists for the program duration); and
dynamic (exists on the heap between allocation and deallocation).

Something that is thread-local is brought into existence at thread creation time and disposed of when the thread finishes.

For example, think of a random number generator where the seed must be maintained on a per-thread basis. Using a thread-local seed means that each thread gets its own random number sequence, independent of all other threads.

If your seed was a local variable within the random function, it would be initialised every time you called it, giving you the same number each time. If it was a global, threads would interfere with each other's sequences.
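The seed example can be sketched roughly as follows. This is only an illustration under assumptions: next_random, draw and draw_on_new_thread are made-up names, the generator is a toy linear congruential one, and the fixed starting seed 12345 is arbitrary (a real program would probably seed each thread differently, for instance from std::random_device).

```cpp
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical toy generator: the state is thread_local, so it persists
// across calls within one thread but is never shared between threads.
std::uint32_t next_random()
{
    thread_local std::uint32_t state = 12345;   // one copy per thread
    state = state * 1664525u + 1013904223u;     // linear congruential step
    return state;
}

// Draw n numbers on the calling thread.
std::vector<std::uint32_t> draw(int n)
{
    std::vector<std::uint32_t> out;
    for (int k = 0; k < n; ++k)
        out.push_back(next_random());
    return out;
}

// Run draw(n) on a new thread and return what that thread produced.
std::vector<std::uint32_t> draw_on_new_thread(int n)
{
    std::vector<std::uint32_t> out;
    std::thread t([&out, n] { out = draw(n); });
    t.join();
    return out;
}
```

Because every new thread starts from its own freshly initialised state, two calls to draw_on_new_thread(3) return the same three numbers, while repeated calls to draw on a single thread keep advancing that thread's sequence.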

Another example is something like strtok where the tokenisation state is stored on a thread-specific basis. That way, a single thread can be sure that other threads won't screw up its tokenisation efforts, while still being able to maintain state over multiple calls to strtok - this basically renders strtok_r (the thread-safe version) redundant.

Yet another example would be something like errno. You don't want separate threads modifying errno after one of your calls fails, but before you've had a chance to check the result.
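errno itself is provided by the implementation, but the same idea can be sketched with a hand-rolled status variable. Everything here (last_error, checked_op, error_seen_by_caller and the code 22) is hypothetical and chosen only for illustration.

```cpp
#include <thread>

// Hypothetical errno-style status: each thread sees its own copy.
thread_local int last_error = 0;

// Hypothetical operation that "fails" for negative input and records why.
bool checked_op(int x)
{
    if (x < 0) {
        last_error = 22;   // arbitrary error code for this sketch
        return false;
    }
    last_error = 0;        // success clears the calling thread's status
    return true;
}

// The calling thread fails, another thread then succeeds, and the caller
// inspects its own status afterwards.
int error_seen_by_caller()
{
    checked_op(-1);                            // sets this thread's last_error
    std::thread other([] { checked_op(1); });  // clears only the other thread's copy
    other.join();
    return last_error;                         // the caller's value is untouched
}
```

If last_error were an ordinary global, the successful call on the other thread would have reset it before the caller had a chance to look.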

This page has a reasonable description of the different storage duration specifiers.

Vladimir answered 16/8, 2012 at 9:13 Comment(9)

Using thread local doesn't solve the problems with strtok. strtok is broken even in a single threaded environment. – Grumous 16/8, 2012 at 10:15

Sorry, let me rephrase that. It doesn't introduce any new problems with strtok :-) – Vladimir 16/8, 2012 at 10:16

Actually, the r stands for "re-entrant", which has nothing to do with thread safety. It's true that you can make some things work thread-safely with thread-local storage, but you can't make them re-entrant. – Ungual 16/8, 2012 at 10:18

In a single-threaded environment, functions need to be re-entrant only if they are part of a cycle in the call graph. A leaf function (one that doesn't call other functions) is by definition not part of a cycle, and there is no good reason why strtok should call other functions. – Unaccompanied 16/8, 2012 at 12:39

This would mess it up: while (something) { char *next = strtok(whatever); someFunction(next); /* someFunction calls strtok */ } – Tiptop 25/6, 2014 at 20:18

@MSalters: You get problems if you (try to) intertwine two strtok sequences in one thread; say, if you're processing two strings at the same time. That's where the reentrant variants come in handy (plus it's cleaner --- no globals are accessed). – Greave 11/2, 2015 at 17:47

Does a thread_local object call its destructor at the end of the thread? – Damar 27/1, 2017 at 13:33

+1 Great example for strtok. I checked glibc at the tip: the implementation of strtok is two lines long and calls strtok_r. – Calder 15/10, 2019 at 22:26

Some code samples would be nice – Bespangle 8/11, 2021 at 13:42

189

When you declare a variable thread_local then each thread has its own copy. When you refer to it by name, then the copy associated with the current thread is used. e.g.

#include <iostream>
#include <thread>

thread_local int i = 0;

void setNewVal(int newval)
{
    i = newval;
}

void print()
{
    std::cout << i;
}

void thread_func(int id)
{
    setNewVal(id);
    ++i;
    print();
}

int main()
{
    i = 9;

    std::thread t1(thread_func, 1);
    std::thread t2(thread_func, 2);
    std::thread t3(thread_func, 3);

    t1.join();
    t2.join();
    t3.join();

    std::cout << i << std::endl;
}

This code will output "2349", "3249", "4239", "4329", "2439" or "3429", but never anything else. Each thread has its own copy of i, which is assigned to, incremented and then printed. The thread running main also has its own copy, which is assigned to at the beginning and then left unchanged. These copies are entirely independent, and each has a different address.

It is only the name that is special in that respect --- if you take the address of a thread_local variable then you just have a normal pointer to a normal object, which you can freely pass between threads. e.g.

#include <iostream>
#include <thread>

thread_local int i = 0;

void thread_func(int *ptr)
{
    *ptr = 42;
}

int main()
{
    i = 9;
    std::thread t(thread_func, &i);
    t.join();
    std::cout << i << std::endl;
}

Since the address of i is passed to the thread function, the copy of i belonging to the main thread can be assigned to even though it is thread_local. This program will thus output "42". If you do this, then you need to take care that *ptr is not accessed after the thread it belongs to has exited, otherwise you get a dangling pointer and undefined behavior, just like any other case where the pointed-to object is destroyed.

thread_local variables are initialized "before first use", so if they are never touched by a given thread then they are not necessarily ever initialized. This is to allow compilers to avoid constructing every thread_local variable in the program for a thread that is entirely self-contained and doesn't touch any of them. e.g.

#include <iostream>
#include <thread>

struct my_class
{
    my_class() { std::cout << "hello"; }
    ~my_class() { std::cout << "goodbye"; }
};

void f()
{
    thread_local my_class unused;
}

void do_nothing() { }

int main()
{
    std::thread t1(do_nothing);
    t1.join();
}

In this program there are 2 threads: the main thread and the manually-created thread. Neither thread calls f, so the thread_local object is never used. It is therefore unspecified whether the compiler will construct 0, 1 or 2 instances of my_class, and the output may be "", "hellohellogoodbyegoodbye" or "hellogoodbye".
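By contrast, a thread that does reach the declaration gets exactly one object, constructed on first use and destroyed when that thread exits. A small sketch, assuming hypothetical names (tracked, touch, run_thread_that_touches) and atomic counters in place of the printing above:

```cpp
#include <atomic>
#include <thread>

// Counters shared by all threads, so construction and destruction can be observed.
std::atomic<int> constructed{0};
std::atomic<int> destroyed{0};

struct tracked
{
    tracked()  { ++constructed; }
    ~tracked() { ++destroyed; }
};

void touch()
{
    // Constructed the first time the calling thread reaches this line,
    // then reused on later calls from the same thread.
    thread_local tracked t;
    (void)t;
}

void run_thread_that_touches()
{
    std::thread t([] { touch(); touch(); });  // two calls, one construction
    t.join();  // the thread has finished, so its copy has been destroyed
}
```

Each call to run_thread_that_touches raises both counters by one, no matter how many times the thread calls touch.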

Credo answered 16/8, 2012 at 10:41 Comment(3)

I think it is important to note that the thread-local copy of the variable is a newly initialized copy of the variable. That is, if you add a print() call to the beginning of thread_func, then the output will be 0304029 or some other permutation of the pairs 02, 03, and 04. That is, even though 9 is assigned to i before the threads are created, the threads get a freshly constructed copy of i where i=0. If i is declared as thread_local int i = random_integer(), then each thread gets a new random integer. – Sateen 11/6, 2017 at 23:21

Not exactly a permutation of 02, 03, 04; there may be other sequences like 020043. – Interactive 17/9, 2018 at 14:4

Interesting tidbit I just found: GCC supports using the address of a thread_local variable as a template argument, but other compilers do not (as of this writing; tried clang, vstudio). I'm not sure what the standard has to say about that, or if this is an unspecified area. – Ruinous 11/7, 2020 at 17:12

Thread-local storage is in every respect like static (= global) storage, except that each thread has a separate copy of the object. The object's lifetime starts either at thread start, or at the latest before first use (for global variables), or at first initialization (for block-local statics), and ends when the thread ends, i.e. when its thread function returns. Calling join() does not end the thread; it only waits for it to finish.

Consequently, only variables that could also be declared static may be declared as thread_local, i.e. global variables (more precisely: variables "at namespace scope"), static class members, and block-static variables (in which case static is implied).
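The three permitted places might look like this. All names here (at_namespace_scope, Widget, bump_block_local, sum_on_new_thread) are invented for the sketch:

```cpp
#include <thread>

thread_local int at_namespace_scope = 0;        // namespace scope

struct Widget
{
    static thread_local int per_thread_member;  // static data member
};
thread_local int Widget::per_thread_member = 0; // out-of-class definition

int bump_block_local()
{
    thread_local int calls = 0;                 // block scope; static is implied
    return ++calls;                             // counts calls made by this thread
}

// Modify all three on a fresh thread and report what that thread saw.
int sum_on_new_thread()
{
    int sum = 0;
    std::thread t([&sum] {
        at_namespace_scope = 10;
        Widget::per_thread_member = 20;
        sum = at_namespace_scope + Widget::per_thread_member + bump_block_local();
    });
    t.join();
    return sum;
}
```

The calling thread's own copies are left alone by sum_on_new_thread: its at_namespace_scope and Widget::per_thread_member keep whatever values that thread last gave them.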

As an example, suppose you have a thread pool and want to know how well your work load was being balanced:

thread_local Counter c;

void do_work()
{
    c.increment();
    // ...
}

int main()
{
    std::thread t(do_work);   // your thread-pool would go here
    t.join();
}

This would print thread usage statistics, e.g. with an implementation like this (Counter has to be defined before the declaration of c above):

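#include <iostream>
#include <thread>

struct Counter
{
    unsigned int c = 0;
    void increment() { ++c; }
    // Runs when the owning thread exits and reports that thread's total.
    ~Counter()
    {
        std::cout << "Thread #" << std::this_thread::get_id() << " was called "
                  << c << " times" << std::endl;
    }
};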

Ungual answered 16/8, 2012 at 9:23 Comment(1)

Did you not mean std::this_thread::get_id() in your std::cout? – Viradis 7/7, 2023 at 9:55
