Destruction of static class members in Thread local storage

Asked 4/1, 2011 at 9:12 Answered 31/1, 2016 at 16:46

I'm writing a fast multi-threaded program, and I want to avoid synchronization (the function that would need to be synchronized is called about 5,000,000 times per second, so even a mutex would be too heavy).

The scenario is: I have a single global instance of a class, and every thread can access it. To avoid synchronization, all the data inside the class is accessed read-only, except for a handful of class members, which are declared in TLS (with __thread or __declspec(thread)).

Unfortunately, in order to use the __thread interface offered by the compiler, the class members have to be static and must not have constructors/destructors. The classes I use have custom constructors, of course, so I'm declaring pointers to those classes as class members (something like static __thread MyClass* _object).

Then, the first time a thread calls a method on the global instance, I do something like "if (_object == NULL) _object = new MyClass(...)".

My biggest problem is: is there a smart way to free this allocated memory? This global class comes from a library and is used by many threads in the program. Each thread is created in a different way (i.e. each thread executes a different function), so I can't add a snippet of code at every point where a thread is about to terminate. Thank you guys.

Wanonah answered 4/1, 2011 at 9:12 Comment(7)

first impression is that you need a design review for the system. what data need to be shared and which data will be modified, and when does it happen. – Rayraya 4/1, 2011 at 9:27

Unfortunately I'm not designing the system from scratch, but IMHO this is the least disruptive way (in terms of "no interfaces changed") to achieve my results – Wanonah 4/1, 2011 at 9:32

have you tried atexit? (dunno if it works when unloading a lib) – Betsybetta 4/1, 2011 at 9:44

Ehehe, that would be too easy :-) My program is a kind of daemon (portable Win32/Linux), so it is basically never killed, but threads are continually created/destroyed, and each thread creation means basically a huge memory leak. – Wanonah 4/1, 2011 at 9:49

The C++11 thread_local seems to be what you are looking for, except that gcc-4.8 is the only compiler I know that implements it. – Tnt 31/1, 2013 at 22:41

@MarcGlisse Hmm, I have gcc-4.8.2 on Ubuntu Linux, and it doesn't seem to be working. – Greenwald 30/1, 2015 at 4:18

@MarcGlisse Ah, never mind, it does, with --enable-tls. – Greenwald 30/1, 2015 at 5:22

In C++11 this is easily achieved:

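void cleanup_tls();  // your per-thread cleanup routine, defined elsewhere

// The destructor runs when the owning thread exits. Implementations may
// construct thread_local objects lazily, so touch tls_cleaner at least
// once in every thread that needs the cleanup.
static thread_local struct TlsCleaner {
    ~TlsCleaner() {
        cleanup_tls();
    }
} tls_cleaner;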

cleanup_tls() will execute on every thread termination (provided the thread is created using C++ API like std::thread).

But then you could just as well clean up TLS objects directly in their destructors (which also run at thread exit). For example, static thread_local std::unique_ptr<MyClass> pMyClass; will delete the MyClass object when the thread terminates.
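A minimal sketch of the unique_ptr approach, assuming a C++11 compiler and threads created with std::thread. MyClass here is a hypothetical stand-in that only counts live instances, and per_thread_object and live_after_threads are names made up for the illustration:

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for the asker's MyClass; it only tracks live instances.
static std::atomic<int> g_live{0};

struct MyClass {
    MyClass() { ++g_live; }
    ~MyClass() { --g_live; }
    int calls = 0;
};

// Returns this thread's instance, creating it on first use.
// The unique_ptr's destructor deletes the object when the thread exits.
MyClass& per_thread_object() {
    static thread_local std::unique_ptr<MyClass> instance;
    if (!instance) instance.reset(new MyClass());
    return *instance;
}

// Runs n short-lived threads and reports how many instances outlive them.
int live_after_threads(int n) {
    assert(n >= 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([] { ++per_thread_object().calls; });
    for (auto& t : threads) t.join();
    return g_live.load();
}
```

Calling live_after_threads(4) should return 0 once all threads have been joined, since each thread's instance is destroyed as that thread exits. The function-local thread_local also sidesteps the question of when a namespace-scope thread_local gets constructed.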

Before C++11 you can use hacks like the GNU "linker sets" or MSVC "_tls_used" callback.

Or, starting with Windows Vista (NT 6.0), use FlsAlloc, which accepts a cleanup callback.

Enlargement answered 31/1, 2016 at 16:46 Comment(0)

TLS clean-up is usually done in DllMain when it is passed DLL_THREAD_DETACH.

If your code is all in an EXE and not a DLL then you could create a dummy DLL that the EXE loads which in turn calls back into the EXE on DLL_THREAD_DETACH. (I don't know of a better way to have EXE code run on thread termination.)

There are a couple of ways for the DLL to call back into the EXE: One is that EXEs can export functions just like DLLs, and the DLL code can use GetProcAddress on the EXE's module handle. An easier method is to give the DLL an init function which the EXE calls to explicitly pass a function pointer.

Note that what you can do within DllMain is limited (and unfortunately the limits are not properly documented), so you should minimize any work done this way. Don't run any complex destructors; just free the memory using a direct kernel32.dll API like HeapFree and free the TLS slot.

Also note that you won't get a DLL_THREAD_ATTACH for threads that were already running when your DLL was loaded (but you will still get DLL_THREAD_DETACH if they exit while the DLL is loaded), and that you'll get (only) a DLL_PROCESS_DETACH when the final thread exits.

Preliminary answered 4/1, 2011 at 9:34 Comment(4)

That's exactly what I was trying to avoid. My program is completely portable (it compiles on VC++ and GCC) and the class containing the TLS data is compiled as a static library, which then gets linked into the main exe (the one which will spawn threads). Anyway, if I don't get any other interesting answer, I'll end up by writing a portable "thread detach" handler. – Wanonah 4/1, 2011 at 9:39

@Gianluca, Using TLS at all seems inherently non-portable to me. The syntax and behaviour of TLS differ by compiler and in some situations also by OS version (even if we're only talking about Windows versions). Makes sense to use TLS to avoid re-factoring a lot of code (I've been there, especially when your code is called via callbacks from a 3rd party library!) but I think you may well have to write some kind of abstraction/wrapper to handle the compiler/platform differences (or find an existing one; maybe Boost has some TLS stuff?). – Preliminary 4/1, 2011 at 10:27

actually boost thread does have the concept of TLS and you can provide a cleanup function which will get triggered on thread exit - of course you have to use boost threads to use that functionality - depends on what the OP is using for threading... – Emergent 4/1, 2011 at 10:38

You're definitely right. I'm currently using a simple wrapper (ifdef-based) to handle the compiler syntaxes, and I guess I should write a portable thread destruction handler as well. Unfortunately, I took a look at some portable implementations (Poco and Boost), but they seem a little too slow for my needs and, furthermore, I don't want to use their thread interface (for the same reason you gave: I can't refactor thousands of lines of code). – Wanonah 4/1, 2011 at 10:40

If you just want a generic cleanup function you can still use boost thread_specific_ptr. You don't need to actually use the data stored there, but you can take advantage of the custom cleanup function. Just make that function something arbitrary and you can do whatever you want. Look at the pthread function pthread_key_create for a direct pthreads function call.
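As a rough sketch of the pthread_key_create route (POSIX only, GCC/Clang __thread syntax assumed): the fast path still reads the native __thread pointer, and the pthread key exists only so that its destructor fires at thread exit. MyClass, per_thread_object and the other names are hypothetical, and error handling is kept to a minimum:

```cpp
#include <pthread.h>
#include <atomic>
#include <cassert>
#include <vector>

static std::atomic<int> g_live{0};  // hypothetical instance counter for the demo

struct MyClass {
    MyClass() { ++g_live; }
    ~MyClass() { --g_live; }
};

static __thread MyClass* t_object = nullptr;  // fast path (GCC/Clang syntax)
static pthread_key_t g_cleanup_key;           // used only for its destructor
static pthread_once_t g_key_once = PTHREAD_ONCE_INIT;

// Called by pthreads at thread exit with the value stored under the key.
extern "C" void destroy_object(void* p) { delete static_cast<MyClass*>(p); }

extern "C" void create_cleanup_key() {
    int rc = pthread_key_create(&g_cleanup_key, destroy_object);
    assert(rc == 0);
    (void)rc;
}

MyClass* per_thread_object() {
    if (t_object == nullptr) {  // slow path, taken once per thread
        pthread_once(&g_key_once, create_cleanup_key);
        t_object = new MyClass();
        // Storing a non-null value is what arms the destructor for this thread.
        pthread_setspecific(g_cleanup_key, t_object);
    }
    return t_object;
}

extern "C" void* worker(void*) {
    per_thread_object();
    return nullptr;
}

// Runs n threads and reports how many instances are still alive afterwards.
int live_after_threads(int n) {
    std::vector<pthread_t> threads(n);
    for (pthread_t& t : threads) pthread_create(&t, nullptr, worker, nullptr);
    for (pthread_t& t : threads) pthread_join(t, nullptr);
    return g_live.load();
}
```

The per-call cost stays at one __thread read and a null check; pthread_setspecific is only touched when a thread first creates its object.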

There is unfortunately no easy answer, at least not that I've come across yet. That is, there is no common way to have complex objects deleted at thread exit time. However, there's nothing stopping you from doing this on your own.

You will need to register your own handlers to run at thread exit time. With pthreads that would be pthread_cleanup_push. I don't know what it is on Windows. This is of course not cross-platform. But presumably you have full control over how the thread is started and over its entry point, so you could simply call a cleanup function explicitly just before returning from your thread. I know you mentioned you can't add this snippet, in which case you'll be left calling the OS-specific function to add a cleanup routine.

Obviously creating cleanup functions for all objects allocated could be annoying. So instead you should create one more thread local variable: a list of destructors for objects. For each thread-specific variable you create you'll push a destructor onto this list. This list will have to be created on demand if you don't have a common thread entry point: have a global function to call which takes your destructor and creates the list as necessary, then adds the destructor.

Exactly what this destructor looks like depends heavily on your object hierarchy (you may have simple boost bind statements, shared_ptr's, a virtual destructor in a base class, or a combination thereof).

Your generic cleanup function can then walk through this list and perform all the destructors.
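The destructor list described above could look roughly like this. It is a sketch for POSIX threads only: the list lives under a pthread key whose destructor walks it at thread exit. at_thread_exit, delete_as, Named and destruction_order are names invented for the illustration, not an existing API:

```cpp
#include <pthread.h>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

typedef void (*Destructor)(void*);
typedef std::vector<std::pair<Destructor, void*> > DestructorList;

static pthread_key_t g_list_key;
static pthread_once_t g_list_once = PTHREAD_ONCE_INIT;
static std::string g_log;  // demo only: records the order of destruction

// Generic cleanup function: walks the thread's list and runs every destructor.
extern "C" void run_destructors(void* p) {
    DestructorList* list = static_cast<DestructorList*>(p);
    // Reverse registration order, mirroring how automatic objects are destroyed.
    for (DestructorList::reverse_iterator it = list->rbegin(); it != list->rend(); ++it)
        it->first(it->second);
    delete list;
}

extern "C" void create_list_key() {
    int rc = pthread_key_create(&g_list_key, run_destructors);
    assert(rc == 0);
    (void)rc;
}

// Global registration function: creates the calling thread's list on demand.
void at_thread_exit(Destructor destroy, void* object) {
    pthread_once(&g_list_once, create_list_key);
    DestructorList* list = static_cast<DestructorList*>(pthread_getspecific(g_list_key));
    if (list == nullptr) {
        list = new DestructorList();
        pthread_setspecific(g_list_key, list);
    }
    list->push_back(std::make_pair(destroy, object));
}

// One possible destructor shape: a typed delete behind a void* interface.
template <typename T>
void delete_as(void* p) { delete static_cast<T*>(p); }

struct Named {
    explicit Named(char c) : tag(c) {}
    ~Named() { g_log += tag; }
    char tag;
};

extern "C" void* worker(void*) {
    at_thread_exit(delete_as<Named>, new Named('a'));
    at_thread_exit(delete_as<Named>, new Named('b'));
    return nullptr;
}

// Runs one thread and returns the order in which its objects were destroyed.
std::string destruction_order() {
    g_log.clear();
    pthread_t t;
    pthread_create(&t, nullptr, worker, nullptr);
    pthread_join(t, nullptr);
    return g_log;
}
```

Running destructors in reverse order is a design choice made here, not something the answer prescribes; it simply matches the usual C++ rule that later objects may depend on earlier ones.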

Prop answered 4/1, 2011 at 10:39 Comment(3)

Check out the answer for Boost Thread Local, unless you have heavy performance requirements it might also work. – Prop 4/1, 2011 at 12:13

I tried to comment on this already: if you have heavy performance requirements, you cannot spawn new threads frequently, because that kills performance. And if you don't spawn new threads frequently, the cost of TLS initialization (done once per thread) is completely negligible. – Abstinence 24/1, 2011 at 21:29

The cost of using thread_specific_ptr is higher than that of the native thread-local specifier. – Prop 25/1, 2011 at 0:55

If you are using pthreads you could look at the cleanup operations?

http://man.yolinux.com/cgi-bin/man2html?cgi_command=pthread_cleanup_push

You can push a cleanup operation just after the creation of the thread local object, such that on exit that object gets destroyed. Not sure what the winapi equivalent is...
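A small sketch of that idea, assuming POSIX threads and that you can edit the function in which the object is created. pthread_cleanup_push and pthread_cleanup_pop are macros that must be used as a pair in the same lexical scope, so this only helps where the code between them is under your control. MyClass and the helper names are hypothetical:

```cpp
#include <pthread.h>
#include <atomic>
#include <cassert>

static std::atomic<int> g_live{0};  // hypothetical instance counter for the demo

struct MyClass {
    MyClass() { ++g_live; }
    ~MyClass() { --g_live; }
    int calls = 0;
};

// Cleanup handler: receives the pointer passed to pthread_cleanup_push.
extern "C" void free_object(void* p) { delete static_cast<MyClass*>(p); }

extern "C" void* worker(void*) {
    MyClass* object = new MyClass();
    // From here on, pthread_exit() or cancellation will run free_object(object).
    pthread_cleanup_push(free_object, object);
    ++object->calls;  // ... the thread's real work would go here ...
    // A non-zero argument pops the handler and runs it on the normal path too.
    pthread_cleanup_pop(1);
    return nullptr;
}

// Runs one thread and reports how many instances are still alive afterwards.
int live_after_thread() {
    pthread_t t;
    int rc = pthread_create(&t, nullptr, worker, nullptr);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, nullptr);
    return g_live.load();
}
```

Because the push/pop pair has to bracket the work inside one function, this fits best when there is a common thread entry point; without one, a key destructor registered through pthread_key_create is probably the easier fit.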

Emergent answered 4/1, 2011 at 10:33 Comment(0)

Boost Thread Local Storage

Abstinence answered 4/1, 2011 at 10:46 Comment(2)

I've had a look at this and it seems okay for some general cases. Huge advantage is a nice OOP abstraction for thread-local data structures. Disadvantage is that it will not take advantage of link-time thread-local storage, thus performance will not be optimal on platforms that support it. – Prop 4/1, 2011 at 12:11

do you mean TLS performance improvement on frequently created threads? in all other cases TLS will be created and initialized once. and the case of frequently created threads should be improved by thread pool – Abstinence 4/1, 2011 at 12:34
