Thread local storage in Python

5

93

How do I use thread local storage in Python?

Crwth answered 10/9, 2009 at 23:3 Comment(3)
I'm not sure what you're asking--threading.local is documented, and you've more or less pasted the documentation below...Asteria
@Glenn I pasted the documentation in one of my answers. I quoted Alex's solution in the other. I am simply making this content more accessible.Crwth
Imagine criticizing helpful volunteers for reformatting critical documentation as a mobile-accessible StackOverflow answer previously readable only by manually typing obfuscatory Python statements into an interactive CLI REPL (e.g., import _threading_local as tl\nhelp(tl)). </yikes>Melodiemelodion
154

Thread-local storage is useful, for instance, if you have a pool of worker threads and each thread needs access to its own resource, such as a network or database connection. Note that the threading module uses regular threads, which share the process's global data; because of the global interpreter lock, they do not speed up CPU-bound Python code, although they remain useful for I/O-bound work. The multiprocessing module, by contrast, creates a new subprocess for each worker, so every global variable is effectively local to that worker.
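The worker-pool case described above can be sketched roughly as follows. This is only an illustration: FakeConnection, get_connection and work are hypothetical names, and FakeConnection stands in for whatever real network or database connection you would open.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Created once at module level; each thread sees its own attributes on it.
_local = threading.local()


class FakeConnection:
    """Hypothetical stand-in for a real network or database connection."""

    def __init__(self):
        # Remember which thread created this connection.
        self.owner = threading.current_thread().name


def get_connection():
    """Return the calling thread's connection, creating it on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = FakeConnection()
        _local.conn = conn
    return conn


def work(item):
    conn = get_connection()
    # Report which thread ran the task and which thread created the connection.
    return threading.current_thread().name, conn.owner


with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(work, range(20)))

# Check that every task used a connection created by the thread that ran it.
print(all(worker == owner for worker, owner in results))
```

Each pool thread creates its connection lazily the first time it runs a task and reuses it afterwards, so no more connections are created than there are worker threads.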

threading module

Here is a simple example:

import threading
from threading import current_thread

threadLocal = threading.local()

def hi():
    initialized = getattr(threadLocal, 'initialized', None)
    if initialized is None:
        print("Nice to meet you", current_thread().name)
        threadLocal.initialized = True
    else:
        print("Welcome back", current_thread().name)

hi(); hi()

This will print out:

Nice to meet you MainThread
Welcome back MainThread

One important thing that is easily overlooked: a threading.local() object only needs to be created once, not once per thread nor once per function call. The global or class level are ideal locations.

Here is why: threading.local() creates a new instance each time it is called (just like any factory or class call would), so calling it repeatedly and rebinding the same name discards the previous object along with everything stored on it, which in all likelihood is not what one wants. When any thread accesses an existing threadLocal variable (or whatever it is called), it gets its own private view of that variable's attributes.

This won't work as intended:

import threading
from threading import current_thread

def wont_work():
    threadLocal = threading.local()  # oops, this creates a new, empty thread-local object on every call!
    initialized = getattr(threadLocal, 'initialized', None)
    if initialized is None:
        print("First time for", current_thread().name)
        threadLocal.initialized = True
    else:
        print("Welcome back", current_thread().name)

wont_work(); wont_work()

This results in the following output:

First time for MainThread
First time for MainThread
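For comparison, here is a sketch of the working pattern with more than one thread: the threading.local() object is created once at module level and every thread calls the same function. The greetings list, the lock and the worker names are assumptions added only to record what each thread saw.

```python
import threading

thread_local = threading.local()  # created exactly once, at module level

greetings = []            # shared on purpose: records (thread name, message)
greetings_lock = threading.Lock()


def hi():
    # Each thread sees its own 'initialized' attribute on thread_local.
    if getattr(thread_local, "initialized", False):
        message = "Welcome back"
    else:
        thread_local.initialized = True
        message = "Nice to meet you"
    with greetings_lock:
        greetings.append((threading.current_thread().name, message))


def worker():
    hi()
    hi()


threads = [threading.Thread(target=worker, name=f"worker-{i}") for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread has its own view too, untouched by the workers.
hi()
hi()

for name, message in greetings:
    print(message, name)
```

Every thread, including the main one, is greeted with "Nice to meet you" on its first call and "Welcome back" on its second, because the flag set by one thread is invisible to the others.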

multiprocessing module

All global variables are effectively local to each worker, since the multiprocessing module runs each worker in a separate process with its own copy of the globals.

Consider this example, where the processed counter plays the role of thread-local storage:

from multiprocessing import Pool
from random import random
from time import sleep
import os

processed=0

def f(x):
    sleep(random())
    global processed
    processed += 1
    print("Processed by %s: %s" % (os.getpid(), processed))
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)
    print(pool.map(f, range(10)))

It will output something like this:

Processed by 7636: 1
Processed by 9144: 1
Processed by 5252: 1
Processed by 7636: 2
Processed by 6248: 1
Processed by 5252: 2
Processed by 6248: 2
Processed by 9144: 2
Processed by 7636: 3
Processed by 5252: 3
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

... of course, the process IDs, the count for each process, and the order will vary from run to run.

Conservatism answered 5/11, 2012 at 20:46 Comment(6)
"Note that the threading module uses the regular concept of threads (which have access to the process global data), but these are not too useful due to the global interpreter lock. " is this serious? If I am reading this correctly, this is extremely misleading as threads are immensely useful and critical, GIL or not.Mita
The wont_work function is wrong, but not because threading.local "must be used at global scope". Rather, the code is using a local variable (the threading.local object) and expecting it to retain values across calls. This is not how local variables behave (you'd get the same issue with a plain dict).Educable
@zehelvion they are useful for running multiple functions concurrently.Mita
@Mita But processes in Python do the same thing? No, what is the difference other than sharing the same globals vs. having unique globals?Snobbish
Can you please put in bold: "One important thing that is easily overlooked: a threading.local() object only needs to be created once, not once per thread nor once per function call" :) - I thought I was getting crazy!Syncretize
I don't think the threading module and the multiprocessing module are a fair comparison. With thread-local storage, each thread really gets its own copy of the data. The main thread can update the value before the threads are spawned, and each thread will still start off with an uninitialized variable. But with multiprocessing, each process gets a copy of the parent process at the time it is spawned, so it's not really "its own". In your multiprocessing example here, if the parent process updates the counter to some random value, there's no way for a child process to see an uninitialized counter.Berylberyle
34

Thread-local storage can simply be thought of as a namespace (with values accessed via attribute notation). The difference is that each thread transparently gets its own set of attributes/values, so that one thread doesn't see the values from another thread.

Just like an ordinary object, you can create multiple threading.local instances in your code. They can be local variables, class or instance members, or global variables. Each one is a separate namespace.

Here's a simple example:

import threading

class Worker(threading.Thread):
    ns = threading.local()
    def run(self):
        self.ns.val = 0
        for i in range(5):
            self.ns.val += 1
            print("Thread:", self.name, "value:", self.ns.val)

w1 = Worker()
w2 = Worker()
w1.start()
w2.start()
w1.join()
w2.join()

Output:

Thread: Thread-1 value: 1
Thread: Thread-2 value: 1
Thread: Thread-1 value: 2
Thread: Thread-2 value: 2
Thread: Thread-1 value: 3
Thread: Thread-2 value: 3
Thread: Thread-1 value: 4
Thread: Thread-2 value: 4
Thread: Thread-1 value: 5
Thread: Thread-2 value: 5

Note how each thread maintains its own counter, even though the ns attribute is a class member (and hence shared between the threads).

The same example could have used an instance variable or a local variable, but that wouldn't show much, as there's no sharing then (a dict would work just as well). There are cases where you'd need thread-local storage as instance variables or local variables, but they tend to be relatively rare (and pretty subtle).
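As a rough sketch of one such case, the hypothetical Counter class below keeps a threading.local as an instance variable, so every instance holds a separate count for every thread. The class and variable names are made up for illustration.

```python
import threading


class Counter:
    """Hypothetical class: each instance keeps a separate count per thread."""

    def __init__(self):
        # One thread-local namespace per Counter instance.
        self._local = threading.local()

    def increment(self):
        self._local.count = getattr(self._local, "count", 0) + 1
        return self._local.count


a = Counter()
b = Counter()
results = {}  # thread name -> (count seen on a, count seen on b)


def worker():
    for _ in range(3):
        a.increment()
    # Fourth increment of a and first increment of b in this thread.
    results[threading.current_thread().name] = (a.increment(), b.increment())


threads = [threading.Thread(target=worker, name=f"t{i}") for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread never touched a or b before, so both start from zero here.
main_result = (a.increment(), b.increment())
print(results, main_result)
```

Both instances are shared between the threads, yet each thread sees (4, 1) for its own calls and the main thread sees (1, 1): the counts are separated both per instance and per thread.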

Educable answered 21/4, 2015 at 19:49 Comment(4)
A global class with a class attribute -- interesting; I'll see if that also solves the issue I was having.Obtest
On the other hand, it's true that a simple global object, initialised once at program start, is often the simplest solution. It's just not the case that you need to do that - like with any variable, it depends on the application.Educable
Although I use Python professionally now, I haven't been doing so for long. However, since ns is a class member, shouldn't we use it as Worker.ns? I'm aware that the current code works, because self.ns, as a getter, gives the same result as Worker.ns, but as a best practice that seems confusing (and in some cases could be error-prone: doing self.ns = ... will not modify the class member but create a new instance-level field). What do you think?Illhumored
Using the class or self is to an extent largely a matter of style, I guess. The advantage of using self is that it will work with subclassing, where hard-coding the class name won't. OTOH, it has the downside that it's possible to accidentally shadow the class variable with an instance variable, as you say.Educable
18

As noted in the question, Alex Martelli gives a solution here. This function allows us to use a factory function to generate a default value for each thread.

# Code originally posted by Alex Martelli
# Modified to use standard Python variable name conventions
import threading
threadlocal = threading.local()

def threadlocal_var(varname, factory, *args, **kwargs):
    v = getattr(threadlocal, varname, None)
    if v is None:
        v = factory(*args, **kwargs)
        setattr(threadlocal, varname, v)
    return v
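
A possible way to use this helper is sketched below. The function is repeated so the snippet runs on its own; the worker function, the seen dictionary and the variable names "buffer" and "settings" are assumptions made for the example.

```python
import threading

threadlocal = threading.local()


def threadlocal_var(varname, factory, *args, **kwargs):
    v = getattr(threadlocal, varname, None)
    if v is None:
        v = factory(*args, **kwargs)
        setattr(threadlocal, varname, v)
    return v


seen = {}  # worker number -> what that worker observed


def worker(n):
    # First call in this thread: the factory (list) builds a fresh value.
    buf = threadlocal_var("buffer", list)
    buf.append(n)
    # Second call in the same thread: the stored object is returned as-is.
    same_object = threadlocal_var("buffer", list) is buf
    # Extra arguments are forwarded to the factory.
    settings = threadlocal_var("settings", dict, retries=3)
    seen[n] = (list(buf), same_object, settings)


threads = [threading.Thread(target=worker, args=(n,)) for n in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen)
```

Each thread ends up with a buffer containing only its own number. One thing to keep in mind with this design is that None doubles as the "not set yet" marker, so a factory that returns None would be called again on every lookup.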
Crwth answered 10/9, 2009 at 23:3 Comment(6)
If you're doing this, what you really want is probably defaultdict + ThreadLocalDict, but I don't think there's a stock implementation of this. (defaultdict should really be part of dict, eg. dict(default=int), which would eliminate the need for a "ThreadLocalDefaultDict".)Asteria
@Glenn, the problem with dict(default=int) is that the dict() constructor takes in kwargs and adds them to the dict. So if that was implemented, people wouldn't be able to specify a key called 'default'. But I actually think this is a small price to pay for an implementation like you show. After all, there are other ways to add a key to a dict.Quadriplegic
@Evan - I agree that this design would be better, but it would break backwards compatibilityCrwth
@Glenn, I use this approach for plenty of thread-local variables that AREN'T defaultdicts, if that's what you mean. If you mean that this has a similar interface to what defaultdict SHOULD have (providing optional positional and named args to the factory function: EVERY time you can store a callback you SHOULD be able to optionally pass args for it!-), then, sorta, except that I typically use different factories-and-args for different varnames, AND the approach I give also works fine on Python 2.4 (don't ask...!-).Flyer
@Casebash: Shouldn't the call threadlocal = threading.local() be inside the threadlocal_var() function so it gets the local for the thread that's calling it?Liverish
Never mind. I see from this answer that it needs to be called by the main thread.Liverish
6

Here is my way of doing thread-local storage across modules/files. The following has been tested in Python 3.5:

# fileA.py
import threading
import fileB

def functionOne():
    # Start a thread whose target function lives in another module
    thread = threading.Thread(target=fileB.functionTwo)
    thread.start()

# fileB.py
import threading
import fileC

def functionTwo():
    currentThread = threading.current_thread()
    dictionary = currentThread.__dict__
    dictionary["localVar1"] = "store here"   # Store a value on the current Thread object
    fileC.function3()

# fileC.py
import threading

def function3():
    currentThread = threading.current_thread()
    dictionary = currentThread.__dict__
    print(dictionary["localVar1"])           # Read the value back in the same thread

In fileA, I start a thread which has a target function in another module/file.

In fileB, I set a local variable I want in that thread.

In fileC, I access the thread local variable of the current thread.

Additionally, you can print the 'dictionary' variable to see the default values available, like kwargs, args, etc.
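
One difference from threading.local() is worth illustrating: a value stored in the Thread object's __dict__ is an ordinary instance attribute, so any code holding a reference to that Thread can read it. The sketch below compares the two; the Event objects and variable names are assumptions used only to keep the worker alive while the main thread looks.

```python
import threading

local = threading.local()
ready = threading.Event()   # worker -> main: values have been stored
done = threading.Event()    # main -> worker: finished inspecting


def worker():
    # Same idea as above: store a value on the Thread object itself.
    threading.current_thread().__dict__["localVar1"] = "on the Thread object"
    # And store another value in real thread-local storage.
    local.value = "in threading.local"
    ready.set()
    done.wait()


t = threading.Thread(target=worker)
t.start()
ready.wait()

# Read both values from the main thread while the worker is still alive.
seen_on_thread_object = vars(t).get("localVar1")
seen_in_local = getattr(local, "value", None)

done.set()
t.join()

print(seen_on_thread_object, seen_in_local)
```

The main thread can read the value kept on the Thread object, but it finds nothing in the threading.local() object, because that attribute exists only in the worker's view.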

Cankerworm answered 15/4, 2019 at 8:7 Comment(1)
You can't access values stored in thread local storage from other threads. You're storing in the instance dict. print({k: v for t in threading.enumerate() for k, v in vars(t).items() if "local" in k}) from other threads can reveal instance data.Gamba
5

You can also write:

import threading
mydata = threading.local()
mydata.x = 1

mydata.x will only exist in the current thread.

Crwth answered 10/9, 2009 at 23:10 Comment(2)
Rather than putting this sort of code in its own answer, why not just edit your question?Quadriplegic
@Evan: Because there are two basic approaches, which are really separate answersCrwth
