What is a Python thread?

4

46

I have several questions regarding Python threads.

  1. Is a Python thread a Python or OS implementation?
  2. When I use htop a multi-threaded script has multiple entries - the same memory consumption, the same command but a different PID. Does this mean that a [Python] thread is actually a special kind of process? (I know there is a setting in htop to show these threads as one process - Hide userland threads)
  3. Documentation says:

A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left.

My interpretation/understanding was: main thread terminates when all non-daemon threads are terminated.

So python daemon threads are not part of Python program if "the entire Python program exits when only daemon threads are left"?

Gaffney answered 24/12, 2011 at 8:37 Comment(3)
You mean threading.Thread right?Hithermost
Yes, I do. Are there other threads in standard Python?Gaffney
Yes, the thread module provides another interface to native threads (but they use the same native implementation).Fran
41
  1. Python threads are implemented using OS threads in all implementations I know (CPython, PyPy and Jython). For each Python thread, there is an underlying OS thread.

  2. Some operating systems (Linux being one of them) show all different threads launched by the same executable in the list of all running processes. This is an implementation detail of the OS, not of Python. On some other operating systems, you may not see those threads when listing all the processes.

  3. The process will terminate when the last non-daemon thread finishes. At that point, all the daemon threads will be terminated. So, those threads are part of your process, but are not preventing it from terminating (while a regular thread will prevent it). That is implemented in pure Python. A process terminates when the system _exit function is called (it will kill all threads), and when the main thread terminates (or sys.exit is called), the Python interpreter checks if there is another non-daemon thread running. If there is none, then it calls _exit, otherwise it waits for the non-daemon threads to finish.
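
A quick way to check points 1 and 2 from inside Python is to print the process ID and the kernel-level thread ID from two threads. This is only a minimal sketch and assumes Python 3.8 or later, where threading.get_native_id() is available; the names report and results are made up for the illustration.

```python
import os
import threading


def report(results):
    # Record the process ID and the OS-assigned thread ID of the calling thread.
    results.append((os.getpid(), threading.get_native_id()))


results = []
report(results)  # called from the current thread

worker = threading.Thread(target=report, args=(results,))
worker.start()
worker.join()

(caller_pid, caller_tid), (worker_pid, worker_tid) = results
print("caller: pid=%d native thread id=%d" % (caller_pid, caller_tid))
print("worker: pid=%d native thread id=%d" % (worker_pid, worker_tid))
```

Both lines should show the same pid but different native thread IDs. On Linux, the per-thread "PID" that htop displays is this native thread ID, which is why the entries look like separate processes with identical memory usage.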


The daemon thread flag is implemented in pure Python by the threading module. When the module is loaded, a Thread object is created to represent the main thread, and its _exitfunc method is registered as an atexit hook.

The code of this function is:

class _MainThread(Thread):

    def _exitfunc(self):
        self._Thread__stop()
        t = _pickSomeNonDaemonThread()
        if t:
            if __debug__:
                self._note("%s: waiting for other threads", self)
        while t:
            t.join()
            t = _pickSomeNonDaemonThread()
        if __debug__:
            self._note("%s: exiting", self)
        self._Thread__delete()

This function is called by the Python interpreter when sys.exit is called or when the main thread terminates. It returns once only daemon threads (if any) are still running, and the interpreter then calls the system _exit function.

When the _exit function is called, the OS will terminate all of the process's threads, and then terminate the process. The Python runtime will not call the _exit function until all the non-daemon threads are done.

All threads are part of the process.
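
The waiting logic described above can be approximated with the public threading API. This is only a sketch of the idea, not the interpreter's actual shutdown code; wait_for_non_daemon_threads is a hypothetical helper name, and it assumes Python 3.4+ for threading.main_thread().

```python
import threading
import time


def wait_for_non_daemon_threads():
    """Join every live non-daemon thread other than the caller and the main thread."""
    skip = (threading.current_thread(), threading.main_thread())
    joined = []
    while True:
        pending = [t for t in threading.enumerate()
                   if t not in skip and not t.daemon]
        if not pending:
            return joined  # only daemon threads (if any) are left
        for t in pending:
            t.join()
            joined.append(t.name)


stop = threading.Event()
regular = threading.Thread(target=time.sleep, args=(0.2,), name="regular")
background = threading.Thread(target=stop.wait, name="background", daemon=True)
regular.start()
background.start()

joined = wait_for_non_daemon_threads()
print("joined:", joined)
print("daemon thread still alive:", background.is_alive())
```

The helper returns after the regular thread has finished, while the daemon thread is still running. At that point the real interpreter would go on to exit the process, which is what ends the daemon thread.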


My interpretation/understanding was: main thread terminates when all non-daemon threads are terminated.

So python daemon threads are not part of python program if "the entire Python program exits when only daemon threads are left"?

Your understanding is incorrect. For the OS, a process is composed of many threads, all of which are equal (there is nothing special about the main thread for the OS, except that the C runtime adds a call to _exit at the end of the main function). And the OS doesn't know about daemon threads. This is purely a Python concept.

The Python interpreter uses native threads to implement Python threads, but has to keep track of the threads it created. Using its atexit hook, it ensures that the _exit function is called only after the last non-daemon thread terminates. When it says "the entire Python program", the documentation refers to the whole process.


The following program can help you understand the difference between a daemon thread and a regular thread:

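import sys
import time
import threading


class WorkerThread(threading.Thread):
    """Prints a message twice a second, forever."""

    def run(self):
        while True:
            print('Working hard')
            time.sleep(0.5)


def main(args):
    # Mark the worker as a daemon only when --use_daemon is passed.
    use_daemon = '--use_daemon' in args
    worker = WorkerThread()
    worker.daemon = use_daemon  # must be set before start()
    worker.start()
    time.sleep(1)
    # The main thread ends here; the interpreter then waits for any
    # non-daemon threads before the process really exits.
    sys.exit(0)


if __name__ == '__main__':
    main(sys.argv[1:])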

If you execute this program with the '--use_daemon' flag, you will see that the program prints only a small number of Working hard lines. Without this flag, the program will not terminate even when the main thread finishes, and it will print Working hard lines until it is killed.
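
If you would rather check the daemon case from a script than watch the terminal, something like the following might do. It is a sketch for Python 3.7+ and not part of the program above: CHILD and run_child are names invented here, and the child script has a daemon worker with a finally block so you can see whether any cleanup code gets a chance to run.

```python
import subprocess
import sys
import textwrap

# Hypothetical child script: a daemon worker that would loop forever.
CHILD = textwrap.dedent("""
    import threading
    import time

    def worker():
        try:
            while True:
                time.sleep(0.05)
        finally:
            print("cleanup ran", flush=True)

    threading.Thread(target=worker, daemon=True).start()
    time.sleep(0.2)
    print("main done", flush=True)
""")


def run_child():
    # Run the child in a fresh interpreter and capture what it prints.
    result = subprocess.run([sys.executable, "-c", CHILD],
                            capture_output=True, text=True, timeout=30)
    return result.returncode, result.stdout


returncode, output = run_child()
print("exit code:", returncode)
print("output:", output.strip())
```

The child exits on its own even though the worker never leaves its loop. In my understanding the "cleanup ran" line normally does not show up, since the daemon thread is simply dropped when the process exits, but treat that as something to observe rather than a guarantee.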

Fran answered 24/12, 2011 at 8:57 Comment(9)
>The process will terminate when the last non-daemon thread finish.< So, daemon threads are not part of the python application process? I thought the only difference between a daemon and non-daemon thread is just a flag, which determines how the thread is treated, as it's mentioned in the docs: > The significance of this flag is that the entire Python program exits when only daemon threads are left. < What's the 'entire Python program' here? I thought it is the process. But how the process can be terminated when it still has threads?Gaffney
Updated my answer to explain how daemon thread are implemented in Python.Fran
Thanks for your patience. > When the _exit function is called, the OS will terminate all of the process threads, and then terminate the process. < I don't understand: you say that after _exitfunc is called all the threads, including the daemon ones, will be terminated by the OS? That's not what I see - daemon threads are still running after the main thread is terminated. One of my questions is this: how can one say that the entire Python program (process) exits if there are still threads (daemon ones) running?Gaffney
Thank you for your update (though I didn't get a notification about it). I guess now I finally understand. The main confusion for me was the 'daemon' name, which from other contexts told me something is running detached (like in a daemonized Python script, which runs in the background). I thought daemon threads run 'detached' from other threads and continue running when non-daemon threads exit.Gaffney
From your answer and examples I would say it like this: > A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left (i.e. daemon threads are killed forcibly). < Is this a correct understanding?Gaffney
@Gaffney Yes, that's it. The daemon flag is just a flag saying that the thread is not critical and can be killed unceremoniously.Fran
I tried to catch the killing - put some print in __del__ - but nothing was printed. It's killed by the OS?Gaffney
Yes, it is killed by the OS. The __del__ message will not be printed. As I said, when a process calls the system _exit function, the OS releases all the resources allocated to the process, including the threads, without calling any user space cleanup functions.Fran
I wonder why there's such a thing as sys.setswitchinterval(), whose documentation says: "Set the interpreter’s thread switch interval (in seconds). This floating-point value determines the ideal duration of the “timeslices” allocated to concurrently running Python threads." Is it not the prerogative of the OS to decide how much time-slice to provide to each thread? What exactly is the role of the Python interpreter in thread-switching? Sorry if all this is tangential to original question.Combe
16

I'm not familiar with the implementation, so let's run an experiment:

import threading
import time

def target():
    # Each thread just prints a message every five seconds, forever.
    while True:
        print('Thread working...')
        time.sleep(5)

NUM_THREADS = 5

for i in range(NUM_THREADS):
    thread = threading.Thread(target=target)
    thread.start()
  1. The number of threads reported using ps -o cmd,nlwp <pid> is NUM_THREADS+1 (one more for the main thread), so since the OS tools can count the threads, they should be OS threads. I tried with both CPython and Jython and, although Jython has some other threads running, for each extra thread that I add, ps increments the thread count by one.

  2. I'm not sure about htop behaviour, but ps seems to be consistent.

  3. I added the following line before starting the threads:

    thread.daemon = True
    

    When I executed it using CPython, the program terminated almost immediately and no process was found using ps, so my guess is that the program terminated together with the threads. In Jython the program behaved the same way as before (it didn't terminate), so maybe there are some other threads from the JVM that prevent the program from terminating, or daemon threads aren't supported.

Note: I used Ubuntu 11.10 with python 2.7.2+ and jython 2.2.1 on java1.6.0_23
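
The same count can be taken from inside the script instead of with ps. This is a rough sketch for Python 3: os_thread_count is a helper invented here that counts the entries in /proc/self/task, which as far as I know is where Linux lists a process's threads, and it returns None on systems without that directory.

```python
import os
import threading

NUM_THREADS = 5
stop = threading.Event()


def os_thread_count():
    # Linux-only assumption: one entry per kernel thread of this process.
    task_dir = "/proc/self/task"
    if os.path.isdir(task_dir):
        return len(os.listdir(task_dir))
    return None


python_before = threading.active_count()
os_before = os_thread_count()

# Start threads that simply block until the event is set.
threads = [threading.Thread(target=stop.wait) for _ in range(NUM_THREADS)]
for t in threads:
    t.start()

python_delta = threading.active_count() - python_before
os_during = os_thread_count()
os_delta = None if os_during is None else os_during - os_before

print("extra Python threads:", python_delta)
print("extra OS threads:", os_delta)

stop.set()
for t in threads:
    t.join()
```

If both numbers grow by NUM_THREADS, that agrees with what ps reports: each Python thread is backed by one OS thread.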

Vanderbilt answered 24/12, 2011 at 9:10 Comment(0)
5
  1. Python threads are practically an interpreter-level implementation because of the so-called global interpreter lock (GIL), even though they technically use the OS-level threading mechanisms. On *nix, Python uses pthreads, but the GIL effectively makes it a hybrid stuck in the application-level threading paradigm. So on *nix systems you will see the process listed multiple times in ps/top output, but the threads still behave (performance-wise) like software-implemented threads.

  2. No, you are just seeing the underlying thread implementation of your OS. This kind of behaviour is exposed by *nix pthread-like threading, and I'm told even Windows implements threads this way.

  3. When your program closes, it also waits for all threads to finish. If you have threads that could postpone the exit indefinitely, it may be wise to flag those threads as "daemons" and allow your program to finish even if those threads are still running.
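
To see the performance side of point 1 for yourself, you can time the same CPU-bound work run sequentially and run in two threads. This is just a measurement sketch for Python 3: count_down, run_sequential and run_threaded are names made up here, and the numbers depend heavily on the machine, the Python version and whether the build has a GIL at all.

```python
import threading
import time


def count_down(n):
    # Pure-Python CPU-bound loop; it does not release the GIL voluntarily.
    while n > 0:
        n -= 1


def run_sequential(n, jobs):
    start = time.perf_counter()
    for _ in range(jobs):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, jobs):
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(jobs)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


N = 2000000
sequential_seconds = run_sequential(N, 2)
threaded_seconds = run_threaded(N, 2)
print("sequential: %.3fs" % sequential_seconds)
print("threaded:   %.3fs" % threaded_seconds)
```

On a GIL build the threaded run is usually not meaningfully faster than the sequential one, even on a multi-core machine, although both runs use real OS threads.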

Some reference material you might be interested in:

Couscous answered 24/12, 2011 at 8:58 Comment(7)
1. This is wrong. Python threads are OS threads. The GIL does affect performance on multi-core systems but for some situations you can still get some benefit from multiple cores (specifically when the thread spends much of its time cpu bound in a C library function).Melissa
We could argue about that. Fact is: (C)Python uses the underlying threading mechanism solely out of convenience (why reimplement something that's already proven good?), but denies you the goodies of "real" OS threads. So a) yes, they are technically OS-level threads, but b) NO, they are not OS-level threads, because you gain none of the benefits of an OS-level thread besides the shared memory. I would even recommend NOT using them if you have parallelism in mind to boost your performance.Couscous
There was once a talk which showed that it may even slow you down considerably if you start too many threads (in some rare conditions even with external functions), because of the additional layer for context switching. Walks, swims and quacks like a duck, even if it wasn't born as a duck! ;-)Couscous
I agree. At this point, due to my frustration with the GIL and threading, I will likely start using the multiprocessing module more often, as it bypasses the GIL limitations. It is something I recommend anyone looking at multithreading for a performance boost to consider.Designing
@DonQuestion yes, with too many threads the context switching and the need for locks will slow down your application. That is true for any threading system and any programming language; it is not something unique to Python.Melissa
@Duncan: I can only recommend taking a look at David Beazley's excellent talk. I tried using the threading module out of naivety and was devastated by the illogical, inconsistent performance results. And THOSE were clearly caused by Python. You just don't expect to get WORSE performance when utilizing just ONE additional thread on a multicore, multiprocessor system. That is, without a doubt, Python's fault. As it is, Python threads aren't meant to boost performance in the first place, but to provide utility. And that's quite OK!Couscous
PyCon 2010:Understanding the Python GIL - a really nice talk - just watched it - it has threads explanation.Gaffney
0

There are great answers to the question, but I feel the daemon threads question is still not explained in a simple fashion. So this answer refers just to the third question.

"main thread terminates when all non-daemon threads are terminated."

So python daemon threads are not part of Python program if "the entire Python program exits when only daemon threads are left"?

If you think about what a daemon is, it is usually a service: some code that runs in an infinite loop, that serves requests, fills queues, accepts connections, etc. Other threads use it. It has no purpose when running by itself (in single-process terms).

So the program can't wait for the daemon thread to terminate, because it might never happen. Python will end the program when all non daemon threads are done. It also stops the daemon threads.

To wait until a daemon thread has completed its work, use the join() method. daemon_thread.join() will make Python wait for the daemon thread as well before exiting. join() also accepts a timeout argument.
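
Here is a small sketch of both forms of join() on daemon threads, written for Python 3. The functions short_job and endless_job are made up for the example: one daemon thread finishes quickly, the other would run forever.

```python
import threading
import time

results = []


def short_job():
    time.sleep(0.1)
    results.append("done")


def endless_job():
    # Typical daemon-style loop that never finishes on its own.
    while True:
        time.sleep(0.05)


short = threading.Thread(target=short_job, daemon=True)
endless = threading.Thread(target=endless_job, daemon=True)
short.start()
endless.start()

short.join()               # blocks until short_job has finished
endless.join(timeout=0.2)  # gives up after 0.2 seconds

print("results:", results)
print("short alive:", short.is_alive())
print("endless alive:", endless.is_alive())
```

join() with a timeout returns None either way, so is_alive() is how you tell whether the thread actually finished. The endless thread is still a daemon, so the program exits normally after the last print.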

Dolomite answered 14/10, 2017 at 8:54 Comment(0)
