What is the use of join() in threading?

12

300

I was studying Python threading and came across join().

The author said that if a thread is in daemon mode, then I need to use join() so that the thread can finish before the main thread terminates.

But I have also seen him using t.join() even though t was not a daemon.

Example code:

import threading
import time
import logging

logging.basicConfig(level=logging.DEBUG,
                    format='(%(threadName)-10s) %(message)s',
                    )

def daemon():
    logging.debug('Starting')
    time.sleep(2)
    logging.debug('Exiting')

d = threading.Thread(name='daemon', target=daemon)
d.daemon = True  # setDaemon() is deprecated since Python 3.10

def non_daemon():
    logging.debug('Starting')
    logging.debug('Exiting')

t = threading.Thread(name='non-daemon', target=non_daemon)

d.start()
t.start()

d.join()
t.join()

I don't know what the use of t.join() is, as t is not a daemon, and I see no change even if I remove it.

Lemaceon answered 26/2, 2013 at 9:21 Comment(2)
+1 for the title. 'Join' seems to be specially designed to encourage poor performance (by continually creating/terminating/destroying threads), GUI lockups (waiting in event handlers) and app shutdown failures (waiting for uninterruptible threads to terminate). Note: this is not just Python; it is a cross-language anti-pattern. – Lexington
A lot of answers just describe what .join() does. But I think the actual question is: what is the point of .join() when it seems to have the same effect as running your script without threading? – Cathern
408

A somewhat clumsy piece of ASCII art to demonstrate the mechanism: join() is presumably called by the main thread. It could also be called by another thread, but that would needlessly complicate the diagram.

The join() call should really be drawn on the main thread's track, but to express the thread relationship and keep it as simple as possible, I chose to place it on the child thread instead.

    without join:
    +---+---+------------------                     main-thread
        |   |
        |   +...........                            child-thread(short)
        +..................................         child-thread(long)
    
    with join
    +---+---+------------------***********+###      main-thread
        |   |                             |
        |   +...........join()            |         child-thread(short)
        +......................join()......         child-thread(long)

    with join and daemon thread
    +-+--+---+------------------***********+###     parent-thread
      |  |   |                             |
      |  |   +...........join()            |        child-thread(short)
      |  +......................join()......        child-thread(long)
      +,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,     child-thread(long + daemonized)

    '-' main-thread/parent-thread/main-program execution
    '.' child-thread execution
    '#' optional parent-thread execution after join()-blocked parent-thread could 
        continue
    '*' main-thread 'sleeping' in join-method, waiting for child-thread to finish
    ',' daemonized thread - 'ignores' lifetime of other threads;
        terminates when the main program exits; is normally meant for 
        join-independent tasks

So the reason you don't see any change is that your main thread does nothing after your join() calls. You could say join() is (only) relevant for the execution flow of the main thread.

If, for example, you want to concurrently download a bunch of pages to concatenate them into a single large page, you may start concurrent downloads using threads, but need to wait until the last page/thread is finished before you start assembling a single page out of many. That's when you use join().
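A minimal sketch of that download-then-assemble pattern, assuming a hypothetical fetch_page() that only simulates a download with time.sleep() (no real network access). Each thread writes into its own slot of a shared list, and the main thread joins every thread before concatenating the results:

```python
import threading
import time

def fetch_page(url, results, index):
    """Hypothetical stand-in for a real download: sleeps, then stores fake content."""
    time.sleep(0.1)  # simulate network latency
    results[index] = f"<content of {url}>"

urls = ["page1", "page2", "page3"]  # hypothetical URLs
results = [None] * len(urls)        # one slot per thread, so no two threads share a slot
threads = [threading.Thread(target=fetch_page, args=(url, results, i))
           for i, url in enumerate(urls)]

for t in threads:
    t.start()  # all downloads run concurrently
for t in threads:
    t.join()   # wait until the last page is in before assembling

combined = "".join(results)
print(combined)  # <content of page1><content of page2><content of page3>
```

Because the join loop comes after the start loop, the total wait is roughly that of the slowest download rather than the sum of all of them.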

Puca answered 26/2, 2013 at 10:0 Comment(7)
Please confirm: can a daemonized thread be joined without blocking program execution? – Halinahalite
@Aviator45003: Yes, by using the timeout argument, e.g. daemon_thread.join(0.0). By default join() blocks, regardless of the daemon attribute. But joining a daemonized thread most likely opens a whole can of trouble! I'm now considering removing the join() call for the daemon thread from my little diagram... – Puca
@DonQuestion So if we set daemon=True, don't we need to call join() if we need the thread to finish by the end of the code? – Johnnyjohnnycake
@BenyaminJafari: Yes. If not, the main thread (= the program) would exit once only the daemon thread is left. But the nature of a (Python) daemon thread is that the main thread doesn't care whether this background task is still running. I'll think about how to elaborate on that in my answer to clear that issue up. Thanks for your comment! – Puca
In the first case, when the main thread finishes, will the program finish without letting child-thread(long) finish running (i.e. before child-thread(long) is completely done)? – Auston
A quick question: if I start the threads inside a loop, do I need to call join() inside the loop after each start, or should I only join once I've finished starting the threads, outside the loop? – Zack
Outside, if you have a thread-starting loop. Otherwise you are single-threaded again, because you wait for each thread to join (= finish) before starting the next one. This is usually solved with a thread-joining loop after the thread-starting loop. – Puca
96

Straight from the docs

join([timeout]) Wait until the thread terminates. This blocks the calling thread until the thread whose join() method is called terminates – either normally or through an unhandled exception – or until the optional timeout occurs.
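As a sketch of the optional timeout described in that quote (the 0.5 s task and 0.1 s timeout are arbitrary values chosen for illustration): join() returns None whether or not the thread actually finished, so is_alive() is how you tell the two cases apart.

```python
import threading
import time

def slow_task():
    time.sleep(0.5)  # work that outlasts the timeout used below

t = threading.Thread(target=slow_task)
t.start()

t.join(timeout=0.1)              # gives up after ~0.1 s; slow_task is still sleeping
alive_after_timeout = t.is_alive()
print("alive after timed join:", alive_after_timeout)  # True

t.join()                         # no timeout: blocks until slow_task returns
print("alive after full join:", t.is_alive())          # False
```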

This means that the main thread, which spawns t and d, waits at t.join() until t finishes.

Depending on the logic your program employs, you may want to wait until a thread finishes before your main thread continues.

Also from the docs:

A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left.

A simple example, say we have this:

import threading
import time

def non_daemon():
    time.sleep(5)
    print('Test non-daemon')

t = threading.Thread(name='non-daemon', target=non_daemon)

t.start()

Which finishes with:

print('Test one')
t.join()
print('Test two')

This will output:

Test one
Test non-daemon
Test two

Here the main thread explicitly waits for t to finish before it calls print the second time.

Alternatively if we had this:

print('Test one')
print('Test two')
t.join()

We'll get this output:

Test one
Test two
Test non-daemon

Here we do our job in the main thread and then we wait for the t thread to finish. In this case we might even remove the explicit joining t.join() and the program will implicitly wait for t to finish.

Variform answered 26/2, 2013 at 9:31 Comment(1)
Can you make some change to my code so that I can see the difference t.join() makes, by adding some sleep or something else? At the moment I can't see any change in the program whether I use it or not. But for the daemon I can see it exit if I use d.join(), which I don't see when I don't use d.join(). – Lemaceon
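One way to make the effect of t.join() visible in the question's code, as requested in the comment above, is to give the non-daemon thread some work (here a 0.5 s sleep, an arbitrary assumption) and let the main thread do something after the join. This sketch records events in a list instead of logging them, so the order can be checked:

```python
import threading
import time

events = []                     # records what happened, in order
events_lock = threading.Lock()  # protects the shared list

def record(msg):
    with events_lock:
        events.append(msg)

def non_daemon():
    record('worker: starting')
    time.sleep(0.5)             # makes the worker outlive the main thread's next step
    record('worker: exiting')

t = threading.Thread(name='non-daemon', target=non_daemon)
t.start()
t.join()                        # main thread waits here until non_daemon() returns
record('main: after join')
print(events)  # ['worker: starting', 'worker: exiting', 'main: after join']
```

Without the t.join() line, 'main: after join' would be recorded before 'worker: exiting', although the program would still wait for the non-daemon thread before exiting.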
62

Thanks for this thread -- it helped me a lot too.

I learned something about .join() today.

These threads run in parallel:

d.start()
t.start()
d.join()
t.join()

and these run sequentially (not what I wanted):

d.start()
d.join()
t.start()
t.join()
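A small timing sketch of the two orderings, with each thread just sleeping 0.2 s (an arbitrary stand-in for real work). On a typical machine the parallel version should take roughly 0.2 s and the sequential one roughly 0.4 s:

```python
import threading
import time

def task():
    time.sleep(0.2)  # hypothetical stand-in for real work

def run_parallel():
    d = threading.Thread(target=task)
    t = threading.Thread(target=task)
    start = time.perf_counter()
    d.start()
    t.start()   # both threads are running before either join
    d.join()
    t.join()
    return time.perf_counter() - start

def run_sequential():
    d = threading.Thread(target=task)
    t = threading.Thread(target=task)
    start = time.perf_counter()
    d.start()
    d.join()    # main thread waits for d before t is even started
    t.start()
    t.join()
    return time.perf_counter() - start

parallel = run_parallel()
sequential = run_sequential()
print(f"parallel:   {parallel:.2f} s")
print(f"sequential: {sequential:.2f} s")
```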

In particular, I was trying to be clever and tidy:

class Kiki(threading.Thread):
    def __init__(self, time):
        super(Kiki, self).__init__()
        self.time = time
        self.start()
        self.join()

This works! But it runs sequentially. I can put self.start() in __init__, but not self.join(). That has to be done after every thread has been started.

join() is what causes the main thread to wait for your thread to finish. Otherwise, your thread runs all by itself.

So one way to think of join() is as a "hold" on the main thread: the main thread pauses at join() until your thread is complete, so from the main thread's point of view the work happens sequentially, even though your thread still runs in its own thread. It ensures that your thread is complete before the main thread moves forward. Note that this means it's fine if your thread has already finished before you call join(): the main thread is simply released immediately.
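A quick sketch of the point about already-finished threads: the thread below does nothing, so it has ended well before join() is called, and join() returns immediately (joining the same thread a second time is also allowed):

```python
import threading
import time

def quick_task():
    pass  # finishes almost instantly

t = threading.Thread(target=quick_task)
t.start()
time.sleep(0.1)                 # give the thread time to finish first
print("alive before join:", t.is_alive())

start = time.perf_counter()
t.join()                        # thread already finished: returns right away
elapsed = time.perf_counter() - start
print(f"join() took {elapsed:.6f} s")
t.join()                        # joining the same thread again is also fine
```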

In fact, it just now occurs to me that the main thread waits at d.join() until thread d finishes before it moves on to t.join().

In fact, to be very clear, consider this code:

import threading
import time

class Kiki(threading.Thread):
    def __init__(self, time):
        super(Kiki, self).__init__()
        self.time = time
        self.start()

    def run(self):
        print self.time, " seconds start!"
        for i in range(0,self.time):
            time.sleep(1)
            print "1 sec of ", self.time
        print self.time, " seconds finished!"


t1 = Kiki(3)
t2 = Kiki(2)
t3 = Kiki(1)
t1.join()
print "t1.join() finished"
t2.join()
print "t2.join() finished"
t3.join()
print "t3.join() finished"

It produces this output (note how the print statements are threaded into each other.)

$ python test_thread.py
32   seconds start! seconds start!1

 seconds start!
1 sec of  1
 1 sec of 1  seconds finished!
 21 sec of
3
1 sec of  3
1 sec of  2
2  seconds finished!
1 sec of  3
3  seconds finished!
t1.join() finished
t2.join() finished
t3.join() finished
$ 

The t1.join() is holding up the main thread. All three threads complete before the t1.join() finishes and the main thread moves on to execute the print then t2.join() then print then t3.join() then print.

Corrections welcome. I'm also new to threading.

(Note: in case you're interested, I'm writing code for a DrinkBot, and I need threading to run the ingredient pumps concurrently rather than sequentially -- less time to wait for each drink.)

Prescription answered 20/5, 2016 at 4:25 Comment(3)
Hey, I'm also new to Python threading and confused about the main thread. Is the first thread the main thread? If not, please guide me. – Ludewig
The main thread is the program itself. Each of the threads is spawned from there. They are then joined back: at the join() call, the program waits until the thread is finished before it continues to execute. – Prescription
I think the big question is why you would want to hold up the main thread when the whole point of threading is to run in parallel. I think the answer is that you may want to run parts of your program in parallel, but you may reach a part of your main thread that requires the result of your sub-thread before continuing. – Cathern
23

The method join()

blocks the calling thread until the thread whose join() method is called is terminated.

Source : http://docs.python.org/2/library/threading.html

Chain answered 26/2, 2013 at 9:30 Comment(4)
So what is the use of join? See the OP's question; don't just paraphrase the docs. – Puca
@DonQuestion I even tried adding time.sleep(20) in the non-daemon thread without using t.join(), and the program still waits for it before terminating. I don't see any use for t.join() in my code. – Lemaceon
See my answer for further explanation. Regarding your sleep in the non-daemon thread: a daemon thread is decoupled from the lifetime of its parent thread, so the parent/sibling threads won't be affected by the lifetime of the daemonized thread, and vice versa. – Puca
The 'join' and 'block' terminology is puzzling. 'Blocked' suggests the calling process is 'blocked' from doing any number of things it still has to do, while in fact it's just blocked from terminating (returning to the OS), not more. By the same token, it's not so obvious that there's a main thread calling a child thread to 'join' it (i.e. terminate). So, Don Q, thanks for the explanation. – Bilander
13

With join(), the interpreter waits until your thread has completed or terminated:

>>> from threading import Thread
>>> import time
>>> def sam():
...   print 'started'
...   time.sleep(10)
...   print 'waiting for 10sec'
... 
>>> t = Thread(target=sam)
>>> t.start()
started

>>> t.join() # with join(), the interpreter waits until the thread has completed or terminated
done?   # this line printed after thread execution stopped i.e after 10sec
waiting for 10sec
>>> done?

Without join(), the interpreter doesn't wait for the thread to finish:

>>> t = Thread(target=sam)
>>> t.start()
started
>>> print 'yes done' # without join(), the interpreter doesn't wait for the thread to finish
yes done
>>> waiting for 10sec
Elaina answered 6/3, 2019 at 5:4 Comment(0)
6

This example demonstrates what .join() does:

import threading
import time

def threaded_worker():
    for r in range(10):
        print('Other: ', r)
        time.sleep(2)

thread_ = threading.Timer(1, threaded_worker)
thread_.daemon = True  # Daemon thread: it is killed when the main program exits.
thread_.start()

flag = True

for i in range(10):
    print('Main: ', i)
    time.sleep(2)
    if flag and i > 4:
        print(
            '''
            Threaded_worker() joined to the main thread. 
            Now we have a sequential behavior instead of concurrency.
            ''')
        thread_.join()
        flag = False

Out:

Main:  0
Other:  0
Main:  1
Other:  1
Main:  2
Other:  2
Main:  3
Other:  3
Main:  4
Other:  4
Main:  5
Other:  5

            Threaded_worker() joined to the main thread. 
            Now we have a sequential behavior instead of concurrency.
            
Other:  6
Other:  7
Other:  8
Other:  9
Main:  6
Main:  7
Main:  8
Main:  9
Johnnyjohnnycake answered 22/1, 2019 at 9:41 Comment(1)
For me, it is clear enough. – Malachy
5

In Python 3.x, join() is used to join a thread with the main thread, i.e. when the main thread calls join() on a particular thread, it stops executing until that thread has finished.

#1 - Without Join():
import threading
import time
def loiter():
    print('You are loitering!')
    time.sleep(5)
    print('You are not loitering anymore!')

t1 = threading.Thread(target = loiter)
t1.start()
print('Hey, I do not want to loiter!')
'''
Output without join()--> 
You are loitering!
Hey, I do not want to loiter!
You are not loitering anymore! #After 5 seconds --> This statement will be printed

'''
#2 - With Join():
import threading
import time
def loiter():
    print('You are loitering!')
    time.sleep(5)
    print('You are not loitering anymore!')

t1 = threading.Thread(target = loiter)
t1.start()
t1.join()
print('Hey, I do not want to loiter!')

'''
Output with join() -->
You are loitering!
You are not loitering anymore! #After 5 seconds --> This statement will be printed
Hey, I do not want to loiter! 

'''
Mantellone answered 25/2, 2019 at 17:50 Comment(0)
2

When calling join(t) with a timeout of t seconds on both a non-daemon thread and a daemon thread, the main thread (or main process) waits at most t seconds, then goes on with its own work. During those t seconds, both child threads do what they can, such as printing out some text. After the t seconds, if the non-daemon thread still hasn't finished its job, it can still finish it after the main thread has finished its own work; the daemon thread, however, has simply missed its window: it is killed when the Python program exits. Please correct me if there is something wrong.
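A sketch of the timed-join scenario described above, assuming a worker that needs about 0.5 s and a join timeout of 0.1 s (both values chosen only for illustration). The child script runs in a separate interpreter via subprocess so that its shutdown behaviour can be observed:

```python
import subprocess
import sys
import textwrap

# Child program: the worker needs ~0.5 s, but the main thread waits for it
# only 0.1 s with a timed join() and then reaches the end of the script.
CHILD = textwrap.dedent("""
    import sys, threading, time

    def worker():
        time.sleep(0.5)
        print("worker finished", flush=True)

    t = threading.Thread(target=worker, daemon=(sys.argv[1] == "daemon"))
    t.start()
    t.join(0.1)  # timed join: gives up after at most 0.1 s
    print("main finished", flush=True)
""")

def run(mode):
    """Run CHILD in a fresh interpreter and return the lines it printed."""
    proc = subprocess.run([sys.executable, "-c", CHILD, mode],
                          capture_output=True, text=True, timeout=10)
    return proc.stdout.splitlines()

non_daemon_lines = run("non-daemon")
daemon_lines = run("daemon")
print(non_daemon_lines)  # ['main finished', 'worker finished']
print(daemon_lines)      # ['main finished']
```

The non-daemon worker still completes because the interpreter waits for non-daemon threads at shutdown; the daemon worker is killed before it can print.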

Curst answered 9/9, 2013 at 14:39 Comment(0)
1

There are a few reasons for the main thread (or any other thread) to join other threads:

  1. A thread may have created, or be holding (locking), some resources. The join-calling thread may be able to clean up those resources on its behalf.

  2. join() is a natural blocking call that lets the join-calling thread continue only after the called thread has terminated.

If a python program does not join other threads, the python interpreter will still join non-daemon threads on its behalf.
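A sketch of reason 1, assuming a file handle that the main thread opens and hands to a worker (the temporary file and the writer() function are illustrative only). Once join() returns, the worker can no longer use the handle, so the joining thread can safely close it:

```python
import os
import tempfile
import threading

def writer(fh, lines):
    # The worker uses a file handle it does not own.
    for line in lines:
        fh.write(line + "\n")

fd, path = tempfile.mkstemp()
fh = os.fdopen(fd, "w")
t = threading.Thread(target=writer, args=(fh, ["a", "b", "c"]))
t.start()
t.join()    # after this, the worker can no longer touch fh
fh.close()  # so the joining thread can safely release the resource

with open(path) as f:
    content = f.read()
os.remove(path)
print(repr(content))  # 'a\nb\nc\n'
```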

Disillusionize answered 16/1, 2020 at 8:4 Comment(0)
1
  • join() waits for both non-daemon and daemon threads to complete.

  • Without join(), non-daemon threads run concurrently with the main thread, and the program still waits for them to complete before it exits.

  • Without join(), daemon threads run concurrently with the main thread, and when the main thread finishes, any daemon threads that are still running are killed before they complete.

So, with join() and daemon=False (non-daemon threads) below (daemon is False by default):

import time
from threading import Thread

def test1():
    for _ in range(3):
        print("Test1 is running...")
        time.sleep(1)
    print("Test1 is completed")
    
def test2():
    for _ in range(3):
        print("Test2 is running...")
        time.sleep(1)
    print("Test2 is completed")
                               # Here
thread1 = Thread(target=test1, daemon=False)
thread2 = Thread(target=test2, daemon=False)
                               # Here
thread1.start()
thread2.start()
thread1.join() # Here
thread2.join() # Here
print("Main is completed")

Or, with join() and daemon=True (daemon threads) below:

# ...
                               # Here
thread1 = Thread(target=test1, daemon=True)
thread2 = Thread(target=test2, daemon=True)
                               # Here
# ...
thread1.join() # Here
thread2.join() # Here
print("Main is completed")

join() waits for the Test1 and Test2 threads to complete, whether they are daemon threads or not. So, "Main is completed" is printed after both threads have completed, as shown below:

Test1 is running...
Test2 is running...
Test1 is running...
Test2 is running...
Test1 is running...
Test2 is running...
Test1 is completed
Test2 is completed
Main is completed

And, without join() and with daemon=False (non-daemon threads) below:

# ...
                               # Here
thread1 = Thread(target=test1, daemon=False)
thread2 = Thread(target=test2, daemon=False)
                               # Here
# ...
# thread1.join()
# thread2.join()
print("Main is completed")

The non-daemon Test1 and Test2 threads run concurrently with the main thread. So, "Main is completed" is printed before they have completed, and they still run to completion, as shown below:

Test1 is running...
Test2 is running...
Main is completed
Test1 is running...
Test2 is running...
Test1 is running...
Test2 is running...
Test1 is completed
Test2 is completed

And, without join() and with daemon=True (daemon threads) below:

# ...
                               # Here
thread1 = Thread(target=test1, daemon=True)
thread2 = Thread(target=test2, daemon=True)
                               # Here
# ...
# thread1.join()
# thread2.join()
print("Main is completed")

The daemon Test1 and Test2 threads run concurrently with the main thread. So, "Main is completed" is printed before they have completed, and when the main thread finishes, they are killed without completing, as shown below:

Test1 is running...
Test2 is running...
Main is completed
Marotta answered 27/10, 2022 at 7:10 Comment(0)
0

It looks like the difference between synchronous and asynchronous processing is misunderstood here.

A thread is meant to execute a sub-procedure, most of the time in a "parallel" or "concurrent" fashion (depending on whether the device has multiple processors or not). But what's the point of concurrency? For the most part, it's about improving the performance of a process by applying the idea of "divide and conquer": have several threads (sub-processes) execute a "portion" of the whole process simultaneously, and then have a "final" step where all the sub-process results are combined (joined; hence the "join" method).

Of course, to achieve such a gain in efficiency, the portions that are divided into threads must be "mutually exclusive", i.e. they don't share values that get updated (code that updates such shared values is known in parallel computing as a "critical section"). If there is at least one value that is updated by two or more threads, then one has to wait for the other to "finish" its update; otherwise the results are inconsistent. For example, two people who share a bank account try to withdraw money from two ATMs at the same time: if there is no proper mechanism that "locks" or "protects" the variable "balance" on both ATMs, the withdrawals will completely mess up the final balance, causing an obvious and serious financial problem for the account owners.
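The ATM scenario can be sketched with a threading.Lock guarding the balance check and update (the amounts are arbitrary). Without the lock, both threads could pass the balance >= amount check before either one subtracts; with it, only one withdrawal of 80 succeeds:

```python
import threading

balance = 100                    # shared account balance (illustrative)
balance_lock = threading.Lock()  # guards the critical section below
withdrawn = []

def withdraw(amount):
    global balance
    with balance_lock:           # only one "ATM" may check-and-update at a time
        if balance >= amount:
            balance -= amount
            withdrawn.append(amount)

atms = [threading.Thread(target=withdraw, args=(80,)) for _ in range(2)]
for atm in atms:
    atm.start()
for atm in atms:
    atm.join()                   # read the final balance only after both ATMs are done

print(balance, withdrawn)  # 20 [80]
```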

So, coming back to the purpose of a thread in parallel computing: have all threads do their individual part, and use "join" to make them "come back" to the main process so that each individual result is then "consolidated" into a global one.

Examples? There are plenty, but let's just enumerate a few clear ones:

  • Matrix multiplication: have each thread multiply a vector of matrix A by the whole second matrix B to obtain a vector of matrix C. At the end, put all the resulting vectors together to "display" (show) the result: matrix C. In this example, although matrix B is used by all threads, none of its values is ever updated or modified (read-only).

  • Summation or product of a massive array of numbers (an array of thousands of values, whether integer or float). Make threads execute partial sums/products (say, if you have to sum 10K values, create 5 threads, each with 2K values); then use "join" to make them return to the main process and sum the individual results of all 5 threads.

    Theoretically, the process will take 2000 + 5 steps (2000 simultaneously in each of the 5 threads, plus the summation of the final 5 subtotals in the main process). In practice, though, how long the 5 threads take to sum their own 2000 numbers varies a lot, as different factors get involved (processor speed, electrical flow, or, if it is a web service, network latency, and so on). However, the amount of time invested would, in the "worst case", be the time the "slowest" thread takes plus the final step of summing the 5 results. Also, in practice, a thread that is meant to do 20% of the whole job is unlikely to take much longer than a single sequential process that does 100% of the job (of course, it also depends on the size of the sample to be processed: the advantage won't be the same for a summation of 10K values as for a summation of just 10 values with the same 5 threads, which is impractical and not worth it).

  • Quick sort: We all know in general how quick sort works. However, there's a chance to improve it if, say, we execute it in TWO threads: one that handles the odd numbers and one that handles the even ones. Each then executes recursively, and at some point the results of both threads are joined and a final quick sort is done, which will not require so many repetitions, as the numbers will be sufficiently ordered after the two threads did their initial job. That's a serious performance gain with a large, unordered set of items. Three threads could probably be used with some rearrangement of the underlying logic, but the gain is really minimal and not worth programming. However, two threads give a decent performance (time) gain.
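The summation example in the list above can be sketched as follows, assuming 10,000 integers split across 5 threads. Note that in CPython the global interpreter lock (GIL) prevents pure-Python arithmetic in threads from running in parallel, so this illustrates the split-then-join structure rather than a speedup:

```python
import threading

values = list(range(1, 10_001))  # 10K values, as in the example above
n_threads = 5
chunk = len(values) // n_threads  # 2000 values per thread
partials = [0] * n_threads        # each thread writes only its own slot

def partial_sum(index):
    start = index * chunk
    partials[index] = sum(values[start:start + chunk])

threads = [threading.Thread(target=partial_sum, args=(i,)) for i in range(n_threads)]
for t in threads:
    t.start()
for t in threads:
    t.join()                      # every partial result is ready after this loop

total = sum(partials)             # final "consolidation" step in the main thread
print(total)  # 50005000
```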

So, the use of "join" in Python (or its equivalent in other "concurrency" languages) has an important significance, but it depends a lot on the programmer understanding what s/he wants to "parallelize" and how skilled s/he is at splitting the algorithm into the right steps to be parallelized vs. the steps that need to be kept in the main process. It's more a problem of "logical" thinking than a programming "anti-pattern".

Syriac answered 26/9, 2022 at 21:25 Comment(0)
-3

"What's the use of using join()?" you say. Really, it's the same answer as "what's the use of closing files, since python and the OS will close my file for me when my program exits?".

It's simply a matter of good programming. You should join() your threads at the point in the code where the thread should not be running anymore, either because you positively have to ensure the thread is not running and cannot interfere with your own code, or because you want to behave correctly in a larger system.

You might say "I don't want my code to delay giving an answer" just because of the additional time that the join() might require. This may be perfectly valid in some scenarios, but you now need to take into account that your code is "leaving cruft around for python and the OS to clean up". If you do this for performance reasons, I strongly encourage you to document that behavior. This is especially true if you're building a library/package that others are expected to utilize.

There's no reason to not join(), other than performance reasons, and I would argue that your code does not need to perform that well.

Biskra answered 19/12, 2013 at 23:5 Comment(5)
What you say about cleaning up threads is not true. Take a look at the source code of threading.Thread.join(). All that function does is wait on a lock and then return. Nothing is actually cleaned up. – Intemperance
@Collin - The thread itself may be holding resources; in that case the interpreter and OS will indeed need to clean up "cruft". – Mitchum
Again, look at the source code of threading.Thread.join(). There is nothing in there that triggers collection of resources. – Intemperance
It's not necessarily (and, as you say, not at all) the threading module that is holding resources, but the thread itself. Using join() means you're waiting for the thread to finish doing what it wanted to do, which could include allocating and releasing resources. – Biskra
Whether you wait or not doesn't affect when the resources held by the thread are released. I'm not sure why you're tying this in with calling join(). – Intemperance

© 2022 - 2024 — McMap. All rights reserved.