Multiprocessing: use tqdm to display a progress bar

To make my code more "pythonic" and faster, I use multiprocessing and pass its map function a) the function to run and b) the range of iterations.

The straightforward solution (i.e., calling tqdm directly on the range: tqdm.tqdm(range(0, 30))) does not work with multiprocessing (as formulated in the code below).

The progress bar jumps from 0 to 100% immediately (presumably when Python consumes the range?) and does not reflect the actual progress of the map function.

How can one display a progress bar that shows how far along the map function actually is?

from multiprocessing import Pool
import tqdm
import time

def _foo(my_number):
    square = my_number * my_number
    time.sleep(1)
    return square

if __name__ == '__main__':
    p = Pool(2)
    r = p.map(_foo, tqdm.tqdm(range(0, 30)))
    p.close()
    p.join()

Any help or suggestions are welcome...

Islean answered 29/1, 2017 at 10:58 Comment(2)
Can you post the code snippet of the progress bar? – Hegelianism
For people searching for a solution with .starmap(): Here is a patch for Pool adding .istarmap(), which will also work with tqdm. – Kazantzakis

Solution found. Be careful! Due to multiprocessing, the time estimates (iterations per second, total time, etc.) can be unstable, but the progress bar works perfectly.

Note: Context manager for Pool is only available in Python 3.3+.

from multiprocessing import Pool
import time
from tqdm import tqdm

def _foo(my_number):
    square = my_number * my_number
    time.sleep(1)
    return square

if __name__ == '__main__':
    with Pool(processes=2) as p:
        max_ = 30
        with tqdm(total=max_) as pbar:
            for _ in p.imap_unordered(_foo, range(0, max_)):
                pbar.update()
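
To also collect the squared values, a question raised in the comments below, the same loop can simply append each result as it arrives. A minimal variation of the snippet above, reusing the same _foo:

from multiprocessing import Pool
import time
from tqdm import tqdm

def _foo(my_number):
    square = my_number * my_number
    time.sleep(1)
    return square

if __name__ == '__main__':
    max_ = 30
    results = []
    with Pool(processes=2) as p:
        with tqdm(total=max_) as pbar:
            # imap_unordered yields results as workers finish them
            for result in p.imap_unordered(_foo, range(0, max_)):
                results.append(result)
                pbar.update()
    # note: results arrive in completion order, not input order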
Islean answered 29/1, 2017 at 14:26 Comment(8)
pbar.close() not required, it will be closed automatically on termination of with – Tewfik
Is the second/inner tqdm call necessary here? – Citreous
What about the output of _foo(my_number) that is returned as r in the question? – Herophilus
Is there a similar solution for starmap()? – Retinite
@Citreous - it seems to work without ;). Anyway, imap_unordered is key here; it gives the best performance and the best progress bar estimates. – Benton
Is enumerate really necessary? Why not for _ in p.imap_unordered(...):? – Pontifex
How do I retrieve the results with this solution? – Atman
Instead of return square in the function, append the result to a list. – Islean
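
Several comments ask about starmap(); apart from the istarmap() patch linked in the comments on the question, one workaround is a small top-level wrapper that unpacks argument tuples so that imap (or imap_unordered) can be used instead. A minimal sketch, with a made-up two-argument worker _foo2:

from multiprocessing import Pool
from tqdm import tqdm

def _foo2(a, b):
    return a * b

def _foo2_star(args):
    # unpack the tuple so imap can drive a multi-argument function
    return _foo2(*args)

if __name__ == '__main__':
    pairs = [(i, i + 1) for i in range(30)]
    with Pool(2) as p:
        results = list(tqdm(p.imap(_foo2_star, pairs), total=len(pairs)))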

Use imap instead of map, which returns an iterator of the processed values.

from multiprocessing import Pool
import tqdm
import time

def _foo(my_number):
    square = my_number * my_number
    time.sleep(1)
    return square

if __name__ == '__main__':
    with Pool(2) as p:
        r = list(tqdm.tqdm(p.imap(_foo, range(30)), total=30))
Burgundy answered 24/7, 2017 at 9:25 Comment(13)
I tried this solution. It worked, but for some reason, the call to list() is necessary, as is passing the size of the list in the total= argument of tqdm(). Why is that? – Cambrian
An enclosing list() call waits for the iterator to end. total= is also required since tqdm does not know how long the iteration will be. – Burgundy
Is there a similar solution for starmap()? – Retinite
for i in tqdm.tqdm(...): pass may be more straightforward than list(tqdm.tqdm(...)) – Greasewood
This works, but has anyone else had it continuously print the progress bar on a new line for each iteration? – Quimby
If you encounter locking issues while trying this solution, try removing the tqdm.write() statements from your code. – Adel
The behaviour is weird with a specific chunk_size for p.imap. Can tqdm update every iteration instead of every chunk? – Marine
None of the answers worked for multithreading, though. Has anyone found a concise solution? – Teilo
The method works; however, each bar is updated on the same line (overlapping progress bars for different processes). Does anyone know how to solve this? – Hereunder
@Burgundy please can you elaborate more on why we need to call list()? Thanks – Ridicule
The "patch" solution for starmap can be found here: https://mcmap.net/q/76010/-starmap-combined-with-tqdm – Mandler
I don't think this solution works properly. It stays at 0% for almost all the time, then suddenly jumps to 100%. – Atman
@CarlosSouza That's because imap keeps order; the iterator won't skip unfinished items. You can use imap_unordered. – Singspiel
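
As the last comment explains, imap preserves input order, so one slow early task can hold the bar near 0% while later tasks finish. If the ordering of the results does not matter, the same one-liner works with imap_unordered and the bar advances as soon as any worker finishes:

from multiprocessing import Pool
import tqdm
import time

def _foo(my_number):
    square = my_number * my_number
    time.sleep(1)
    return square

if __name__ == '__main__':
    with Pool(2) as p:
        # results come back in completion order here
        r = list(tqdm.tqdm(p.imap_unordered(_foo, range(30)), total=30))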

Sorry for being late, but if all you need is a concurrent map, I added this functionality in tqdm>=4.42.0:

from tqdm.contrib.concurrent import process_map  # or thread_map
import time

def _foo(my_number):
    square = my_number * my_number
    time.sleep(1)
    return square

if __name__ == '__main__':
    r = process_map(_foo, range(0, 30), max_workers=2)

References: https://tqdm.github.io/docs/contrib.concurrent/ and https://github.com/tqdm/tqdm/blob/master/examples/parallel_bars.py

It supports max_workers and chunksize and you can also easily switch from process_map to thread_map.

Marnie answered 25/1, 2020 at 0:29 Comment(18)
Cool (+1), but throws HBox(children=(FloatProgress(value=0.0, max=30.0), HTML(value=''))) in Jupyter – Fustigate
@Ébe-Isaac see github.com/tqdm/tqdm/issues/937 – Marnie
I see an issue with discussion on hacking tqdm_notebook; however, I can't work out a solution for tqdm.contrib.concurrent. – Fustigate
How do I close the process with process_map? Something like p.close() and p.join()? – Rasbora
@Rasbora process_map creates, runs, closes/joins and returns a list. – Marnie
This is great! So glad I found it. One question remains: when I use this in a Jupyter notebook, it doesn't work very well. I know there is a tqdm.notebook; is there some way to merge the two? – Eduardo
I report the same issues when using this in a Jupyter notebook. In particular, it crashes for thread_map – Poinciana
This makes unconditional copies of the iterated arguments, while the others seem to do copy-on-write. – Volvox
Hmm.. finishes while the progress bar is stuck at zero. – Kozhikode
If I pass in some kwargs (for instance initargs and initializer, which are kwargs for multiprocessing.Pool), does the wrapper pass them on to the Pool instance created? I can see it does pass through max_workers and chunksize. – Timothea
@Eduardo @Vladimir Vargas I don't have any issues if I do something like e.g. thread_map(fn, *iterables, tqdm_class=tqdm.notebook.tqdm, max_workers=12) in a Jupyter Notebook today. – Implacable
Using this with requests gives me the wrong number of finished iterations. I prefer @Islean's solution – Saturation
When I try this, my progress bar is stuck at zero and never updates. – Epp
Hi, I'm using thread_map and the bar looks weird, with unrecognized characters ("?" in a box) in the output, and it shows 0%. I'm using it in the default Windows command line. – Landel
Works perfectly here! Thanks – Bobbibobbie
Bizarre to include this functionality in tqdm. Talk about scope creep – Departure
This produces "OSError: handle is closed" every time when using it to read files in parallel on Mac in Python 3.11 run from Jupyter. It seems to work OK on Linux, but not Mac. Seems to be a bug in ProcessPoolExecutor. I haven't found a fix other than to just not use tqdm...process_map anymore. :-( – Lunisolar
For me it works 50% slower than just map with no tqdm. I might have messed up something, but with the same run settings on the same data I was getting 430 vs 250 seconds runtime. – Cedar
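
For the Jupyter issues reported above, one comment suggests passing the notebook bar explicitly via tqdm_class; a sketch of that approach (assuming ipywidgets is installed, which tqdm.notebook requires), run as a notebook cell:

import time

from tqdm.contrib.concurrent import thread_map
import tqdm.notebook

def _foo(my_number):
    square = my_number * my_number
    time.sleep(1)
    return square

# tqdm_class swaps in the widget-based notebook bar
r = thread_map(_foo, range(30), max_workers=2, tqdm_class=tqdm.notebook.tqdm)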

You can use p_tqdm instead.

https://github.com/swansonk14/p_tqdm

from p_tqdm import p_map
import time

def _foo(my_number):
    square = my_number * my_number
    time.sleep(1)
    return square

if __name__ == '__main__':
    r = p_map(_foo, list(range(0, 30)))
Spinifex answered 26/3, 2019 at 22:8 Comment(7)
This works extremely well, and it was very easy to pip install. This is replacing tqdm for most of my needs – Juxon
Merci Victor ;) – Unlovely
p_tqdm is limited to multiprocessing.Pool, not available for threads – Cresida
Can I specify the number of workers for p_map? – Gunman
@VictorWang Yes, use num_cpus like this => p_map(_foo, list(range(0, 30)), num_cpus=5) – Bowles
How do I set a description for the progress bar? In tqdm there is an argument called desc which takes the description; I could not find something like that for p_tqdm. – Tessy
@VandanRevanur You can pass kwargs to p_tqdm and they will be forwarded to tqdm, like so: github.com/swansonk14/p_tqdm/issues/5 – Spinifex
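
Putting the last two comments together: num_cpus caps the worker count, and extra keyword arguments such as desc are forwarded on to tqdm (per the issue linked above). A sketch combining both:

from p_tqdm import p_map
import time

def _foo(my_number):
    square = my_number * my_number
    time.sleep(1)
    return square

if __name__ == '__main__':
    # num_cpus limits the pool; desc is passed through to tqdm
    r = p_map(_foo, list(range(0, 30)), num_cpus=5, desc='squares')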

Based on the answer of Xavi Martínez, I wrote the function imap_unordered_bar. It can be used in the same way as imap_unordered, with the only difference that a progress bar is shown.

from multiprocessing import Pool
import time
from tqdm import tqdm

def imap_unordered_bar(func, args, n_processes=2):
    p = Pool(n_processes)
    res_list = []
    with tqdm(total=len(args)) as pbar:
        # update the bar once per completed task, in completion order
        for res in p.imap_unordered(func, args):
            pbar.update()
            res_list.append(res)
    p.close()
    p.join()
    return res_list

def _foo(my_number):
    square = my_number * my_number
    time.sleep(1)
    return square 

if __name__ == '__main__':
    result = imap_unordered_bar(_foo, range(5))
Wheel answered 12/8, 2017 at 16:56 Comment(3)
This will redraw the bar at each step on a new line. How can I update the same line? – Goldengoldenberg
Solution in my case (Windows/PowerShell): Colorama. – Goldengoldenberg
pbar.close() is not required; it will be closed automatically on termination of the with block, as Sagar commented on @scipy's answer. – Camisole
import multiprocessing as mp
import tqdm


iterable = ...
num_cpu = max(1, mp.cpu_count() - 2)  # don't use all CPUs, but keep at least one worker


def func(x):
    # your logic
    ...


if __name__ == '__main__':
    with mp.Pool(num_cpu) as p:
        list(tqdm.tqdm(p.imap(func, iterable), total=len(iterable)))
Psittacine answered 22/11, 2019 at 15:18 Comment(0)

For a progress bar with apply_async, we can use the following code, as suggested in:

https://github.com/tqdm/tqdm/issues/484

import time
import random
from multiprocessing import Pool
from tqdm import tqdm

def myfunc(a):
    time.sleep(random.random())
    return a ** 2

if __name__ == '__main__':
    pool = Pool(2)
    pbar = tqdm(total=100)

    def update(*a):
        # called in the main process once per completed task
        pbar.update()

    for i in range(pbar.total):
        pool.apply_async(myfunc, args=(i,), callback=update)
    pool.close()
    pool.join()
    pbar.close()
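
If the results themselves are needed, the AsyncResult handles returned by apply_async can be kept and read back after join(); a minimal sketch of that variation:

import time
import random
from multiprocessing import Pool
from tqdm import tqdm

def myfunc(a):
    time.sleep(random.random())
    return a ** 2

if __name__ == '__main__':
    with Pool(2) as pool, tqdm(total=100) as pbar:
        # keep the AsyncResult handles so the values can be read back
        jobs = [pool.apply_async(myfunc, args=(i,),
                                 callback=lambda _: pbar.update())
                for i in range(100)]
        pool.close()
        pool.join()
        results = [job.get() for job in jobs]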
Gassy answered 15/4, 2021 at 20:7 Comment(0)

Here is my take for when you need to get results back from your parallel functions. This function does a few things (there is another post of mine that explains it further), but the key point is that there is a queue of pending tasks and a queue of completed tasks. As workers finish each task in the pending queue, they add the result to the completed queue. You can wrap the check on the completed-tasks queue with the tqdm progress bar. I am not putting the implementation of the do_work() function here; it is not relevant, since the point is to monitor the completed-tasks queue and update the progress bar every time a result comes in.

import pickle

import multiprocessing as mp
import psutil
from tqdm import tqdm

# SENTINEL and do_work() are defined elsewhere (see the note above)

def par_proc(job_list, num_cpus=None, verbose=False):

    # Get the number of cores
    if not num_cpus:
        num_cpus = psutil.cpu_count(logical=False)

    print('* Parallel processing')
    print('* Running on {} cores'.format(num_cpus))

    # Set up the queues for sending and receiving data to/from the workers
    tasks_pending = mp.Queue()
    tasks_completed = mp.Queue()

    # Gather processes and results here
    processes = []
    results = []

    # Count tasks
    num_tasks = 0

    # Add the tasks to the queue
    for job in job_list:
        for task in job['tasks']:
            expanded_job = {}
            num_tasks = num_tasks + 1
            expanded_job.update({'func': pickle.dumps(job['func'])})
            expanded_job.update({'task': task})
            tasks_pending.put(expanded_job)

    # Set the number of workers here
    num_workers = min(num_cpus, num_tasks)

    # We need as many sentinels as there are worker processes so that ALL
    # processes exit when there is no more work left to be done.
    for c in range(num_workers):
        tasks_pending.put(SENTINEL)

    print('* Number of tasks: {}'.format(num_tasks))

    # Set up and start the workers
    for c in range(num_workers):
        p = mp.Process(target=do_work, args=(tasks_pending, tasks_completed, verbose))
        p.name = 'worker' + str(c)
        processes.append(p)
        p.start()

    # Gather the results
    completed_tasks_counter = 0

    with tqdm(total=num_tasks) as bar:
        while completed_tasks_counter < num_tasks:
            results.append(tasks_completed.get())
            completed_tasks_counter = completed_tasks_counter + 1
            bar.update()  # advance by one per completed task

    for p in processes:
        p.join()

    return results
Rumilly answered 20/8, 2020 at 20:33 Comment(0)

Based on "user17242583" answer, I created the following function. It should be as fast as Pool.map and the results are always ordered. Plus, you can pass as many parameters to your function as you want and not just a single iterable.

from multiprocessing import Pool
from functools import partial
from tqdm import tqdm


def imap_tqdm(function, iterable, processes, chunksize=1, desc=None, disable=False, **kwargs):
    """
    Run a function in parallel with a tqdm progress bar and an arbitrary number of arguments.
    Results are always ordered and the performance should be the same as of Pool.map.
    :param function: The function that should be parallelized.
    :param iterable: The iterable passed to the function.
    :param processes: The number of processes used for the parallelization.
    :param chunksize: The iterable is chopped into chunks of this size, which are submitted to the process pool as separate tasks.
    :param desc: The description displayed by tqdm in the progress bar.
    :param disable: Disables the tqdm progress bar.
    :param kwargs: Any additional arguments that should be passed to the function.
    """
    if kwargs:
        function_wrapper = partial(_wrapper, function=function, **kwargs)
    else:
        function_wrapper = partial(_wrapper, function=function)

    results = [None] * len(iterable)
    with Pool(processes=processes) as p:
        with tqdm(desc=desc, total=len(iterable), disable=disable) as pbar:
            for i, result in p.imap_unordered(function_wrapper, enumerate(iterable), chunksize=chunksize):
                results[i] = result
                pbar.update()
    return results


def _wrapper(enum_iterable, function, **kwargs):
    i = enum_iterable[0]
    result = function(enum_iterable[1], **kwargs)
    return i, result
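
A hypothetical usage example; the offset keyword argument here is made up for illustration and is forwarded to the worker through **kwargs:

import time

def _foo(my_number, offset=0):
    time.sleep(1)
    return my_number * my_number + offset

if __name__ == '__main__':
    # offset reaches _foo via the **kwargs of imap_tqdm
    results = imap_tqdm(_foo, list(range(30)), processes=2, desc='squares', offset=1)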
Seism answered 7/9, 2022 at 12:12 Comment(0)

tqdm has released its own simple, elegant APIs for concurrency.

The following snippet is a straightforward example illustrating multithreading.

import time

from tqdm.contrib.concurrent import thread_map

def f(row):
    x, y = row
    time.sleep(1)  # to visualize the progress

thread_map(f, [(x, y) for x, y in zip(range(1000), range(1000))])
Brimstone answered 18/11, 2023 at 1:37 Comment(0)

This approach is simple and it works.

from multiprocessing.pool import ThreadPool
import time
from tqdm import tqdm

def job():
    time.sleep(1)
    pbar.update()  # pbar is shared as a global across the worker threads

pool = ThreadPool(5)
with tqdm(total=100) as pbar:
    for i in range(100):
        pool.apply_async(job)
    pool.close()
    pool.join()
Landa answered 10/6, 2019 at 14:17 Comment(0)
