How to see progress of Dask compute task?

I would like to see a progress bar in a Jupyter notebook while running a compute task with Dask. I'm counting all values of the id column in a large CSV file (over 4 GB). Any ideas?

import dask.dataframe as dd

df = dd.read_csv('data/train.csv')
df.id.count().compute()
Rhino answered 28/2, 2018 at 22:33 Comment(2)
Have you checked: github.com/tqdm/tqdm?Silkaline
Also #44484450Mercantile
G
48

If you're using the single machine scheduler then do this:

from dask.diagnostics import ProgressBar
ProgressBar().register()

http://dask.pydata.org/en/latest/diagnostics-local.html

If you're using the distributed scheduler then do this:

from dask.distributed import progress

result = df.id.count().persist()
progress(result)

Or just use the dashboard

http://dask.pydata.org/en/latest/diagnostics-distributed.html

Gerontocracy answered 28/2, 2018 at 22:38 Comment(7)
Is there any chance to see total time to complete a task on the Dashboard?Rhino
An individual function/task? No. Tasks contain arbitrary Python code and so behave in unpredictable ways.Gerontocracy
when running the .register where would one see the progress bar?After
When I use dask with the progress bar, it just freezes on zero while generating enough heat and CPU usage that I presume it's doing something. How does the progress bar get updated?Flogging
This is great for running Dask on Kaggle, which does not seem to support the dashboard (see kaggle.com/questions-and-answers/54405)Leporine
Thank you, this is so very helpful!Foothold
If tqdm is preferred, in the distributed case, you can also import tqdm and from dask.distributed import as_completed, then do futures = client.map(func, iter) and for _ in tqdm(as_completed(futures), total=len(iter)): passPinnatifid
G
1

The following shows the remaining time and the item count:

from tqdm.dask import TqdmCallback

with TqdmCallback(desc="compute"):
    ...
    arr.compute()

# or use callback globally
cb = TqdmCallback(desc="global")
cb.register()
arr.compute()

https://github.com/tqdm/tqdm#dask-integration
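For a version that runs as-is, here is a small sketch where arr is a random Dask array. The array shape, chunk size, and the mean reduction are assumptions chosen only for illustration:

```python
import dask.array as da
from tqdm.dask import TqdmCallback

# Hypothetical workload: a chunked random array reduced to its mean
arr = da.random.random((2000, 2000), chunks=(500, 500))

# tqdm draws one bar covering the tasks of this computation
with TqdmCallback(desc="compute"):
    mean = arr.mean().compute()

print(mean)
```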

Goddard answered 3/6 at 17:8 Comment(1)
Excellent work! Thanks!Rhino
B
0

This resource provides full-code examples for both cases (local and distributed) and more detailed information about using the Dask Dashboard.

Note that when working in Jupyter notebooks you may have to separate the ProgressBar().register() call and the computation call you want to track (e.g. df.set_index('id').persist()) into two separate cells for the progress bar to actually appear.

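DO:

[image unavailable: ProgressBar().register() and the computation in two separate cells]

DON'T DO:

[image unavailable: ProgressBar().register() and the computation in the same cell]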
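Without the screenshots, the two-cell layout described above might look roughly like this sketch. The cell boundaries are marked with comments, and the array computation is a stand-in (an assumption) for something like df.set_index('id').persist():

```python
# --- Cell 1: register the progress bar globally ---
from dask.diagnostics import ProgressBar

pbar = ProgressBar()
pbar.register()

# --- Cell 2: run the computation you want to track ---
import dask.array as da

total = da.ones((1000, 1000), chunks=(250, 250)).sum().compute()
print(total)

# --- Optional later cell: stop showing the bar ---
pbar.unregister()
```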

Braca answered 5/10, 2021 at 8:55 Comment(1)
Link is dead....Mercantile
