Execute multiple notebooks in parallel in pyspark databricks

Question is simple:

master_dim.py calls dim_1.py and dim_2.py to execute in parallel. Is this possible in Databricks PySpark?

The image below shows what I am trying to do; it errors for some reason. Am I missing something here?

[screenshot of the attempted ThreadPool code, not reproduced here]

Heather answered 26/8, 2021 at 11:31 Comment(0)

Just for others, in case they want to know how it worked:

from multiprocessing.pool import ThreadPool

# Run each notebook on its own thread; dbutils.notebook.run blocks until the child notebook finishes
pool = ThreadPool(5)
notebooks = ['dim_1', 'dim_2']
pool.map(lambda path: dbutils.notebook.run("/Test/Threading/" + path, timeout_seconds=60, arguments={"input-data": path}), notebooks)
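
If per-notebook error handling is wanted, a similar sketch with concurrent.futures could look like the following. This is only an illustration under the same assumptions as above (notebooks living under /Test/Threading/ and taking an "input-data" argument); it is not from the original answer.

from concurrent.futures import ThreadPoolExecutor, as_completed

notebooks = ['dim_1', 'dim_2']

def run_notebook(path):
    # dbutils.notebook.run returns whatever string the child notebook passes to dbutils.notebook.exit
    return dbutils.notebook.run("/Test/Threading/" + path,
                                timeout_seconds=60,
                                arguments={"input-data": path})

with ThreadPoolExecutor(max_workers=5) as executor:
    futures = {executor.submit(run_notebook, nb): nb for nb in notebooks}
    for future in as_completed(futures):
        nb = futures[future]
        try:
            print(nb, "returned:", future.result())
        except Exception as exc:
            # A failed or timed-out child notebook surfaces here instead of killing the whole map
            print(nb, "failed:", exc)
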
Heather answered 26/8, 2021 at 23:44 Comment(3)
You can just use path – in this case it's easier to move the project into a new folder, etc. If the path isn't absolute, it's treated as relative to the current notebook. – Gallo
The limitation with this approach is that you can't share dependencies with the parallel jobs. I hope Databricks improves this so we can pass more than just strings to the called notebook. – Papal
I will create a level-2 list and run it after the level-1 list has completed. That gives control. – Heather
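
A rough sketch of that level-by-level idea, under the same assumptions as the answer above (the level-2 notebook name fact_1 here is hypothetical): the second pool.map only starts once every level-1 notebook has returned.

from multiprocessing.pool import ThreadPool

pool = ThreadPool(5)
run = lambda path: dbutils.notebook.run("/Test/Threading/" + path,
                                        timeout_seconds=60,
                                        arguments={"input-data": path})

level_1 = ['dim_1', 'dim_2']   # first wave, run in parallel
level_2 = ['fact_1']           # hypothetical second wave that depends on level 1

pool.map(run, level_1)         # blocks until all level-1 notebooks finish
pool.map(run, level_2)         # then run level 2
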

Your problem is that you're passing only Test/ as the first argument to dbutils.notebook.run (the name of the notebook to execute), but you don't have a notebook with that name.

You either need to modify the list of paths from ['Threading/dim_1', 'Threading/dim_2'] to ['dim_1', 'dim_2'] and replace dbutils.notebook.run('Test/', ...) with dbutils.notebook.run(path, ...),

Or keep the original list and change dbutils.notebook.run('Test/', ...) to dbutils.notebook.run('/Test/' + path, ...)
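
For example, the second option might look like this (a sketch, assuming the same ThreadPool setup and "input-data" argument as in the question):

from multiprocessing.pool import ThreadPool

pool = ThreadPool(5)
notebooks = ['Threading/dim_1', 'Threading/dim_2']
# Prepend '/Test/' so each entry becomes an absolute notebook path
pool.map(lambda path: dbutils.notebook.run('/Test/' + path,
                                           timeout_seconds=60,
                                           arguments={"input-data": path}),
         notebooks)
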

Gallo answered 26/8, 2021 at 12:12 Comment(0)

Databricks now has Workflows (multi-task jobs). Your master_dim task can trigger other tasks to execute in parallel after it finishes, passing task values as parameters to dim_1, dim_2, etc.
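
As a hedged illustration of the task-values part (the task name master_dim comes from the question; the key run_date is an assumption), the dbutils.jobs.taskValues utilities can pass small values between tasks of the same job run:

# In the master_dim task of a multi-task job:
dbutils.jobs.taskValues.set(key="run_date", value="2022-10-02")   # hypothetical key/value

# In a downstream task such as dim_1, within the same job run:
run_date = dbutils.jobs.taskValues.get(taskKey="master_dim",
                                       key="run_date",
                                       default="",
                                       debugValue="2022-10-02")   # debugValue is used when run interactively
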

Sabotage answered 2/10, 2022 at 3:57 Comment(0)
