I am using Airflow to trigger jobs on Databricks. I have many DAGs running Databricks jobs, and I wish to use only one cluster instead of many, since to my understanding this will reduce the costs these tasks generate.
Using DatabricksSubmitRunOperator, there are two ways to run a job on Databricks: either on an already-running cluster, referenced by its id,
'existing_cluster_id' : '1234-567890-word123',
or by starting a new cluster
'new_cluster': {
'spark_version': '2.1.0-db3-scala2.11',
'num_workers': 2
},
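For context, here is a minimal sketch of both variants side by side. The notebook path and connection id are placeholders I made up for illustration; on older Airflow versions the operator is imported from airflow.contrib.operators.databricks_operator instead of the provider package.

from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

with DAG(
    dag_id="databricks_cluster_options",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # Option 1: reuse an existing all-purpose cluster by id.
    # The cluster has to be up when the task actually runs.
    run_on_existing_cluster = DatabricksSubmitRunOperator(
        task_id="run_on_existing_cluster",
        databricks_conn_id="databricks_default",
        existing_cluster_id="1234-567890-word123",
        notebook_task={"notebook_path": "/Shared/example_notebook"},
    )

    # Option 2: let the run create its own job cluster,
    # which is terminated automatically when the run finishes.
    run_on_new_cluster = DatabricksSubmitRunOperator(
        task_id="run_on_new_cluster",
        databricks_conn_id="databricks_default",
        new_cluster={
            "spark_version": "2.1.0-db3-scala2.11",
            "num_workers": 2,
        },
        notebook_task={"notebook_path": "/Shared/example_notebook"},
    )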
Now I would like to avoid starting a new cluster for each task; however, the existing cluster shuts down during downtime, so it is no longer reachable through its id and I get an error. As I see it, that leaves starting a new cluster as the only option.
1) Is there a way to make a cluster callable by its id even when it is down?
2) Do people simply keep their clusters alive?
3) Or am I completely wrong, and starting a new cluster for each task won't generate more costs?
4) Is there something I missed completely?