Spark pool taking time to start in Azure Synapse Analytics

I have created 3 different notebooks using PySpark code in Azure Synapse Analytics. The notebooks run on a Spark pool, and there is only one Spark pool for all 3 notebooks. When these 3 notebooks run individually, the Spark pool starts for each notebook by default.

The issue I am facing is related to the Spark pool: it takes 10 minutes to start for each notebook. The pool is assigned 4 vCores and 1 executor. Can somebody please help me understand how to speed up the start of the Spark pool in Azure Synapse Analytics?

Abhor asked 25/11, 2020 at 3:27 Comment(6)
If my answer is useful for you, could you please accept it as the answer? It may help more people who have a similar issue. (Godfree)
Did you visit the Spark pausing settings and set the number of idle minutes to whatever time you want? It is not clear why the Spark pool starts every time for each notebook. (Separable)
Have you found a fix for this? I'm also having the same issue. (Mccrary)
Yes, you do not have to split the cells unless you need to change the language for coding. (Abhor)
@kshitizsinha So in your notebooks, you only have one cell? How much time was saved after you did that? (Mccrary)
Before merging the code into a single cell it was taking 10-12 minutes. After merging, Spark starts in approximately 2-3 minutes. (Abhor)
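
One way to avoid paying the start-up cost once per notebook, sketched below with Synapse's standard mssparkutils notebook API, is to run all three notebooks from a single driver notebook so they share one Spark session. The notebook names and the 600-second timeout are placeholder assumptions:

    # Driver notebook: runs the other notebooks inside the current Spark
    # session, so the pool start-up cost is paid only once.
    # Notebook names and the 600-second timeout are placeholders.
    from notebookutils import mssparkutils

    mssparkutils.notebook.run("Notebook1", 600)
    mssparkutils.notebook.run("Notebook2", 600)
    mssparkutils.notebook.run("Notebook3", 600)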

I have this problem a lot too. It takes 4-5 minutes in my experience as well.

If it takes longer, make sure you publish (save) your notebook first, then reload the page. Sometimes that refreshes the underlying Livy session.

Tournament answered 16/10, 2022 at 0:16 Comment(0)

If you turn on "dynamically allocate executors" for the Spark pool, the startup time seems to drop to around 90 seconds (from 3-5 minutes). However, I haven't found a way to shrink it further or to keep the Spark pool alive.
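
Session-level Spark settings can also be requested from a notebook with the %%configure magic, run before the session starts. A minimal sketch; the executor bounds are illustrative, and whether they take effect can depend on the pool's own settings:

    %%configure -f
    {
        "conf": {
            "spark.dynamicAllocation.enabled": "true",
            "spark.dynamicAllocation.minExecutors": "1",
            "spark.dynamicAllocation.maxExecutors": "4"
        }
    }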

Voltmeter answered 13/10, 2023 at 17:09 Comment(0)

The performance of your Apache Spark pool jobs depends on multiple factors, including:

  • How your data is stored
  • How the cluster is configured (Small, Medium, Large)
  • The operations that are used when processing the data

Common challenges you might face include:

  • Memory constraints due to improperly sized executors
  • Long-running operations
  • Tasks that result in Cartesian operations

There are also many optimizations that can help you overcome these challenges, such as caching and allowing for data skew.
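
As a minimal illustration of the caching point (the storage path and column name are placeholders), in a Synapse notebook where the spark session is pre-created:

    # Cache a DataFrame that several actions reuse, so the source files are
    # read and parsed only once. Path and column name are placeholders.
    df = spark.read.parquet("abfss://container@account.dfs.core.windows.net/data/")
    df.cache()

    total = df.count()                              # first action fills the cache
    positive = df.filter(df["amount"] > 0).count()  # served from the cache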

The article Optimize Apache Spark jobs (preview) in Azure Synapse Analytics describes common Spark job optimizations and recommendations.

Godfree answered 27/11, 2020 at 6:47 Comment(1)
Unfortunately this answer does not even take the question into account. You are describing the performance of the cluster, not its initialisation time, which I have personally found abysmally slow... (Tasks that take 5 s to perform have to wait over 3 minutes for Spark itself to spin up?) (Stanch)
