I have a presto cluster configured with 12 workers that is being queried by Java applications. The cluster is capable of performing 30 concurrent requests (if there are more, they are queued).
The applications might send around 80-100 distinct queries, which I expect to be handled by cluster.
Problem: When queries are performed sequentially they complete significantly faster than when they are performed in parallel.
For instance, if I run 100 queries sequentially each of them takes 1-12 seconds to complete and they all are completed in around 2 minutes. But if I start all of them in parallel it takes around 8-12 minutes to complete them all. At corner cases it takes up to 30 minutes.
If I look on the presto console I see that most of the queries are blocked and only 1-3 are in fact in Running state.
Unfortunately I can't post any of the queries. They usually access different schemas (up to 6 in one query), they are full of joins and nested queries. At the same time most of them are written following presto best practices.
Question: How can I improve performance? At least what areas should I investigate to find out the root cause?
Here are some metrics for one of the slowest queries (may be the numbers will say something to you).
Resource Utilization Summary
CPU Time 8.42m
Scheduled Time 26.04m
Blocked Time 4.77d
Input Rows 298M
Input Data 9.94GB
Raw Input Rows 323M
Raw Input Data 4.34GB
Peak Memory 10.18GB
Memory Pool reserved
Cumulative Memory 181G seconds
Timeline
Parallelism 477
Scheduled Time/s 1.47K
Input Rows/s 281K
Input Bytes/s 9.60MB
Memory Utilization 0B