Executing SQL Statements in Parallel inside a Stored Procedure in Snowflake
In Snowflake, is there any option to execute SQL statements in parallel inside a stored procedure? I have a stored procedure (shown below) with 35 SQL statements that run sequentially. We are planning to reduce the runtime by executing all of them in parallel.

What would be the best approach to achieve this? (All I could think of is to create 35 stored procedures and call all of them from a scheduler at the same time.) I wanted to check whether there is a better approach or a Snowflake feature to achieve this.

create or replace procedure SP_TEST()
  returns string
  language javascript
  execute as caller
  as
  $$
    try {
      var step_num = 0;

      step_num++; // step #1
      var sql_statement1 = snowflake.createStatement({sqlText: `INSERT INTO TGT_TBL SELECT * FROM TBL_A`});
      sql_statement1.execute();

      step_num++; // step #2
      var sql_statement2 = snowflake.createStatement({sqlText: `INSERT INTO TGT_TBL SELECT * FROM TBL_B`});
      sql_statement2.execute();

      return "Completed " + step_num + " steps to load into TGT_TBL";
    }
    catch (err) {
      throw(err);
    }
  $$
;
Paprika answered 7/10, 2020 at 6:28
Actually, it's sequential because you call the execute() method synchronously for every statement.

To answer your question directly: achieving parallelization within a single procedure is not possible, and there is no feature or specific method for it as far as I know.

But in general: increasing your warehouse size is a good way to save time, as the queries may run faster. It's not parallel, but it is faster.
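Resizing can be done with a plain ALTER statement before the load and reverted afterwards. A minimal sketch, assuming a warehouse named MY_WH (a placeholder; substitute your own warehouse):

```sql
-- Scale up before the heavy load (MY_WH is a hypothetical warehouse name)
ALTER WAREHOUSE MY_WH SET WAREHOUSE_SIZE = 'LARGE';

CALL SP_TEST();

-- Scale back down afterwards to avoid burning credits
ALTER WAREHOUSE MY_WH SET WAREHOUSE_SIZE = 'XSMALL';
```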

Scaling out (i.e. using a multi-cluster warehouse) doesn't provide benefits, in my opinion, when you use one single procedure.

Greggrega answered 7/10, 2020 at 7:11
As an alternative to your idea of creating 35 tasks, you can consider creating one task that calls a stored procedure, which in turn creates 35 tasks to run these individual queries. If the query SQL can be read from a table, or parts of it can be generated by JavaScript (such as table names with dates, e.g. CONNECT20200120), this could be a good automation.

Cosmos answered 7/10, 2020 at 10:12
Could you please explain in detail? I understand I can keep the queries in a table and pick them up from there. But how can I make those 35 statements run non-sequentially? – Paprika
You need to create a new task for each query for parallel runs. I see only one problem here: Snowflake does not support one-time tasks. So there can be a main task, child tasks defined using the AFTER clause, and a cleaner task to drop all tasks after the first run. – Cosmos
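The task tree described in the comments can be sketched as below. Child tasks that share the same AFTER predecessor become eligible to run concurrently once the root finishes. All names (LOAD_ROOT, LOAD_TBL_A, LOAD_TBL_B, MY_WH) are illustrative placeholders, and only two of the 35 loads are shown:

```sql
-- Root task: a scheduled no-op anchor for the tree
CREATE OR REPLACE TASK LOAD_ROOT
  WAREHOUSE = MY_WH
  SCHEDULE = '60 MINUTE'
AS
  SELECT 1;

-- Child tasks: same predecessor, so they can start together
CREATE OR REPLACE TASK LOAD_TBL_A
  WAREHOUSE = MY_WH
  AFTER LOAD_ROOT
AS
  INSERT INTO TGT_TBL SELECT * FROM TBL_A;

CREATE OR REPLACE TASK LOAD_TBL_B
  WAREHOUSE = MY_WH
  AFTER LOAD_ROOT
AS
  INSERT INTO TGT_TBL SELECT * FROM TBL_B;

-- Tasks are created suspended; resume children before the root
ALTER TASK LOAD_TBL_A RESUME;
ALTER TASK LOAD_TBL_B RESUME;
ALTER TASK LOAD_ROOT RESUME;
```

Since Snowflake has no one-time tasks, a final cleanup step (or a manual DROP TASK) would be needed after the first run, as the comment notes.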
Split your SP into several ones (which should run in parallel) and use Snowflake dependent tasks so the different parts run after the main part is done.

Burgoo answered 25/6, 2021 at 12:57
If your goal is to populate table TGT_TBL with values from multiple tables [TBL_A, TBL_B, ..., TBL_35], I would recommend using a third-party tool like dbt, Informatica Cloud (IICS), or any tool you prefer from this list: https://docs.snowflake.com/en/user-guide/ecosystem-etl.html.

In fact, Snowflake stored procedures and even tasks are limited, and also difficult to monitor, for full orchestration/integration pipelines.

Using IICS with Pushdown Optimization or dbt, you can use an incremental-load approach and execute all 35 SQL statements simultaneously on Snowflake, while still leveraging the power of Snowflake virtual warehouses.

Ruskin answered 20/2, 2022 at 10:55
An even simpler way would be to union the various SELECT queries into one single INSERT, like: INSERT INTO table SELECT ... UNION ALL SELECT ... UNION ALL SELECT ...;. But that only works if all the inserts go into the same table! – Periapt
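Applied to the procedure in the question, where every statement targets TGT_TBL, the comment's suggestion collapses the loads into one statement (sketch showing two of the 35 sources; this assumes all source tables share a compatible column layout):

```sql
-- One INSERT scanning all sources; Snowflake can parallelize the
-- combined query internally across the warehouse
INSERT INTO TGT_TBL
SELECT * FROM TBL_A
UNION ALL
SELECT * FROM TBL_B;
```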

© 2022 - 2024 — McMap. All rights reserved.