Using dbt and Meltano, how can I prevent multiple dbt job runs from conflicting with each other?
Asked Answered
I

1

7

When running dbt jobs in Meltano, dbt run jobs may collide with each other if run out of a triggered context - for instance, when an on-demand job collides with a scheduled job or a CI-based job.

If dbt run operates on the same tables at the same time, this generally causes a crash and sometimes a data quality issue if the same insert is performed twice on a single target table.

Any way to prevent run collisions, using either Meltano functionality or native dbt functionality?

Illuminism answered 16/12, 2021 at 17:20 Comment(0)
D
8

One way is to generate a lock in your target database. Here's an example for MSSQL.

I chose a on-run-start: hook. This hook attempts to grab a lock for dbt that lasts for the duration of the DB session.

dbt_project.yml

on-run-start: 
  - "{{ application_lock() }}" #Be sure only one dbt project runs at a time

Macro: application_lock.sql

{% macro application_lock() %}
    
DECLARE 
    @result int,
    @msg varchar(max);

EXEC @result = sp_getapplock @resource = 'dbt', @lockMode = 'Exclusive', @locktimeout = '1200000', @lockowner = 'Session';
IF @result not in (0, 1)
    BEGIN
    SET @msg = 'Result from sp_getapplock:'+ CAST(@result as varchar(max))+'. Lock could not be taken after waiting 20 minutes. Another DBT process probably has the applock, try again later.';
    THROW 51000, @msg, 1
    END

{% endmacro %}
Damarisdamarra answered 16/12, 2021 at 18:17 Comment(2)
Nice solution! Thank you!Illuminism
Note that if you're running with multiple threads with MSSQL, and have multiple models running the connection gets dropped and recreated multiple times. (It's possible the session ends in other areas too outside of threading). For my use case this application lock solves the vast majority of my issues but I wanted to call this out.Damarisdamarra

© 2022 - 2024 — McMap. All rights reserved.