Using multiple columns in a Unique_Key for incremental loading in DBT
Asked Answered
T

2

11

For incremental models, the DBT documentation here says:

The unique_key should be supplied in your model definition as a string representing a simple column or a list of single quoted column names that can be used together, for example, ['col1', 'col2', …])

I've built an incremental model in DBT with this incremental definition

{{
  config(
    materialized='incremental',
    unique_key = ['Col1', 'Col2', 'Col3']
  )
}}

Which compiles into this merge statement in in Snowflake:

using DW_DEV.dbt_dgarrison_DATA_STAGING.MY_TABLE__dbt_tmp as DBT_INTERNAL_SOURCE
    on 
        DBT_INTERNAL_SOURCE.['Col1', 'Col2', 'Col3'] = DBT_INTERNAL_DEST.['Col1', 'Col2', 'Col3']
...

And this reasonably throws a SQL ERROR complaining about the brackets:

SQL compilation error: syntax error line 4 at position 32 unexpected '['. syntax error line 4 at position 45 unexpected ','. syntax error line 4 at position 98 unexpected '['. syntax error line 4 at position 111 unexpected ','.

I can't find any other good examples using multiple columns this way. (there are options involving concatenating columns, and I'm open to recommendations on the best approach to that, but I'm trying to figure out how to use the DBT recommended syntax)

Transmutation answered 20/5, 2022 at 20:17 Comment(2)
What version of dbt are you running?Zipnick
1.0 - I had missed the memo that 1.1 is outTransmutation
M
11

As part of dbt-core 1.1.0, we can now pass a list to the unique_key statement in incremental models. See the original issue here.

This means that you should be able to achieve your goal by updating dbt-core and your dbt-<adapter> version locally; or updating your dbt Cloud version accordingly, to 1.1.0, since given the error you get, it looks like unique_key is still looking for a single string instead of an array.

Marcheshvan answered 23/5, 2022 at 6:41 Comment(2)
updating dbt was enough to get the syntax to work. It still didn't quite work for my case since I need it to handle nulls, and the dbt generated code doesn't do that.Transmutation
I am on dbt version 1.3.0 and this is not working. I use bigquery with version 1.3.0. I understand the question was for snowflake, but seems to be the same issue + linked PR indicates it should be working for dbt+bigquery. I'll post a reaction under the PR.Jacobinism
W
1

If it's of any help, dbt mentioned this in their docs. Simply add a surrogate key and use it in the snapshot configuration to accomplish this.


{% snapshot transaction_items_snapshot %}

    {{
        config(
          unique_key="id",
          ...
        )
    }}

select
    transaction_id || '-' || line_item_id as id,
    *
from {{ source('erp', 'transactions') }}

{% endsnapshot %}

Ref: https://docs.getdbt.com/reference/resource-configs/unique_key#use-a-combination-of-two-columns-as-a-unique-key

Wearisome answered 18/9, 2023 at 7:38 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.