DBT: conditionally set schema config
I'm trying to determine how I can conditionally set schema config attributes. I've attempted this with a macro in both dbt_project.yml and schema.yml, but both methods fail with:

00:23:19  Encountered an error:
Compilation Error
  Could not render {{get_location_root('lndv')}}: 'get_location_root' is undefined

The outcome I would like to achieve is conditionally setting location_root for Spark for various schemas. I want different locations for each environment. I thought the macro path was the best fit as this follows a pattern but it obviously doesn't work in dbt_project.yml or property files. I was using target.name to determine environment. It's in the same directory as other macros that are successfully rendering in models so the path is set correctly. I don't really want to resort to placing this config in each model if I can avoid it.

Does anyone have any thoughts on how I can solve this? Either getting the macro to work in dbt_project.yml / schema.yml or by some other method?

Regards,

Ashley

Kismet answered 1/9, 2022 at 0:40 Comment(0)

dbt only allows a small subset of Jinja in .yml files. In particular, you can't call macros. But you can use simple conditionals. Jinja that appears in .yml files must be quoted:

schema: "{{ 'prod_schema' if target.name == 'production' else 'dev_schema' }}"
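In context, that conditional can live in dbt_project.yml under a model path. A sketch (the project and folder names here are placeholders, not from your project):

    # dbt_project.yml -- illustrative; "my_project" and "staging" are placeholders
    models:
      my_project:
        staging:
          +schema: "{{ 'prod_schema' if target.name == 'production' else 'dev_schema' }}"

Note that only inline Jinja expressions like this render in .yml files; a call such as {{ get_location_root('lndv') }} will fail exactly as in your error message.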

Another option for you is to override the built-in macro that generates schema names. There is a great write-up in the dbt docs on this topic.

From the docs:

If your dbt project includes a macro that is also named generate_schema_name, dbt will always use the macro in your dbt project instead of the default macro.

Therefore, to change the way dbt generates a schema name, you should add a macro named generate_schema_name to your project, where you can then define your own logic.

There is even an alternative "non-default" version of this macro that ships with dbt, called generate_schema_name_for_env, with the logic:

In prod:

  • If a custom schema is provided, a model's schema name should match the custom schema, rather than being concatenated to the target schema.
  • If no custom schema is provided, a model's schema name should match the target schema.

In other environments (e.g. dev or qa):

  • Build all models in the target schema, as in, ignore custom schema configurations.
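For reference, the built-in generate_schema_name_for_env is roughly the following (paraphrased from dbt-core; note the prod check is against target.name == 'prod'):

    {% macro generate_schema_name_for_env(custom_schema_name, node) -%}
        {%- set default_schema = target.schema -%}
        {%- if target.name == 'prod' and custom_schema_name is not none -%}
            {{ custom_schema_name | trim }}
        {%- else -%}
            {{ default_schema }}
        {%- endif -%}
    {%- endmacro %}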

To use generate_schema_name_for_env, you create a new macro in your project with the following contents:

-- put this in macros/generate_schema_name.sql

{% macro generate_schema_name(custom_schema_name, node) -%}
    {{ generate_schema_name_for_env(custom_schema_name, node) }}
{%- endmacro %}

EDIT: In Spark, you can use a similar trick to set the "location" of the materialized model by overriding the location_clause macro (which is part of the dbt-spark adapter). Your macro should template to a string with the word "location" followed by a path wrapped in single quotes:

{% macro location_clause() %}
  {%- set location_root = config.get('location_root', validator=validation.any[basestring]) -%}
  {%- set identifier = model['alias'] -%}
  {%- if location_root is not none and target.name == "production" %}
    location '{{ location_root }}/prod/{{ identifier }}'
  {%- elif location_root is not none %}
    location '{{ location_root }}/dev/{{ identifier }}'
  {%- endif %}
{%- endmacro -%}
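With this override in place, location_root itself can stay a static value in dbt_project.yml, and the macro injects the environment-specific prefix at compile time. A sketch of the config side (project name and paths are placeholders):

    # dbt_project.yml -- illustrative; "my_project", "lndv", and the path are placeholders
    models:
      my_project:
        lndv:
          +location_root: '/mnt/data/lndv'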
Marietta answered 3/9, 2022 at 1:5 Comment(7)
Thanks tconbeer. The first suggestion might do the trick but I don't think the generate_schema_name macro is going to work in this case. I'm not looking to set the actual schema attribute but rather the Spark specific location_root.Kismet
Sorry, I misread your question, didn’t realize it was about setting location_root, not schema. You should be able to define a custom macro called location_clause instead: github.com/dbt-labs/dbt-spark/blob/…Marietta
I haven't tried it yet tconbeer, but that looks perfect! I'll give it a twirl shortly and report back.Kismet
Yep, that worked a treat! Thanks tconbeer.Kismet
Here's the macro I used, for others that may be interested:Kismet
{% macro spark__location_clause() %}
  {%- set location_root = config.get('location_root', validator=validation.any[basestring]) -%}
  {%- set identifier = model['alias'] -%}
  {%- if location_root is not none -%}
    {%- if target.name == 'prod' %}
      {{ log("Production", info=true) }}
      location '/mnt/prod/private/raw/{{ location_root }}/{{ identifier }}'
    {%- else %}
      {{ log("Non-production", info=true) }}
      location '/mnt/dev/private/raw/{{ target.schema }}_{{ location_root }}/{{ identifier }}'
    {%- endif %}
  {%- endif %}
{%- endmacro -%}
Kismet
I didn't use a full path in dbt_project.yml for location_root, just the raw schema name.Kismet

Yes, I found the if/elif/else syntax in source .yml config a little weird:

project: '{{var("output_project_dev") if target.name == "dev" else var("output_project_prod") if target.name == "prod" else var("third_project") }}'

Solnit answered 25/3 at 5:6 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.