dbt macro to iterate over item in list within a sql call?
Asked Answered
S

1

5

First off, I am a dbt backer! I love this tool and the versatility of it.

When reading some of the docs I noticed that I might be able to do some meta work on my schemas every time I call a macro.

One of those would be to clean up schemas.

(This has been edited as per discussion within the dbt slack)

  1. dbt run-operation freeze that would introspect all of the tables that would be written with dbt run but with an autogenerated hash (might just be timestamp). It would output those tables in the schema of my choice and would log the “hash” to console.

  2. dbt run-operation unfreeze --args '{hash: my_hash}' that would then proceed to find the tables written with that hash prefix and clean them out of the schema.

Shankle answered 19/8, 2020 at 0:54 Comment(4)
Can you expand more on what you are trying to do with the run_query? You briefly send "clean up" schemas? You have some built in tools to iterate over schemas and tables already - example grant_select_on_schemas and then you can just put this on a post run-hook.Annular
yeah, I'm looking to do something very similar to grant_select_on_schemas but I don't want to have to pass in a schema list and instead have the schemas pulled at macro run time.Shankle
Ok - can you expand on that? Seems like we can build a macro to probably just perform whatever that action is for all tables in a schema or all tables in all schemas etc.Annular
@gscott, I edited my original post to a more concise use case - maybe this might better illustrate my actual objectiveShankle
U
16

I have created such a macro in an older version of dbt and it still works on 0.17.1.

The macro below item_in_list_query is getting a list of tables from a separate macro get_tables (also below). That list of tables is then concatenated inside item_in_list_query to compose a desired SQL query and execute it. For demonstration there is also a model in which item_in_list_query is used.

item_in_list_query

{% macro item_in_list_query() %}

    {% set tables = get_tables() %}

    {{ log("Tables: " ~ tables, True) }}

    {% set query %}
        select id
        from my_tables
        {% if tables -%}
            where lower(table_name) in {% for t in tables -%} {{ t }} {%- endfor -%}
        {%- endif -%}
    {% endset %}

    {{ log("query: " ~ query, True) }}

    {# run_query returns agate.Table (https://agate.readthedocs.io/en/1.6.1/api/table.html). #}
    {% set results = run_query(query) %}

    {{ log("results: " ~ results, True) }}

    {# execute is a Jinja variable that returns True when dbt is in "execute" mode i.e. True when running dbt run but False during dbt compile. #}
    {% if execute %}
    {# agate.table.rows is agate.MappedSequence in which data that can be accessed either by numeric index or by key. #}
    {% set results_list = results.rows %}
    {% else %}
    {% set results_list = [] %}
    {% endif %}

    {{ log("results_list: " ~ results_list, True) }}
    {{ return(results_list) }}

{% endmacro %}

get_tables

{% macro get_tables() %}
      {%- set tables = [
          ('table1', 'table2')
      ] -%}
  {{return(tables )}}
{% endmacro %}

model

{%- for item in item_in_list_query() -%}
  {%- if not loop.first %} UNION ALL {% endif %}
  select {{ item.id }}
{%- endfor -%}

Unhallowed answered 19/8, 2020 at 7:21 Comment(1)
This is gold - I will continue to look through this to see if I can construct your example for my use case.Shankle

© 2022 - 2024 — McMap. All rights reserved.