Get column names AND types using star macro in dbt
Asked Answered
S

1

6

Using the star macro, is there a way to also get the column data type (boolean, numerical, etc), in addition to the column name?

For example, this query uses the star macro to collect the column names from a reference table, saves it as an array variable column_names, and then I loop over this array and apply the max function to all the columns.

{% set column_names = star(
    from=ref_table,
    except=["a", "b", "c"],
    as_list=True)
%}

select 
    date_trunc('week', day) as week,
    name,

    {%- for col in column_names %}  
    max({{ col|lower }}) as {{ col | lower }}{%- if not loop.last %},{{ '\n  ' }}{% endif %}
    {%- endfor %}

from {{ ref('my_table_name') }}    
group by 1, 2

I would like to conditionally apply the max function to only boolean columns.

This might look something like

{%- for col in column_names %}  
    {% if is_boolean(col) %}  
    max({{ col|lower }}) as {{ col | lower }}{%- if not loop.last %},{{ '\n  ' }}{% endif %}
    {% endif %}
{%- endfor %}

but the problem is that the star macro passes the column names as a string, so it's not carrying any metadata with it.

How might I get the column data type here?

Data warehouse: Snowflake

Stimson answered 23/6, 2022 at 13:17 Comment(0)
A
7

You can view the source for dbt_utils.star here

Under the hood, it uses dbt_utils.get_filtered_columns_in_relation. That macro also just returns column names. However! that macro uses the built-in adapter.get_columns_in_relation, which returns a list of Column objects, which have a data_type property.

So your code becomes:

{% set all_columns = adapter.get_columns_in_relation(
    ref("my_table")
) %}
{% set except_col_names=["a", "b", "c"] %}

select 
    date_trunc('week', day) as week,
    name,

    {%- for col in all_columns if col.name not in except_col_names %}  
    {% if col.data_type == 'BOOLEAN' %}  
    max({{ col.name|lower }}) as {{ col.name|lower }}{%- if not loop.last %},{{ '\n  ' }}{% endif %}
    {% endif %}
    {%- endfor %}

from {{ ref('my_table_name') }}    
group by 1, 2
Azar answered 23/6, 2022 at 21:16 Comment(1)
Thank you! I adjusted it a little bit and used col.name instead of col.column and col.data_type == 'BOOLEAN' instead of col.dtype == 'bool'Stimson

© 2022 - 2024 — McMap. All rights reserved.