I found a way to handle temp tables in dbt: create them all in a pre-hook, then select from the final temp table in the model body outside the pre-hook. I tested it and it works fine, reducing the run time from more than 20 minutes to about 1 minute. But I see one problem: these models don't appear in the lineage graph in the dbt docs. Is there any way to handle temp tables other than pre-hooks, one that preserves lineage in the docs?
You're right in thinking that dbt does not support temporary tables. That's because temporary tables only persist in a single session, and dbt opens one connection/session per thread. Therefore any temporary tables created on one thread would not be visible to a model running on a different thread.
It sounds like CTEs are a performance drag for you though — out of interest, which warehouse are you using?
You've identified two workarounds, and there's another one worth discussing:
Option 1: Materialize your models as CTEs using the ephemeral materialization (docs)
Pros:
- The models show up in the lineage graph
- You can re-use these transformations in multiple downstream models by ref-ing them
- You can test and document these models
Cons:
- At some point there is a performance degradation with too many stacked CTEs (especially on older versions of postgres, where CTEs are an optimization fence)
- Compiled SQL can be harder to debug
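A minimal sketch of an ephemeral model (the model and column names here are hypothetical):

```sql
-- models/staging/stg_orders.sql (hypothetical model name)
{{ config(materialized='ephemeral') }}

select
    order_id,
    customer_id,
    amount
from {{ ref('raw_orders') }}
```

Any downstream model that refs stg_orders will have this query inlined as a CTE in its compiled SQL, rather than reading from a persisted object.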
Option 2: Use pre-hooks to create temp tables
I would generally recommend against this — you can't test or document your models, and they won't be in the lineage graph (as you've noted).
Option 3: Materialize these models as tables in a separate schema, and drop the schema at the end of a run
I think Michael's suggestion is a good one! I'd tweak it just a little bit:
- Use the schema config to materialize a model in a separate schema
{{ config(
materialized='table',
schema='my_temporary_schema'
) }}
- Then, at the end of a run, use an on-run-end hook (docs) to drop that schema. In your dbt_project.yml:
on-run-end: "drop schema my_temporary_schema cascade"
Pros:
- All the benefits of Option 1
- Sounds like it might be more performant than using CTEs
Cons:
- Make sure you don't have any dependent views on top of that schema! They might get dropped when you run a drop ... cascade command, which introduces fragility into your project.
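One caveat on the hook above (a sketch, assuming dbt's default schema-name generation): by default, dbt prefixes a custom schema with your target schema, so a model configured with schema='my_temporary_schema' actually lands in something like analytics_my_temporary_schema. The drop statement should target that generated name, and adding if exists keeps it from failing when the schema isn't there:

```yaml
# dbt_project.yml (the "analytics" target schema is hypothetical;
# with dbt's default generate_schema_name macro, a custom schema is
# created as <target_schema>_<custom_schema>)
on-run-end:
  - "drop schema if exists analytics_my_temporary_schema cascade"
```

If you've overridden generate_schema_name to use the custom schema name verbatim, the original unprefixed drop statement works as written.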
I think there are two ways you could get the same result while preserving lineage in the documentation:
- Write each temp table as a separate model where the logic happens in the pre_hook (like you suggested), and the model itself is just select * from YOUR_TEMP_TABLE_NAME.
- Instead of using temp tables, create each table as a regular model, and drop them in the post_hook of the "final" model that uses them, or in the on-run-end hook of your dbt_project.yml.
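The first workaround could look something like this (a sketch; the model name, table names, and filter are hypothetical):

```sql
-- models/orders_enriched.sql (hypothetical model name)
-- The pre-hook creates the temp table in the same session the model
-- runs in, so the model body can select from it.
{{ config(
    materialized='table',
    pre_hook="create temporary table tmp_orders as select * from raw.orders where status = 'complete'"
) }}

select * from tmp_orders
```

Note that materializing as a table (rather than a view) matters here: a permanent view defined over a temporary table would break as soon as the session ends and the temp table disappears.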
Two solutions:
- Create a source (SRC) model so that you can separate the source from the staging models.
- Build from the source in a staging model directly.
These should both show up in the lineage graph.
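Declaring the source in a YAML file and referencing it from a staging model is enough for both nodes to appear in the lineage graph (all names here are hypothetical):

```yaml
# models/staging/sources.yml (hypothetical source and table names)
version: 2
sources:
  - name: shop
    schema: raw
    tables:
      - name: orders
```

```sql
-- models/staging/stg_orders.sql
select * from {{ source('shop', 'orders') }}
```

dbt docs will then draw an edge from the shop.orders source node to stg_orders, and on to anything that refs it.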