Will dbt support temp table creation like `create table #temp1 as select * from tab1`, or does it work only the CTE way?

I found a way to handle temp tables in dbt: write all of the temp-table logic in a pre-hook, then select from the final temp table in the model body outside the pre-hook. I tested this and it works fine; it reduced the run time from more than 20 minutes to about 1 minute. But I see one problem: the temp tables don't appear in the lineage graph in the dbt docs. Is there any way to handle temp tables other than a pre-hook, one that preserves lineage in the docs?

Convolution answered 20/7, 2020 at 18:41
Comment: Hi and welcome! Please review the resource at the end of this comment and expand on your question with a reproducible example as much as possible! Thanks! stackoverflow.com/help/how-to-ask – Holliholliday

You're right in thinking that dbt does not support temporary tables. That's because temporary tables only persist in a single session, and dbt opens one connection/session per thread. Therefore any temporary tables created on one thread would not be visible to a model running on a different thread.

It sounds like CTEs are a performance drag for you though — out of interest, which warehouse are you using?

You've identified two workarounds, and there's another one worth discussing:

Option 1: Materialize your model as CTEs using the ephemeral materialization (docs)

Pros:

  • The models show up in the lineage graph
  • You can re-use these transformations in multiple downstream models by ref-ing them
  • You can test and document these models

Cons:

  • At some point there is a performance degradation with too many stacked CTEs (especially on older versions of postgres, where CTEs are an optimization fence)
  • Compiled SQL can be harder to debug
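For illustration, a minimal ephemeral model might look like the sketch below (the model, table, and column names are hypothetical):

```sql
-- models/stg_orders.sql (hypothetical upstream model)
{{ config(materialized='ephemeral') }}

select id, status
from raw.orders

-- models/fct_orders.sql (hypothetical downstream model)
-- dbt inlines the ephemeral model here as a CTE at compile time,
-- so it still appears in the lineage graph and can be ref'd and tested
select *
from {{ ref('stg_orders') }}
```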

Option 2: Use pre-hooks to create temp tables

I would generally recommend against this: you can't test or document these models, and they won't appear in the lineage graph (as you've noted).
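For completeness, the pre-hook pattern the question describes might be sketched like this (all names are hypothetical, and the exact temp-table syntax varies by warehouse):

```sql
-- models/my_model.sql (hypothetical)
{{ config(
    materialized='table',
    pre_hook="create temporary table tmp_orders as
              select * from raw.orders where status = 'open'"
) }}

-- this works only because the pre-hook runs in the same session/thread
-- as the model itself, but tmp_orders is invisible to dbt's lineage graph
select * from tmp_orders
```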

Option 3: Materialize these models as tables in a separate schema, and drop the schema at the end of a run

I think Michael's suggestion is a good one! I'd tweak it just a little bit:

  1. Use the schema config to materialize a model in a separate schema:

{{ config(
  materialized='table',
  schema='my_temporary_schema'
) }}

  2. Then, at the end of a run, use an on-run-end hook (docs) to drop that schema in your dbt_project.yml:
on-run-end: "drop schema my_temporary_schema cascade"
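One detail worth hedging on: dbt's default generate_schema_name macro prefixes a custom schema with the target schema, so the schema actually created may be named something like <target_schema>_my_temporary_schema rather than my_temporary_schema. A dbt_project.yml fragment accounting for that (schema name is illustrative) might look like:

```yaml
# dbt_project.yml (fragment; the custom schema name is illustrative)
on-run-end:
  - "drop schema if exists {{ target.schema }}_my_temporary_schema cascade"
```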

Pros:

  • All the benefits of Option 1
  • Sounds like it might be more performant than using CTEs

Cons:

  • Make sure you don't have any dependent views on top of that schema: they may be dropped when the drop ... cascade command runs, which introduces some fragility into your project
Loxodrome answered 23/7, 2020 at 13:48

I think there are two ways you could get the same result while preserving lineage in the documentation:

  1. Write each temp table as a separate model whose logic lives in a pre_hook (as you suggested), with the model body itself being just select * from YOUR_TEMP_TABLE_NAME.
  2. Instead of using temp tables, create each table as a regular model, and drop them in the post_hook of the "final" model that uses them, or in the on-run-end hook of your dbt_project.yml.
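As a sketch of the second approach (all model names are hypothetical), the "final" model could drop its upstream table in a post_hook:

```sql
-- models/final_model.sql (hypothetical)
{{ config(
    materialized='table',
    post_hook="drop table if exists {{ ref('intermediate_model') }}"
) }}

select * from {{ ref('intermediate_model') }}
```

One caveat: because the intermediate table is dropped after each run, anything that tries to query it between runs will fail.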
Foretopgallant answered 20/7, 2020 at 18:56

Two solutions:

  1. Create a source (src) model so that you can separate the source from the staging models.
  2. Build on the source using a staging model directly.

These should both show up on the lineage graphs.

Hartfield answered 22/7, 2020 at 10:17
