I found a way to handle temp tables in dbt: create them all in a pre-hook, then select from the final temp table in the model body outside the pre-hook. I tested it and it works fine, reducing the run time from more than 20 minutes to about 1 minute. But I see one problem: these models don't appear in the lineage graph in the dbt docs. Is there any way to handle temp tables other than pre-hooks, one that preserves lineage in the docs?
You're right in thinking that dbt does not support temporary tables. That's because temporary tables only persist in a single session, and dbt opens one connection/session per thread. Therefore any temporary tables created on one thread would not be visible to a model running on a different thread.
It sounds like CTEs are a performance drag for you though — out of interest, which warehouse are you using?
You've identified two workarounds, and there's another one worth discussing:
Option 1: Materialize your models as CTEs using the ephemeral materialization (docs)
Pros:
- The models show up in the lineage graph
- You can re-use these transformations in multiple downstream models by ref-ing them
- You can test and document these models
Cons:
- At some point there is a performance degradation with too many stacked CTEs (especially on older versions of postgres, where CTEs are an optimization fence)
- Compiled SQL can be harder to debug
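A minimal sketch of an ephemeral model (the model and column names here are hypothetical):

```sql
-- models/staging/stg_orders.sql (hypothetical model name)
{{ config(materialized='ephemeral') }}

select
    order_id,
    customer_id,
    amount
from {{ ref('raw_orders') }}
```

Any downstream model that refs stg_orders will have this query inlined as a CTE in its compiled SQL, rather than reading from a persisted object.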
Option 2: Use pre-hooks to create temp tables
I would generally recommend against this — you can't test or document your models, and they won't be in the lineage graph (as you've noted).
Option 3: Materialize these models as tables in a separate schema, and drop the schema at the end of a run
I think Michael's suggestion is a good one! I'd tweak it just a little bit:
- Use the schema config to materialize a model in a separate schema
{{ config(
materialized='table',
schema='my_temporary_schema'
) }}
- Then, at the end of a run, use an on-run-end hook (docs) to drop that schema. In your dbt_project.yml:
on-run-end: "drop schema my_temporary_schema cascade"
Pros:
- All the benefits of Option 1
- Sounds like it might be more performant than using CTEs
Cons:
- Make sure you don't have any dependent views on top of that schema! They might get dropped when you run a drop ... cascade command, which introduces fragility into your project.
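One caveat on the hook above (a sketch, assuming dbt's default schema-name generation): by default, dbt prefixes a custom schema with your target schema, so a model configured with schema='my_temporary_schema' actually lands in something like analytics_my_temporary_schema. The drop statement should target that generated name, and adding if exists keeps it from failing when the schema isn't there:

```yaml
# dbt_project.yml (the "analytics" target schema is hypothetical;
# with dbt's default generate_schema_name macro, a custom schema is
# created as <target_schema>_<custom_schema>)
on-run-end:
  - "drop schema if exists analytics_my_temporary_schema cascade"
```

If you've overridden generate_schema_name to use the custom schema name verbatim, the original unprefixed drop statement works as written.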
I think there are two ways you could get the same result while preserving lineage in the documentation:
- Write each temp table as a separate model where the logic happens in the pre_hook (like you suggested), and the model itself is just select * from YOUR_TEMP_TABLE_NAME.
- Instead of using temp tables, create each table as a regular model, and drop them in the post_hook of the "final" model that uses them, or in the on-run-end hook of your dbt_project.yml.
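The first workaround could look something like this (a sketch; the model name, table names, and filter are hypothetical):

```sql
-- models/orders_enriched.sql (hypothetical model name)
-- The pre-hook creates the temp table in the same session the model
-- runs in, so the model body can select from it.
{{ config(
    materialized='table',
    pre_hook="create temporary table tmp_orders as select * from raw.orders where status = 'complete'"
) }}

select * from tmp_orders
```

Note that materializing as a table (rather than a view) matters here: a permanent view defined over a temporary table would break as soon as the session ends and the temp table disappears.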
Two solutions:
- Create a source (SRC) model so that you can separate the source from the staging models.
- Build from the source in a staging model directly.
These should both show up in the lineage graph.
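Declaring the source in a YAML file and referencing it from a staging model is enough for both nodes to appear in the lineage graph (all names here are hypothetical):

```yaml
# models/staging/sources.yml (hypothetical source and table names)
version: 2
sources:
  - name: shop
    schema: raw
    tables:
      - name: orders
```

```sql
-- models/staging/stg_orders.sql
select * from {{ source('shop', 'orders') }}
```

dbt docs will then draw an edge from the shop.orders source node to stg_orders, and on to anything that refs it.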