Materialized View vs Table Using dbt

I'm just onboarding dbt and having gone through the tutorial docs I'm wondering if there's a difference between materializing my transformations as views or tables? I'm using Snowflake as the data warehouse. There's some documentation here that shows the differences between a table and a materialized view but if I'm using dbt to update the tables regularly, do they more or less become the same thing?

Thanks!

Decrepit answered 22/10, 2020 at 20:5 Comment(1)
Are you referring to the materialized parameter in dbt (docs.getdbt.com/docs/building-a-dbt-project/building-models/…) or to a materialized view in Snowflake? - Breadth

As far as I'm aware, dbt doesn't support materialized views, but as Felipe commented, there is an open issue discussing it. If dbt could build materialized views on Snowflake, you're right that they would behave much like a table that dbt refreshes regularly, with one difference: the materialized view would update even when you haven't run dbt. As Drew mentions in the ticket, though, materialized views come with a lot of caveats that make tables built by dbt preferable in most use cases: "no window functions, no unions, limited aggregates, can't query views, etc etc etc".

That said, dbt does support views and tables.

Even when you're using dbt, there's still a difference between a view and a table. A table will always need to be refreshed by dbt in order to be updated. A view will always be as up-to-date as the underlying tables it is referencing.

For example, let's say you have a dbt model called fct_orders which references a table called shopify.order that is loaded by Fivetran/Stitch. If your model is materialized as a view, it will always return the most up-to-date data in the Shopify table. If it is materialized as a table, and new data has arrived in the Shopify table since you last ran dbt, the model will be 'stale'.

That said, the benefit of materializing it as a table is that queries against it run more quickly, because the SQL 'transformation' doesn't have to be re-executed each time the model is queried.
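
The staleness in the fct_orders example can be reproduced in miniature. The sketch below uses Python's built-in sqlite3 module as a stand-in for Snowflake, purely so it runs anywhere. The table name shopify_order, its columns, and the helper names are hypothetical. The drop-and-recreate step only approximates what a dbt table materialization does on each run.

```python
import sqlite3

# Hypothetical transformation, standing in for the SQL inside a fct_orders model.
FCT_ORDERS_SQL = "SELECT id, amount_cents / 100.0 AS amount FROM shopify_order"


def build_models(conn: sqlite3.Connection) -> None:
    """(Re)build the model both ways, roughly what a dbt run would do."""
    conn.execute("DROP VIEW IF EXISTS fct_orders_view")
    conn.execute("DROP TABLE IF EXISTS fct_orders_table")
    conn.execute(f"CREATE VIEW fct_orders_view AS {FCT_ORDERS_SQL}")
    conn.execute(f"CREATE TABLE fct_orders_table AS {FCT_ORDERS_SQL}")


def row_count(conn: sqlite3.Connection, relation: str) -> int:
    """Count the rows currently returned by a table or view."""
    rows = conn.execute(f"SELECT COUNT(*) FROM {relation}").fetchall()
    return rows[0][0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shopify_order (id INTEGER PRIMARY KEY, amount_cents INTEGER)")
conn.execute("INSERT INTO shopify_order VALUES (1, 1250), (2, 990)")
build_models(conn)  # the "last dbt run"

# New data lands in the source after that run.
conn.execute("INSERT INTO shopify_order VALUES (3, 4000)")

view_rows = row_count(conn, "fct_orders_view")    # the view re-runs the query
table_rows = row_count(conn, "fct_orders_table")  # the table is a snapshot

build_models(conn)  # the next "dbt run" refreshes the table
refreshed_rows = row_count(conn, "fct_orders_table")

print(view_rows, table_rows, refreshed_rows)  # 3 2 3
```

The view sees the third order immediately, while the table only catches up once it is rebuilt.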

The advice I have seen given most often is something like this:

  • If using a view isn't too slow for your end-users, use a view.
  • If a view gets too slow for your end-users, use a table.
  • If building a table with dbt gets too slow, use incremental models in dbt.
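
The last bullet rests on the difference between rebuilding a table from scratch and processing only the rows that arrived since the previous run. The sketch below mimics that idea with Python's built-in sqlite3 module rather than dbt itself. The names shopify_order, loaded_at, and incremental_run are hypothetical, and dbt's real incremental materialization offers more options than this.

```python
import sqlite3


def incremental_run(conn: sqlite3.Connection) -> int:
    """Append only source rows newer than what the target already holds.

    Returns the number of rows processed in this run.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fct_orders "
        "(id INTEGER, amount REAL, loaded_at INTEGER)"
    )
    before = conn.total_changes
    conn.execute(
        """
        INSERT INTO fct_orders (id, amount, loaded_at)
        SELECT id, amount_cents / 100.0, loaded_at
        FROM shopify_order
        WHERE loaded_at > (SELECT COALESCE(MAX(loaded_at), -1) FROM fct_orders)
        """
    )
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shopify_order (id INTEGER, amount_cents INTEGER, loaded_at INTEGER)"
)
conn.execute("INSERT INTO shopify_order VALUES (1, 1250, 100), (2, 990, 101)")
first_run = incremental_run(conn)   # target is empty, so every row is processed

conn.execute("INSERT INTO shopify_order VALUES (3, 4000, 102)")
second_run = incremental_run(conn)  # only the one new row is processed

total_rows = conn.execute("SELECT COUNT(*) FROM fct_orders").fetchall()[0][0]
print(first_run, second_run, total_rows)  # 2 1 3
```

In dbt the equivalent filter lives in the model's SQL and is wrapped in an is_incremental() check, so it only applies when the target table already exists.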
Duplicity answered 22/10, 2020 at 20:28 Comment(4)
dbt materialized views open request: github.com/fishtown-analytics/dbt/issues/1162 - Bitters
@Duplicity in the case that dbt does support materialized views, which I think they do now from Felipe's comment, is there any difference between materializing as a table or a view through dbt? I'd assume since the view is cached upon materializing then there isn't any material difference? - Decrepit
note that dbt doesn't support materialized views (but people are requesting it on that gh issue) - Bitters
@KyleCheung Just updated answer. In short, yes, they are somewhat similar, but the shortcomings of materialized views probably still make tables with dbt better in my opinion. - Duplicity

If you use dbt, there's little need for materialized views. A materialized view is in effect a table defined by a query, much like "create table as select", so materializing a dbt model as a table gives you the same result. The difference is that a materialized view updates automatically, while a table does not. With dbt, however, you can refresh the table on a schedule by scheduling your dbt runs.

The data will then only be as fresh as your last completed dbt run, which is not the same as a materialized view if the underlying table changes frequently. Most people, however, avoid putting materialized views on top of frequently changing tables, because the running cost can get out of control.

Materialized views in Snowflake can only query a single table, while dbt gives you more options. For example, joining two tables and materializing the result as a table gives you something you can't do with a materialized view.

Finally, if you really want to deploy materialized views with dbt, there are two ways:

  1. Use a pre-hook or a post-hook, which execute arbitrary SQL before or after the dbt model runs, respectively. That can work, but it is not great to maintain.
  2. Create your own materialization (see https://docs.getdbt.com/docs/guides/creating-new-materializations). This is not an easy task, but it will give you what you want. There's also a GitHub page called dbt-hack which describes interesting techniques for non-standard materializations.
Jordain answered 20/10, 2021 at 6:36 Comment(0)
