It seems dbt only works for a single database.
If my data is in a different database, will that still work? For example, if my datalake is using delta, but I want to run dbt using Redshift, would dbt still work for this case?
It seems dbt only works for a single database.
If my data is in a different database, will that still work? For example, if my datalake is using delta, but I want to run dbt using Redshift, would dbt still work for this case?
To use dbt, you need to already be able to select from
your raw data in your warehouse.
In general, dbt is not an ETL tool:
[dbt] doesn’t extract or load data, but it’s extremely good at transforming data that’s already loaded into your warehouse. This “transform after load” architecture is becoming known as ELT (extract, load, transform). dbt is the T in ELT. [reference]
So no, you cannot use dbt with Redshift and Deltalake at the same time. Instead, use a separate service to extract and load data into your Redshift cluster — dbt is agnostic about which tool you use to do this.
There is a nuance to this answer - you could use dbt to select from external files in S3 or GCS, so long as you've set up your data warehouse to be able to read those files. For Redshift, this means setting up Redshift Spectrum. (For Snowflake, this means setting up an external table and on BigQuery, you can also query cloud storage data)
So, if the data you read in Deltalake lives in S3, if you set up your Redshift cluster to be able to read it, you can use dbt to transform the data!
You can use Trino with dbt to connect to multiple databases in the same project.
The Github example project https://github.com/victorcouste/trino-dbt-demo contains a fully working setup, that you can replicate and adapt to your needs.
I would say that DBT doesn't have an execution engine, so you can not use it to move data from one source to another as it isn't processing data itself, it only sends the SQL commands to the database.
In any case, if you want to move data from S3 to Redshift, maybe you could use Redshift Spectrum where you can query S3 as external tables. There you'll be able to use DBT on S3 and Redshift data from the same system.
@willie Chen the short answer is yes you can. The more accurate answer that is not the intent of dbt. As a tool it is intended for the transform part of ETL. It serves as a transform that is already existing in a data warehouse. I agree that you should use Redshift Spectrum for ETL.
Luther
© 2022 - 2024 — McMap. All rights reserved.