I'm trying to set up a simple DBT pipeline that reads a parquet table stored on Azure Data Lake Storage and creates another table that will also be stored in the same location.
Under my `models/` directory (which is defined as my source path) I have two files, `datalake.yml` and `orders.sql`. `datalake.yml` looks like this:
```yaml
version: 2

sources:
  - name: datalake
    tables:
      - name: customers
        external:
          location: path/to/storage1  # I got this from the file properties in Azure
          file_format: parquet
        columns:
          - name: id
            data_type: int
            description: "ID"
          - name: ...
```
My `orders.sql` model looks like this:
```sql
{{ config(materialized='table', file_format='parquet', location_root='path/to/storage2') }}

select name, age from {{ source('datalake', 'customers') }}
```
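If I understand dbt's compilation correctly, that `source()` call should just compile down to a plain relation reference on the cluster, roughly like this (my assumption, based on the error message I show below):

```sql
-- what I believe the model compiles to (assumption based on the error I get)
select name, age from datalake.customers
```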
I'm also using the `dbt-external-tables` package. Also note that when I run `dbt debug`, everything is fine and I can connect to my database (which happens to be Databricks).
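For reference, the package is installed via `packages.yml` along these lines (the exact version pin is a placeholder, not necessarily what I have), followed by `dbt deps`:

```yaml
# packages.yml -- sketch; the version range is a placeholder, not my exact pin
packages:
  - package: dbt-labs/dbt_external_tables
    version: [">=0.8.0", "<1.0.0"]
```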
I tried running `dbt run-operation stage_external_sources`, which returns `Error: staging external sources is not implemented for the default adapter`. When I run `dbt run`, I get `Error: UnresolvedRelation datalake.customers`.
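Since the first error mentions "the default adapter", maybe my connection profile is relevant. It's roughly this shape (assuming the dbt-databricks adapter; the host, HTTP path, schema, and token below are redacted placeholders, not my real values):

```yaml
# profiles.yml -- sketch with placeholder values, assuming the dbt-databricks adapter
my_databricks_profile:
  target: dev
  outputs:
    dev:
      type: databricks
      schema: default                      # placeholder schema
      host: <workspace>.azuredatabricks.net
      http_path: /sql/1.0/warehouses/<id>  # or a cluster's HTTP path
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
```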
Or perhaps I could make use of the hive metastore instead somehow? Any tips on how I could fix this would be highly appreciated!
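By "use the hive metastore" I mean something like registering the existing parquet files as an external table directly in Databricks, so that dbt could then resolve `datalake.customers` as an ordinary relation. A rough sketch of what I have in mind (the schema, table name, and path are just the placeholders from above):

```sql
-- sketch: register the parquet files under path/to/storage1 as an external table
-- in the metastore (schema and table names are the placeholders used above)
CREATE SCHEMA IF NOT EXISTS datalake;

CREATE TABLE IF NOT EXISTS datalake.customers
USING PARQUET
LOCATION 'path/to/storage1';
```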
To clarify: `path/to/storage` in the source and the destination are different locations. I still consider myself a beginner with respect to DBT, but I love this tool and I will see if I (or my team) can contribute here. – Amil