I would like to be able to run an ad-hoc Python script that accesses and runs analytics on the model calculated by a dbt run. Are there any best practices around this?
We recently built a tool that caters to exactly this scenario. It makes referencing dbt tables from Python-land easy. It's called dbt-fal.
The idea is that you define the Python scripts you would like to run after your dbt models are built:
# schema.yml
models:
  - name: iris
    meta:
      owner: "@matteo"
      fal:
        scripts:
          - "notify.py"
And then the file notify.py is called if the iris model was run in the last dbt run:
# notify.py
import os

from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError

CHANNEL_ID = os.getenv("SLACK_BOT_CHANNEL")
SLACK_TOKEN = os.getenv("SLACK_BOT_TOKEN")

client = WebClient(token=SLACK_TOKEN)

# `context` is injected by fal and describes the model this script runs for
message_text = f"""Model: {context.current_model.name}
Status: {context.current_model.status}
Owner: {context.current_model.meta['owner']}"""

try:
    response = client.chat_postMessage(
        channel=CHANNEL_ID,
        text=message_text,
    )
except SlackApiError as e:
    # Slack reports the failure reason in the error response
    assert e.response["error"]
Each script is run with a context variable that holds a reference to the model it is running for.
To start using fal, just pip install fal and start writing your Python scripts.
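For the ad-hoc analytics part of the question, a fal script can also load the model's data directly. Here is a minimal sketch, assuming fal's ref helper (which, like context, is injected into the script's namespace) returns the model as a pandas DataFrame; the analyze.py name is made up for illustration, so check the fal docs for the exact API:

# analyze.py - hypothetical fal script for ad-hoc analytics.
# `context` and `ref` are assumed to be injected by fal, with
# ref() returning the model as a pandas DataFrame.
df = ref(context.current_model.name)

# Run whatever analytics you need on the freshly built model.
print(df.describe())

You would register analyze.py under the model's fal scripts the same way notify.py is registered above.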
For production, I'd recommend an orchestration layer such as Apache Airflow.
See this blog post to get started, but essentially you'll have an orchestration DAG (note - not a dbt DAG) that does something like:
dbt run <with args>
-> your Python code
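Here is a minimal sketch of such an orchestration DAG, assuming Airflow 2.x; the dag_id, schedule, and the body of run_analytics are placeholders for illustration:

# dags/dbt_then_analytics.py - minimal Airflow DAG sketch
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def run_analytics():
    # Placeholder: query the tables the dbt run just built and
    # run your analytics; connection handling is up to you.
    ...


with DAG(
    dag_id="dbt_then_analytics",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run",  # add your args here
    )
    analytics = PythonOperator(
        task_id="analytics",
        python_callable=run_analytics,
    )

    # The dependency arrow gives you the "dbt run -> your Python code"
    # ordering: analytics only runs after the dbt run succeeds.
    dbt_run >> analytics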
Fair warning, though, this can add a bit of complexity to your project.
I suppose you could get a similar effect with a CI/CD tool like GitHub Actions or CircleCI.