After googling around and finding this related question, I've found a way to make this work (although, as we'll see, it's not the ideal solution). Here is a working example with three pieces:
- the sql template file with a bit of jinja templating,
- the DAG, and
- the gcloud command needed to upload the template to the right place.
(1) The sql template file
This is just a text file whose filename ends with the .sql
extension. Let's say this file is called my-templated-query.sql
and contains:
```sql
SELECT COUNT(1)
FROM mytable
WHERE _PARTITIONTIME = TIMESTAMP('{{ ds }}')
```
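At run time, Airflow's Jinja engine substitutes {{ ds }} with the execution date in YYYY-MM-DD format, so for a run on, say, 2018-01-01 (an illustrative date), BigQuery receives:

```sql
SELECT COUNT(1)
FROM mytable
WHERE _PARTITIONTIME = TIMESTAMP('2018-01-01')
```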
(2) Referencing the template in the DAG file
To reference this template, create an operator like the following:
```python
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

count_task = BigQueryOperator(
    task_id='count_rows',
    sql='/my-templated-query.sql')
```
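For context, here is a minimal sketch of what the surrounding DAG file might look like; the dag_id, start_date, and schedule are illustrative placeholders, not part of the original example:

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

# dag_id, start_date, and schedule_interval below are placeholders.
with DAG(
        dag_id='count_rows_example',
        start_date=datetime(2018, 1, 1),
        schedule_interval='@daily') as dag:

    # By default Airflow resolves '/my-templated-query.sql' against the dags folder.
    count_task = BigQueryOperator(
        task_id='count_rows',
        sql='/my-templated-query.sql')
```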
(3) Adding the template file to Google Cloud Composer
It turns out that, by default, Airflow looks for template files in the dags folder. To upload our template file to the dags folder, we run:
```bash
gcloud beta composer environments storage dags import \
    --environment my-env-name \
    --location us-central1 \
    --source path/to/my-templated-query.sql
```
You'll have to replace the env name, location, and source path accordingly.
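If you want to double-check that the file actually landed in the dags folder, and your gcloud version provides it, you can list the folder's contents (same placeholder environment name and location as above):

```bash
gcloud beta composer environments storage dags list \
    --environment my-env-name \
    --location us-central1
```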
It doesn't really seem right to upload all these templates to the dags folder, though. A better Airflow practice is to put your templates in their own folder and point the template_searchpath parameter at it when you create your DAG. However, I'm not sure how to do this with Google Cloud Composer. For reference, outside of Composer the pattern looks something like the sketch below.
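In this sketch, the /home/airflow/templates path and the dag_id are purely illustrative, not Composer-specific locations:

```python
from datetime import datetime

from airflow import DAG

# template_searchpath tells Jinja where to look for template files such as
# .sql scripts; the path here is an illustrative example.
dag = DAG(
    dag_id='templated_queries_example',
    start_date=datetime(2018, 1, 1),
    schedule_interval='@daily',
    template_searchpath=['/home/airflow/templates'])
```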
Update: I've realized it's possible to put subfolders in the DAG folder, which is useful for organizing large numbers of SQL templates. Let's say I put a SQL template file at DAG_FOLDER/dataset1/table1.sql. In the BigQueryOperator, I can then refer to it using sql='/dataset1/table1.sql'. If you have a subfolder with lots of files (and further subfolders) in it, you can also use the dags import command I show above to upload the entire subfolder recursively; just point --source at the subfolder.
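Putting that together, the operator reference and the recursive upload would look something like this (the task_id and local path are placeholders):

```python
count_task = BigQueryOperator(
    task_id='count_table1_rows',
    sql='/dataset1/table1.sql')
```

```bash
# Importing a directory uploads its contents, including nested subfolders.
gcloud beta composer environments storage dags import \
    --environment my-env-name \
    --location us-central1 \
    --source path/to/dataset1
```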