I recently updated my Airflow and saw that BigQueryExecuteQueryOperator has been deprecated in favor of BigQueryInsertJobOperator. The documentation seems rather vague, just linking to REST Resource: jobs (and Method: jobs.query). In particular, it's unclear to me whether there's anywhere I can specify write_disposition, destination_dataset_table, etc. I want to make sure I'm not making things overly complicated.
Where I currently do
# my.sql
SELECT * FROM `proj.ds.table_1`
---------------------------------
# my-dag.py
BigQueryExecuteQueryOperator(
    task_id='copy-table-1',
    sql='my.sql',
    destination_dataset_table='proj:ds.table_2',
    write_disposition='WRITE_EMPTY',
    dag=dag,
)
do I now need to use DDL statements like
# my.sql
CREATE TABLE IF NOT EXISTS ds.table_2 AS (
  SELECT * FROM `proj.ds.table_1`
)
---------------------------------
# my-dag.py
BigQueryInsertJobOperator(
    task_id='copy-table-1',
    configuration={
        'query': {
            # assuming my.sql is on the Jinja template search path
            'query': '{% include "my.sql" %}',
            'useLegacySql': False,
        },
    },
    dag=dag,
)
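Or, if I'm reading the JobConfigurationQuery REST reference right, can I skip the DDL and map the old kwargs straight onto configuration keys? Something like this (untested sketch; destinationTable and writeDisposition are the field names listed in the REST docs):
# my-dag.py
# Untested sketch: destinationTable / writeDisposition copied from the
# JobConfigurationQuery REST reference, mirroring the old kwargs above.
BigQueryInsertJobOperator(
    task_id='copy-table-1',
    configuration={
        'query': {
            'query': '{% include "my.sql" %}',  # the plain SELECT, not the DDL version
            'useLegacySql': False,
            'destinationTable': {
                'projectId': 'proj',
                'datasetId': 'ds',
                'tableId': 'table_2',
            },
            'writeDisposition': 'WRITE_EMPTY',
        },
    },
    dag=dag,
)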
Not to mention passing parameters to the query, which now apparently has to follow the format in the QueryParameter docs instead of just params={'table': THE_TABLE}...
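For instance, if I'm reading those docs right, a single named parameter would have to be spelled out like this (untested sketch; the shape is copied from the QueryParameter REST reference, and the parameter and column names are stand-ins):
# my-dag.py
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

# Untested sketch: queryParameters shape copied from the BigQuery
# QueryParameter REST reference. 'some_col' and the 'table' parameter
# are stand-ins mirroring params={'table': THE_TABLE} above.
BigQueryInsertJobOperator(
    task_id='copy-table-1-parameterized',
    configuration={
        'query': {
            'query': 'SELECT * FROM `proj.ds.table_1` WHERE some_col = @table',
            'useLegacySql': False,
            'parameterMode': 'NAMED',
            'queryParameters': [
                {
                    'name': 'table',
                    'parameterType': {'type': 'STRING'},
                    'parameterValue': {'value': THE_TABLE},
                },
            ],
        },
    },
    dag=dag,
)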