I'm trying to select the latest partition from a BigQuery table without scanning the whole table in a DBT model in order to save query costs.
DBT doesnt allow using semicolons in a data model so using the DECLARE
+SET
scripting statements doesn't work as suggested here.
DBT has a sql_header macro which allows setting some variables in the header but that header doesn't accept references to a data model or at least the following code is not compiling:
{{ config(
sql_header=" DECLARE latest_partition_date DATE;
DECLARE latest_load_timestamp TIMESTAMP;
SET latest_partition_date = (SELECT MAX(_PARTITIONDATE) FROM {{ ref("model") }} );
SET latest_load_timestamp = (SELECT MAX(loaded_at) FROM {{ ref("model") }} WHERE _PARTITIONDATE = latest_partition_date);"
) }}
-- set the main query
SELECT * FROM {{ ref("model") }}
WHERE
-- Select the latest partition to reduce 'Bytes processed' for loading the query.
_PARTITIONDATE = latest_partition_date
-- Select the latest load within the latest partition to get only one duplicate of data.
AND loaded_at = latest_load_timestamp
I need to solve this in standard SQL.
Other methods that were suggested included setting WHERE _PARTITIONDATE = CURRENT_DATE()
or using DATE_SUB(CURRENT_DATE(), 3)
but those don't satisfy because data load breakages are unpredictable and only dynamically selecting the latest would work here. Is that possible?
No matching signature for operator = for argument types: DATE, INT64. Supported signature: ANY = ANY
. It's probably because_PARTITIONDATE
is typeDATE
but it seems to be evaluated as and INT. Any idea how to fix that? – Dulcinea