It is possible to create scheduled export jobs with the scheduled queries feature and the EXPORT DATA statement. For example, the script below backs up data daily to GCS as Parquet files with SNAPPY compression. Each time the job runs, it exports all of the previous day's data.
-- @run_date is a parameter that the scheduled query service supplies at execution time.
DECLARE backup_date DATE DEFAULT DATE_SUB(@run_date, INTERVAL 1 DAY);

EXPORT DATA
  OPTIONS (
    uri = CONCAT('gs://my-bucket/', CAST(backup_date AS STRING), '/*.parquet'),
    format = 'PARQUET',
    compression = 'SNAPPY',
    overwrite = FALSE
  ) AS
SELECT
  *
FROM
  `my-project.my-dataset.my-table`
WHERE
  DATE(timestamp) = backup_date;
From the BigQuery UI you can then create a scheduled query for this script and set the trigger frequency and trigger time.
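The same scheduled query can also be created programmatically through the BigQuery Data Transfer Service instead of the UI. Below is a minimal sketch in Python, assuming the google-cloud-bigquery-datatransfer package is installed; the project ID, display name, and schedule string are placeholders to replace with your own values, and because the query is a script ending in EXPORT DATA, no destination dataset or table template is set.

from google.cloud import bigquery_datatransfer

# Placeholder project ID; replace with your own.
project_id = "my-project"

# The same EXPORT DATA script shown above.
query_string = """
DECLARE backup_date DATE DEFAULT DATE_SUB(@run_date, INTERVAL 1 DAY);
EXPORT DATA
  OPTIONS (
    uri = CONCAT('gs://my-bucket/', CAST(backup_date AS STRING), '/*.parquet'),
    format = 'PARQUET',
    compression = 'SNAPPY',
    overwrite = FALSE
  ) AS
SELECT * FROM `my-project.my-dataset.my-table`
WHERE DATE(timestamp) = backup_date;
"""

client = bigquery_datatransfer.DataTransferServiceClient()
parent = client.common_project_path(project_id)

transfer_config = bigquery_datatransfer.TransferConfig(
    display_name="daily-gcs-backup",          # placeholder name
    data_source_id="scheduled_query",
    params={"query": query_string},
    schedule="every day 10:00",               # assumed trigger time; adjust as needed
)

# Creates the scheduled query; the service fills in @run_date on every run.
transfer_config = client.create_transfer_config(
    parent=parent, transfer_config=transfer_config
)
print(f"Created scheduled query: {transfer_config.name}")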
main.py and worker.py; they are set up by the yaml files ("app", "queue", "cron" and "worker"). Every day at 10am I have a new file exported from BQ to GCS, which feeds some ML algorithms that also run daily. – Webby
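For reference, a cron-driven setup like the one Webby describes can come down to a single handler that App Engine's cron service calls on schedule. The sketch below is a hypothetical main.py (not Webby's actual code) using Flask and the google-cloud-bigquery client, both assumptions; a cron.yaml entry pointing at /export with a "every day 10:00" schedule would trigger it, and the handler simply runs the same EXPORT DATA script for the previous day.

# main.py -- hypothetical sketch of a cron-triggered BigQuery-to-GCS export.
import datetime

from flask import Flask
from google.cloud import bigquery

app = Flask(__name__)
bq_client = bigquery.Client()

@app.route("/export")
def export_yesterday():
    # Export the previous day's data, mirroring the scheduled-query script above.
    backup_date = datetime.date.today() - datetime.timedelta(days=1)
    query = f"""
    EXPORT DATA
      OPTIONS (
        uri = 'gs://my-bucket/{backup_date.isoformat()}/*.parquet',
        format = 'PARQUET',
        compression = 'SNAPPY',
        overwrite = FALSE
      ) AS
    SELECT * FROM `my-project.my-dataset.my-table`
    WHERE DATE(timestamp) = DATE '{backup_date.isoformat()}';
    """
    bq_client.query(query).result()  # wait for the export job to finish
    return f"Exported data for {backup_date}", 200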