scheduled export from Google BigQuery to Google Cloud Storage
I'm new to Google Cloud and would like to know the best way to schedule queries and export their results to Google Cloud Storage. I've seen documentation on how to export data manually, but I couldn't find anything specific about doing it in an automated way. Is there a recommended way to approach this?

Thanks

Otilia answered 4/11, 2017 at 14:48 Comment(6)
cloud.google.com/solutions/…Frenzy
In this repository (still under construction) I just implemented that using App Engine Standard. The main files are main.py and worker.py; they are set up by the yaml files ("app", "queue", "cron" and "worker"). Every day at 10am I have a new file exported from BQ to GCS, which feeds some ML algorithms that also run daily.Webby
You can also use Apps Script. I wrote a blog post - shinesolutions.com/2017/11/01/…Crenation
Guys, thanks a lot for the responses. I managed to do it using a Python script with export_data_to_gcs, which exports a table to GCS. Now I need to find out how to export a query to GCS using Python. Any thoughts?Otilia
There are some other StackOverflow questions (or see the BigQuery documentation) about using a destination table with Python. Run the query, then export the resulting table to GCS (see the Python sketch after these comments).Frenzy
@GrahamPolley thanks for the entertaining blog post! The thing I noticed, though, is that the post helps someone schedule a query to a destination table, not export a table that already exists into a GCS (Google Cloud Storage) bucket. Is that possible with Apps Script?Prisage
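A minimal sketch of the "destination table, then export" suggestion above, assuming the current google-cloud-bigquery Python client; the project, dataset, table and bucket names are placeholders:

# Sketch: run the query into a destination table, then extract that table to GCS.
# All project/dataset/table/bucket names below are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

# 1. Run the query and write its result to a destination table.
destination = "my-project.my_dataset.query_result"
job_config = bigquery.QueryJobConfig(
    destination=destination,
    write_disposition="WRITE_TRUNCATE",
)
client.query(
    "SELECT * FROM `my-project.my_dataset.my_table`",
    job_config=job_config,
).result()  # wait for the query to finish

# 2. Export the destination table to a GCS bucket.
client.extract_table(
    destination,
    "gs://my-bucket/exports/query_result-*.csv",
    location="US",  # must match the dataset location
).result()  # wait for the export to finish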

It is possible to create scheduled export jobs with the scheduled queries feature and the EXPORT DATA statement. For example, the script below backs up data daily to GCS as Parquet files with SNAPPY compression. Each time the job is executed, it takes all the data from the day before.

-- @run_date is only available in scheduled queries; backup_date is the day before the run
DECLARE backup_date DATE DEFAULT DATE_SUB(@run_date, INTERVAL 1 DAY);

-- Export the previous day's rows to a dated folder in GCS as SNAPPY-compressed Parquet
EXPORT DATA
  OPTIONS ( uri = CONCAT('gs://my-bucket/', CAST(backup_date AS STRING), '/*.parquet'),
    format='PARQUET',
    compression='SNAPPY',
    overwrite=FALSE ) AS
SELECT
  *
FROM
  `my-project.my-dataset.my-table`
WHERE
  DATE(timestamp) = backup_date

From the BigQuery UI you can then create a scheduled query and set the trigger frequency and trigger time.

[Screenshot: creating a scheduled query in the BigQuery UI, setting the trigger frequency and time]
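If you prefer to create the scheduled query from code rather than the UI, a rough sketch using the BigQuery Data Transfer Service Python client could look like this; the project, location, schedule and display name are assumptions, and the query string is the EXPORT DATA script from above:

# Sketch: create the scheduled query programmatically (equivalent to the UI step).
# Project id, location, schedule and display name are placeholders.
from google.cloud import bigquery_datatransfer

client = bigquery_datatransfer.DataTransferServiceClient()

export_script = """
DECLARE backup_date DATE DEFAULT DATE_SUB(@run_date, INTERVAL 1 DAY);
EXPORT DATA
  OPTIONS ( uri = CONCAT('gs://my-bucket/', CAST(backup_date AS STRING), '/*.parquet'),
    format='PARQUET', compression='SNAPPY', overwrite=FALSE ) AS
SELECT * FROM `my-project.my-dataset.my-table`
WHERE DATE(timestamp) = backup_date
"""

transfer_config = bigquery_datatransfer.TransferConfig(
    display_name="daily-gcs-backup",
    data_source_id="scheduled_query",  # fixed id used for scheduled queries
    params={"query": export_script},
    schedule="every day 02:00",  # trigger frequency and time
)

transfer_config = client.create_transfer_config(
    parent="projects/my-project/locations/us",
    transfer_config=transfer_config,
)
print("Created scheduled query:", transfer_config.name)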

Harmony answered 15/10, 2021 at 14:46 Comment(4)
Where is the run_date param supposed to come from? I tried this script but got an error: Query parameter 'run_date' not found at [2:44]Ysabel
run_date is a special variable that is only accessible in scheduled queries (cloud.google.com/bigquery/docs/scheduling-queries#query_string). It is the time the query is scheduled to run, not when it is actually executed. For the actual time you can use CURRENT_DATE() instead (like you did in the edit). This is useful while developing your query, but I would advise against using it in your scheduled query, because you cannot be sure when your query actually runs and you cannot simply use the backfill feature.Harmony
Ah I see. Then it makes sense to use run_date. I was testing the script by doing a normal execution before I set it up as a scheduled query, so the run_date param was unavailable. However, wouldn't run_date be the same as CURRENT_DATE() when the query executes? Btw thanks for this solution. It's running perfectly and was infinitely simpler than the other proposed solution.Ysabel
If the intended and the actual day of execution are the same, then yes, run_date and CURRENT_DATE() will be the same.Harmony
  1. Implement your table export function [1] using Node.js, Python or Go. These languages are supported by Cloud Functions and BigQuery.
  2. Deploy the above function to the Cloud Functions [2] service, an event-driven serverless compute platform (a sketch follows this list).
  3. Trigger the function using Cloud Scheduler [3] with a schedule interval of your choice. The schedule interval can be specified using a cron expression. The scheduler will trigger the function via a REST call on the function's endpoint.
  4. Verify the success of the above operation by visiting your bucket and ensuring that the table(s) have been successfully exported.
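A rough sketch of steps 1-3 in Python, assuming an HTTP-triggered Cloud Function that Cloud Scheduler calls on a cron schedule; the table name, bucket path and location are placeholders:

# Sketch: HTTP-triggered Cloud Function that exports a BigQuery table to GCS.
# Cloud Scheduler would call this function's HTTP endpoint on the chosen cron schedule.
from google.cloud import bigquery

def export_table_to_gcs(request):
    """HTTP entry point; table, bucket and location below are placeholders."""
    client = bigquery.Client()
    extract_job = client.extract_table(
        "my-project.my_dataset.my_table",
        "gs://my-bucket/exports/my_table-*.csv",
        location="US",  # must match the dataset location
    )
    extract_job.result()  # wait so errors surface in the function logs
    return "export finished", 200

The function would then be deployed with an HTTP trigger, and a Cloud Scheduler job with your chosen cron expression would call that endpoint to run the export on schedule.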
Brookner answered 13/1, 2020 at 20:11 Comment(1)
This helped me tremendously, thank you so much Moe!Prisage
