External Backups/Snapshots for Google Cloud Spanner

Is it possible to snapshot a Google Cloud Spanner database or table(s)? For compliance reasons we must have daily snapshots of the current database that can be rolled back to in the event of a disaster: is this possible in Spanner, and if not, is there any intention to support it?

For those who might ask why we would need this when Spanner is already replicated/redundant, etc.: replication doesn't guard against human error (dropping a table by accident) or sabotage/espionage, hence the question and requirement.

Thanks, M

Winterize answered 28/2, 2017 at 21:42 Comment(0)

Today, you can stream out a consistent snapshot by reading all the data with your favorite tool (MapReduce, Spark, Dataflow) and performing the reads at a specific timestamp (using Timestamp Bounds).

https://cloud.google.com/spanner/docs/timestamp-bounds

You have about an hour to do the export before the data gets garbage collected.
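
For illustration, here is a minimal sketch of such a timestamp-bounded read using the google-cloud-spanner Python client; the instance, database, and table names are placeholders, and the read timestamp must fall inside the garbage-collection window mentioned above.

    import datetime
    from google.cloud import spanner

    client = spanner.Client()                    # uses application-default credentials
    instance = client.instance("my-instance")    # placeholder instance ID
    database = instance.database("my-database")  # placeholder database ID

    # Fix a single timestamp so every read sees the same consistent view.
    read_ts = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(seconds=15)

    # multi_use=True allows several reads against the same snapshot/timestamp.
    with database.snapshot(read_timestamp=read_ts, multi_use=True) as snapshot:
        for table in ("Singers", "Albums"):      # placeholder table names
            # Table names cannot be parameterized, hence the simple format string.
            for row in snapshot.execute_sql("SELECT * FROM {}".format(table)):
                print(table, row)

In practice you would write the rows out to Cloud Storage or hand the timestamp to a Dataflow/Spark pipeline rather than printing them; the key point is that every worker reads at the same timestamp.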

In the future, we will provide an Apache Beam/Dataflow connector to do this in a more scalable fashion. This will be our preferred method for importing/exporting data into and out of Cloud Spanner.

Longer term, we will support backups and the ability to restore to a backup but that functionality is not currently available.

Jorry answered 28/2, 2017 at 22:41 Comment(8)
Also unclear in the documentation is how many replicas of the data are kept in the cluster, and whether Google keeps backups internally in case of a disaster. – Myongmyopia
@Myongmyopia Regional configurations keep 3 copies of the data (cloud.google.com/spanner/docs/…). We do keep backups internally, so it's possible for us to help you recover data if you file a support ticket. – Gunpowder
Is there an example of using Dataflow/Spark to export data? – Vivianna
Coming soon. Watch this space: github.com/apache/beam/pull/2166 – Jorry
Here's an example of how you could download/upload a Cloud Spanner database to/from PostgreSQL using JDBC: github.com/olavloite/spanner-jdbc-converter – Colyer
Hi, any updates on the progress of backup support? – Cadell
The import/export feature is great, and we were really waiting for this. Is there an option to trigger the export via the command line or on a schedule so it can be used as an automatic backup routine? – Harsho
Backup support has been released: cloud.google.com/spanner/docs/backup – Myongmyopia

As of July 2018, Cloud Spanner offers Import and Export functionality that lets you export a database to Avro format. If you go to the specific Cloud Spanner database in the Google Cloud Console, you will see Import and Export buttons toward the top. Click Export, fill in the requested information such as the destination Google Cloud Storage bucket, and the database will be backed up to Google Cloud Storage in Avro format. If you need to restore a database, use the corresponding Import functionality in the Cloud Spanner section of the Google Cloud Console.

Note: The actual backup and restore (i.e., export and import) are done using Google Cloud Dataflow, and you will be charged for the Dataflow operation.

See the documentation for Import and Export functionality.

Beethoven answered 16/7, 2018 at 14:31 Comment(2)
Beware: if you try to export a table that contains 0 rows, the Cloud Spanner export Dataflow job created to perform the backup will fail. I have filed a ticket with Google (Case #16454353) and they acknowledged this is an actual problem they are working on. – Calicut
This feature is great, and we were really waiting for this. Is there an option to trigger the export via the command line so it can be used as an automatic backup routine? – Harsho

Google Cloud Spanner now has two methods for taking backups.

https://cloud.google.com/spanner/docs/backup

You can either use the built-in backups or do an export/import using a Dataflow job.
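
As a rough sketch (the IDs and retention period are placeholders), creating one of the built-in backups with the google-cloud-spanner Python client looks roughly like this; the gcloud CLI (gcloud spanner backups create) exposes the same operation.

    import datetime
    from google.cloud import spanner

    client = spanner.Client()
    instance = client.instance("my-instance")    # placeholder instance ID
    database = instance.database("my-database")  # placeholder database ID

    # Every backup needs an expiration time; here we keep it for 14 days.
    expire_time = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(days=14)

    backup = instance.backup("my-backup", database=database, expire_time=expire_time)
    operation = backup.create()   # long-running operation
    operation.result(3600)        # wait up to an hour for the backup to finish
    print("Backup created")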

Myongmyopia answered 21/4, 2020 at 2:46 Comment(0)

Per the thread below (also answered by eb80), I have been able to successfully script my backups from Spanner using the Import/Export Dataflow jobs, by building the templates and running the gcloud command from a cron job. The catch is that when you schedule the job you need to parse the output and capture the job ID so that you can then check its status (again, I scripted that); when it reports JOB_STATE_DONE or JOB_STATE_FAILED, it's complete. If you need to run the Import, you'll need that job ID, because the specific backup folder structure is:

gs://bucketName/yourFolder/instanceId-databaseId-jobId

How to batch load custom Avro data generated from another source?
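
If you'd rather not parse gcloud output, below is a hedged sketch of the same workflow driven from Python via the Dataflow REST API: launch the export template, capture the job ID, and poll until the job reaches a terminal state. The template path and parameter names (instanceId, databaseId, outputDir) are assumptions based on the Google-provided Cloud Spanner-to-Avro template, so verify them against the current docs.

    import time
    from googleapiclient.discovery import build   # pip install google-api-python-client

    PROJECT, REGION = "my-project", "us-central1"          # placeholder values

    dataflow = build("dataflow", "v1b3")

    # Launch the (assumed) Google-provided Spanner-to-Avro export template.
    launch = dataflow.projects().locations().templates().launch(
        projectId=PROJECT,
        location=REGION,
        gcsPath="gs://dataflow-templates/latest/Cloud_Spanner_to_GCS_Avro",
        body={
            "jobName": "spanner-export-nightly",
            "parameters": {
                "instanceId": "my-instance",                # placeholder
                "databaseId": "my-database",                # placeholder
                "outputDir": "gs://bucketName/yourFolder",  # matches the folder layout above
            },
        },
    ).execute()
    job_id = launch["job"]["id"]   # needed later to locate instanceId-databaseId-jobId

    # Poll until the job reaches a terminal state.
    while True:
        job = dataflow.projects().locations().jobs().get(
            projectId=PROJECT, location=REGION, jobId=job_id
        ).execute()
        state = job.get("currentState", "")
        if state in ("JOB_STATE_DONE", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED"):
            break
        time.sleep(60)

    print(job_id, state)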

Anchorage answered 25/10, 2018 at 15:21 Comment(0)
