Bigtable Backups and Redundancy

Google Cloud Bigtable looks fantastic; however, I have some questions about backups and redundancy.

Are there any options for backing up data to protect against human errors?

Clusters currently run in a single zone. Are there any ways to mitigate the impact of a zone becoming unavailable?

Illuviation answered 7/5, 2015 at 9:21 Comment(1)
Does this answer your question? Google Cloud Bigtable backup and recovery - Pharisaism

One way to back up your data that's available today is to run the export MapReduce described here:

https://cloud.google.com/bigtable/docs/exporting-importing#export-bigtable
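
The export job described there essentially scans every row and writes it out as Hadoop sequence files. As a rough illustration of the read side only (not the documented MapReduce itself, and with placeholder project, instance, and table names), here is a minimal sketch that scans a table through the Bigtable HBase client and dumps each row to a local text file:

```java
import com.google.cloud.bigtable.hbase.BigtableConfiguration;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class NaiveTableDump {
  public static void main(String[] args) throws IOException {
    // Hypothetical identifiers -- substitute your own project/instance/table.
    try (Connection connection =
             BigtableConfiguration.connect("my-project", "my-bigtable-instance");
         Table table = connection.getTable(TableName.valueOf("my-table"));
         ResultScanner scanner = table.getScanner(new Scan());
         BufferedWriter out = Files.newBufferedWriter(Paths.get("table-dump.txt"))) {
      for (Result row : scanner) {
        // Dump each row's key and cells as text; the real export job writes
        // Hadoop sequence files and runs as a distributed MapReduce instead.
        out.write(Bytes.toStringBinary(row.getRow()) + "\t" + row);
        out.newLine();
      }
    }
  }
}
```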

You are correct that as of today, Bigtable cluster availability is tied to the availability of the zone it runs in. If stronger availability is a concern, you can look at various methods for replicating your writes (such as Kafka), but be aware that this adds other complexity to the system you are building, such as managing consistency between clusters. (What happens if there is a bug in your software and you skip distribution of some writes?)
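
For concreteness, here is a minimal dual-write sketch along those lines, assuming the HBase-compatible Bigtable client and a plain Kafka producer. The project, instance, table, and topic names and the naive cell encoding are all illustrative, and it deliberately ignores failure handling, ordering, and the consistency issues mentioned above:

```java
import com.google.cloud.bigtable.hbase.BigtableConfiguration;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.io.IOException;
import java.util.Properties;

public class DualWriter implements AutoCloseable {
  private final Connection connection;
  private final Table table;
  private final KafkaProducer<byte[], byte[]> producer;

  public DualWriter() throws IOException {
    // Hypothetical project/instance/table identifiers.
    connection = BigtableConfiguration.connect("my-project", "my-bigtable-instance");
    table = connection.getTable(TableName.valueOf("my-table"));

    Properties props = new Properties();
    props.put("bootstrap.servers", "kafka-1:9092,kafka-2:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
    producer = new KafkaProducer<>(props);
  }

  public void write(byte[] rowKey, byte[] family, byte[] qualifier, byte[] value)
      throws IOException {
    // 1. Write to the primary cluster synchronously.
    Put put = new Put(rowKey).addColumn(family, qualifier, value);
    table.put(put);

    // 2. Publish the same cell to Kafka asynchronously; a consumer replays it
    //    into the secondary cluster. A bug here is exactly the consistency
    //    risk described above (writes that never reach the replica).
    //    This flat concatenation is lossy; a real system would serialize the
    //    whole Mutation properly.
    byte[] encoded = Bytes.add(family, qualifier, value);
    producer.send(new ProducerRecord<>("bigtable-mutations", rowKey, encoded));
  }

  @Override
  public void close() throws IOException {
    producer.close();
    table.close();
    connection.close();
  }
}
```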

Using a different system such as Cloud Datastore avoids this problem, as it is not a single-zone system, but it comes with other tradeoffs to consider.

Value answered 12/6, 2015 at 16:20 Comment(0)

As of this writing, Cloud Bigtable supports managed backups, which let you save a copy of a table's schema and data and later restore from the backup into a new table.
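
For anyone landing here now, a minimal sketch with the Cloud Bigtable Java admin client looks roughly like this (the project, instance, cluster, table, and backup IDs are placeholders, and the 7-day expiry is just an example):

```java
import com.google.cloud.bigtable.admin.v2.BigtableTableAdminClient;
import com.google.cloud.bigtable.admin.v2.models.Backup;
import com.google.cloud.bigtable.admin.v2.models.CreateBackupRequest;
import com.google.cloud.bigtable.admin.v2.models.RestoreTableRequest;

import java.io.IOException;
import org.threeten.bp.Instant;
import org.threeten.bp.temporal.ChronoUnit;

public class ManagedBackupExample {
  public static void main(String[] args) throws IOException {
    // Hypothetical project/instance/cluster/table identifiers.
    try (BigtableTableAdminClient adminClient =
             BigtableTableAdminClient.create("my-project", "my-instance")) {

      // Create a backup of the table, expiring after 7 days.
      Backup backup = adminClient.createBackup(
          CreateBackupRequest.of("my-cluster", "my-backup")
              .setSourceTableId("my-table")
              .setExpireTime(Instant.now().plus(7, ChronoUnit.DAYS)));
      System.out.println("Created backup: " + backup.getId());

      // Restore the backup into a brand-new table.
      adminClient.restoreTable(
          RestoreTableRequest.of("my-cluster", "my-backup")
              .setTableId("my-restored-table"));
    }
  }
}
```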

Lycurgus answered 17/9, 2020 at 18:52 Comment(0)

It seems that a replication feature is not available at this stage, so I see the following options, given that read access to the write-ahead log (or whatever Bigtable's transaction log is called) is not provided:

  1. In Google We Trust. Rely on their expertise in ensuring availability and recovery. One of the attractions of hosted Bigtable for HBase developers is lower administrative overhead: not having to worry about backups and recovery.

  2. Deploy a secondary Bigtable cluster in a different zone and send it a copy of each Mutation asynchronously, with more aggressive write buffering on the client, since low latency is not a priority. You could even deploy a regular HBase cluster instead of a Bigtable cluster, but the extent to which Google's HBase client and the Apache HBase client can coexist in the same runtime remains to be seen.

  3. Copy Mutations to a local file, offloaded on a schedule to the Cloud Storage class of your choice: Standard or DRA. Replay the files on recovery.

  4. A variation of option 3: stand up a Kafka cluster distributed across multiple availability zones, implement a producer, and send Mutations to Kafka; its throughput should be higher than Bigtable/HBase anyway. Keep track of the offset and, on recovery, replay the Mutations by consuming messages from Kafka (see the sketch after this list).
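
To make option 4's recovery path concrete, here is a rough replay sketch. It assumes each Kafka record carries the row key as the record key and the cell payload as the value; the topic, table, and column names and the encoding are placeholders, and a real replayer would carry the full Mutation (family, qualifier, timestamp) and track the replay end-point explicitly rather than stopping on an empty poll:

```java
import com.google.cloud.bigtable.hbase.BigtableConfiguration;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.io.IOException;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class MutationReplayer {
  public static void main(String[] args) throws IOException {
    Properties props = new Properties();
    props.put("bootstrap.servers", "kafka-1:9092");
    props.put("group.id", "bigtable-replay");
    props.put("auto.offset.reset", "earliest"); // replay from the beginning
    props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

    try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
         Connection connection =
             BigtableConfiguration.connect("my-project", "my-bigtable-instance");
         Table table = connection.getTable(TableName.valueOf("my-restored-table"))) {
      consumer.subscribe(Collections.singletonList("bigtable-mutations"));
      while (true) {
        ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(5));
        if (records.isEmpty()) {
          break; // caught up for the purposes of this sketch
        }
        for (ConsumerRecord<byte[], byte[]> record : records) {
          // For illustration the whole payload is written into one column;
          // a real format would reconstruct family/qualifier/timestamp per cell.
          Put put = new Put(record.key())
              .addColumn(Bytes.toBytes("cf"), Bytes.toBytes("replayed"), record.value());
          table.put(put);
        }
      }
    }
  }
}
```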

Another thought: if history is any lesson, AWS didn't have a Multi-AZ option from the very start either; it took them a while to evolve.

Rondo answered 8/5, 2015 at 8:45 Comment(0)

You may consider Egnyte's https://github.com/egnyte/bigtable-backup-and-restore. These are Python wrappers around the java-bigtable-hbase shaded JAR that export/import Bigtable data to/from a GCS bucket as a series of Hadoop sequence files.

Grog answered 20/2, 2020 at 10:26 Comment(0)
