Google Cloud Bigtable vs Google Cloud Datastore
J

9

157

What is the difference between Google Cloud Bigtable and Google Cloud Datastore / App Engine datastore, and what are the main practical advantages/disadvantages? AFAIK Cloud Datastore is built on top of Bigtable.

J answered 6/5, 2015 at 18:57 Comment(1)
Please don't close. There is currently no official documentation on these, and Google will likely comment here.Leghorn
A
116

Based on experience with Datastore and reading the Bigtable docs, the main differences are:

  • Bigtable was originally designed for HBase compatibility, but now has client libraries in multiple languages. Datastore was originally more geared towards Python/Java/Go web app developers (originally App Engine)
  • Bigtable is 'a bit more IaaS' than Datastore in that it's not 'just there' but requires a cluster to be configured.
  • Bigtable supports only one index - the 'row key' (the entity key in Datastore)
    • This means queries are on the Key, unlike Datastore's indexed properties
  • Bigtable supports atomicity only on a single row - there are no transactions
  • Mutations and deletions appear not to be atomic in Bigtable, whereas Datastore provides eventual and strong consistency, depending on the read/query method
  • The billing model is very different:
    • Datastore charges for read/write operations, storage and bandwidth
    • Bigtable charges for 'nodes', storage and bandwidth
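To illustrate the "one index" point above, here is a minimal in-memory sketch in Python. It is not the Bigtable or Datastore client API; the names `rows`, `scan_prefix` and `build_property_index` are made up for illustration. It only shows why a store sorted by row key can answer key-prefix queries cheaply, while filtering on a property needs a separate secondary index of the kind Datastore maintains for you.

```python
from bisect import bisect_left

# Toy stand-in for a table whose rows are kept sorted by row key.
# Illustration only: this is NOT the Bigtable client API.
rows = sorted(
    [
        ("user#alice", {"city": "Paris", "age": 31}),
        ("user#bob", {"city": "Oslo", "age": 27}),
        ("user#carol", {"city": "Paris", "age": 45}),
        ("video#001", {"title": "intro"}),
    ],
    key=lambda row: row[0],
)
keys = [key for key, _ in rows]


def scan_prefix(prefix):
    """Row-key range scan: seek to the first key >= prefix, read while it matches."""
    start = bisect_left(keys, prefix)
    matched = []
    for key in keys[start:]:
        if not key.startswith(prefix):
            break
        matched.append(key)
    return matched


def build_property_index(prop):
    """Datastore-style secondary index: property value -> list of row keys."""
    index = {}
    for key, value in rows:
        if prop in value:
            index.setdefault(value[prop], []).append(key)
    return index


city_index = build_property_index("city")

print(scan_prefix("user#"))   # answered from the key order alone
print(city_index["Paris"])    # needs the extra index
```

With only the row key available, a "city == Paris" query would have to scan every row or encode the city into the key, which is why key design matters so much more in Bigtable.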
Awn answered 6/5, 2015 at 20:5 Comment(0)
C
113

Bigtable is optimized for high volumes of data and analytics

  • Cloud Bigtable doesn’t replicate data across zones or regions (data within a single cluster is replicated and durable), which means Bigtable is faster and more efficient, and costs are much lower, though it is less durable and available in the default configuration
  • It uses the HBase API - there’s no risk of lock-in or new paradigms to learn
  • It is integrated with the open-source Big Data tools, meaning you can analyze the data stored in Bigtable in most analytics tools customers use (Hadoop, Spark, etc.)
  • Bigtable is indexed by a single Row Key
  • Bigtable is in a single zone

Cloud Bigtable is designed for larger companies and enterprises who often have larger data needs with complex backend workloads.

Datastore is optimized to serve high-value transactional data to applications

  • Cloud Datastore has extremely high availability with replication and data synchronization
  • Datastore, because of its versatility and high availability, is more expensive
  • Datastore is slower writing data due to synchronous replication
  • Datastore has much better functionality around transactions and queries (since secondary indexes exist)
Circumference answered 6/5, 2015 at 23:24 Comment(2)
Bigtable now replicates across zones to provide availability in the face of a zonal outage: cloudplatform.googleblog.com/2018/07/…Winterize
I thought transactions were not a strong selling point for Datastore. From its [doc|cloud.google.com/datastore/docs/concepts/transactions]: "A transaction is a set of Google Cloud Datastore operations on one or more entities in up to 25 entity groups." Also, Datastore is built on top of Bigtable, right?Velarize
E
20

Bigtable and Datastore are extremely different. Yes, the datastore is built on top of Bigtable, but that does not make it anything like it. That is kind of like saying a car is built on top of wheels, and so a car is not much different from wheels.

Bigtable and Datastore provide very different data models and very different semantics in how the data is changed.

The main difference is that the Datastore provides SQL-database-like ACID transactions on subsets of the data known as entity groups (though the query language GQL is much more restrictive than SQL). Bigtable is strictly NoSQL and comes with much weaker guarantees.

Evan answered 6/5, 2015 at 20:30 Comment(4)
You were doing well until the last paragraph. The datastore provides transactions, but they are nothing like SQL and definitely not ACID.Felspar
@DanielRoseman Actually, it very much does. Here is a quote from the paper on Megastore (on which Datastore is built): "Each Megastore entity group functions as a mini-database that provides serializable ACID semantics." "we partition the datastore and replicate each partition separately, providing full ACID semantics within partitions". (research.google.com/pubs/pub36971.html)Evan
I think it's misleading to call it SQL. A subset at most. It has no efficient count/group, all queries must use indexes, etc.Leghorn
Query language and transaction isolation are different things; you seem to be mixing them up. I am making a claim about the latter (ACID transactions). In your comment you are assuming I am talking about the former. Perhaps some hyphens will clarify? I'll explicitly mention the query language issue to remove any doubt.Evan
Q
19

I am going to try to summarize all the answers above, plus what is given in the Coursera course Google Cloud Platform Big Data and Machine Learning Fundamentals.

+---------------------+-------------------------------------------------------+-------------------------------------------+
| Category            | Bigtable                                              | Datastore                                 |
+---------------------+-------------------------------------------------------+-------------------------------------------+
| Technology          | HBase-compatible (uses the HBase API)                 | Uses Bigtable itself                      |
| Access metaphor     | Key/value (column families), like HBase               | Persistent hashmap                        |
| Read                | Scan rows                                             | Filter objects on property                |
| Write               | Put row                                               | Put object                                |
| Update granularity  | Can't update a row in place (write a new row instead) | Can update an attribute                   |
| Capacity            | Petabytes                                             | Terabytes                                 |
| Index               | Row key only (design the key carefully)               | Any property of the object can be indexed |
| Usage and use cases | High-throughput, scalable, flat data                  | Structured data for Google App Engine     |
+---------------------+-------------------------------------------------------+-------------------------------------------+

Check this image too: enter image description here


Quell answered 4/3, 2019 at 20:8 Comment(1)
Write granularity in BigTable is NOT row, but row+column cloud.google.com/bigtable/docs/…Teufert
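To make the row versus row+column distinction in this comment concrete, here is a rough Python sketch of how a Bigtable-style data model is usually described: each row key maps to columns (family plus qualifier), and each column holds timestamped cell versions. `ToyTable`, `set_cell` and `read_latest` are hypothetical names for illustration, not calls from any Google client library, and the logical counter stands in for real timestamps.

```python
import itertools


class ToyTable:
    """Illustrative model only: row key -> (family, qualifier) -> cell versions."""

    def __init__(self):
        self.rows = {}
        self._clock = itertools.count(1)  # assumption: logical timestamps

    def set_cell(self, row_key, family, qualifier, value):
        """Write one cell; other columns of the same row are left untouched."""
        column = self.rows.setdefault(row_key, {}).setdefault((family, qualifier), [])
        column.insert(0, (next(self._clock), value))  # newest version first

    def read_latest(self, row_key):
        """Return the newest value of every column in the row."""
        return {
            column: versions[0][1]
            for column, versions in self.rows.get(row_key, {}).items()
        }


table = ToyTable()
table.set_cell("user#alice", "profile", "city", "Paris")
table.set_cell("user#alice", "profile", "age", "31")
table.set_cell("user#alice", "profile", "city", "Oslo")  # touches one column only

print(table.read_latest("user#alice"))
```

In this model an "update" is a new cell version on a single column, so the older value of `city` is still present as a previous version until it is garbage-collected.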
C
10

If you read the papers, BigTable is this and Datastore is MegaStore. Datastore is BigTable plus replication, transactions, and indexes (and is much more expensive).

Cowpoke answered 12/5, 2017 at 21:13 Comment(2)
Is it really more expensive? The minimum for BigTable is 3 nodes; with 10GB HDD that's $1400/mo. Seems pretty high, no?Depurative
@ben, in my past experience it was. Datastore is charged per operation instead of per hour. (If you don't use it that much, then yes, you don't pay much for Datastore. But if you have high traffic, I think Bigtable is much cheaper.) I think Bigtable claims 10k ops per second? In reality I found it to be lower, around 1-2k, but 3 nodes is still > 5k/s. If you maintain that throughput for a month and map that to Datastore pricing, it's probably much higher than 1.4k.Cowpoke
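The comparison in these two comments can be worked through with a few lines of Python. The $1400/month flat figure and the 5k ops/s throughput come from the comments; the per-operation price (`ASSUMED_PRICE_PER_100K_OPS`) is a placeholder I made up for the arithmetic, so substitute the current published Datastore price before drawing any conclusion.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 30-day month

BIGTABLE_FLAT_MONTHLY = 1400.0      # USD, 3-node figure quoted in the comment
ASSUMED_PRICE_PER_100K_OPS = 0.06   # USD, HYPOTHETICAL per-operation price


def monthly_ops(ops_per_second):
    """Operations issued in a month at a sustained rate."""
    return ops_per_second * SECONDS_PER_MONTH


def per_operation_cost(ops_per_second, price_per_100k_ops):
    """Monthly bill under a pay-per-operation model."""
    return monthly_ops(ops_per_second) / 100_000 * price_per_100k_ops


def break_even_ops_per_second(flat_monthly_cost, price_per_100k_ops):
    """Sustained rate at which per-operation billing equals the flat cost."""
    ops_per_month = flat_monthly_cost / price_per_100k_ops * 100_000
    return ops_per_month / SECONDS_PER_MONTH


sustained = 5_000  # ops/s, from the comment
print(monthly_ops(sustained))
print(per_operation_cost(sustained, ASSUMED_PRICE_PER_100K_OPS))
print(break_even_ops_per_second(BIGTABLE_FLAT_MONTHLY, ASSUMED_PRICE_PER_100K_OPS))
```

Under these assumed numbers, per-operation billing overtakes the flat node cost somewhere around 900 sustained ops/s; below that rate the per-operation model is cheaper, which matches the intuition in both comments.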
E
7

This might be another set of key differences between Google Cloud Bigtable and Google Cloud Datastore along with other services. The contents shown in the image below can also help you in selecting the right service.

enter image description here


Eminent answered 5/4, 2020 at 10:36 Comment(0)
G
2

A relatively minor point to consider: as of November 2016, the Bigtable Python client library is still in Alpha, which means future changes might not be backward compatible. Also, the Bigtable Python library is not compatible with App Engine's standard environment. You have to use the flexible one.

Gown answered 16/11, 2016 at 7:27 Comment(1)
As of November 2016, the same is true for JavaPolestar
S
2

enter image description here

Cloud Datastore is a highly-scalable NoSQL database for your applications. Like Cloud Bigtable, there is no need for you to provision database instances. Cloud Datastore uses a distributed architecture to automatically manage scaling. Your queries scale with the size of your result set, not the size of your data set.

Cloud Datastore runs in Google data centers, which use redundancy to minimize impact from points of failure. Your application can still use Cloud Datastore when the service receives a planned upgrade.

enter image description here

 Choose Bigtable if the data is:
Big
● Large quantities (>1 TB) of semi-structured or structured data
Fast
● Data is high throughput or rapidly changing
NoSQL
● Transactions, strong relational semantics not required
And especially if it is:
Time series
● Data is time-series or has natural semantic ordering
Big data
● You run asynchronous batch or real-time processing on the data
Machine learning
● You run machine learning algorithms on the data
Bigtable is designed to handle massive workloads at consistent low latency and high throughput, so it's a great choice for both operational and analytical applications, including IoT, user analytics, and financial data analysis.
Squib answered 1/4, 2020 at 8:14 Comment(0)
A
1

Datastore is more application ready and suitable for a wide range of services, especially for microservices.

The underlying technology of Datastore is Bigtable, so you can imagine Bigtable is more powerful.

Datastore comes with 20K free operations per day, so you can expect to host a server with a reliable DB at zero cost.

You can also check out this Datastore ORM library; it comes with a lot of great features: https://www.npmjs.com/package/ts-datastore-orm

Aflame answered 29/2, 2020 at 2:16 Comment(0)
