Price aside, why ever choose Google Cloud Bigtable over Google Cloud Datastore?

If I have a use case for both huge data storage and searchability, why would I ever choose Google Cloud Bigtable over Google Cloud Datastore?

I've seen a few questions on SO and other sites "comparing" Bigtable and Datastore, but they all seem to boil down to the same non-specific answers.

Here's my current knowledge and my thoughts:

Datastore is more expensive.

In the context of this question, let's forget entirely about pricing.

Bigtable is good for huge datasets.

It seems like Datastore is, too? I'm not seeing what specifically makes Bigtable objectively superior here.

Bigtable is better than Datastore for analytics.

How? Why? It seems like I can do analytics in Datastore as well, no problem. Why is Bigtable seemingly the unanimous decision industry-wide for analytics? What value do GMail, eBay, etc. get from Bigtable that Datastore can't provide?

Bigtable is integrated with Hadoop, Spark, etc.

Is Datastore not as well, considering it's built on Bigtable?

From this question, this statement was made in an answer:

Bigtable and Datastore are extremely different. Yes, the datastore is build on top of Bigtable, but that does not make it anything like it. That is kind of like saying a car is build on top of [car] wheels, and so a car is not much different from wheels.

However, this analogy seems nonsensical, since the car (including the wheels) intrinsically provides more value than the wheels by themselves.

It seems at first glance that Bigtable is strictly worse than Datastore, only providing a single index and limiting quick searchability. What am I missing?

Nationalism answered 26/11, 2018 at 21:21 Comment(2)
I'd suggest considering Cloud Spanner and Firestore in the mix as well. I would suggest thinking about price/performance for 100K+ operations per second when making comparisons. - Wilhelminawilhelmine
I also looked at Firestore, and definitely throw that in as well. I didn't necessarily consider Cloud Spanner because it's a relational database. At 100K ops/second, price definitely comes into play, but the use case here would probably never exceed 500-1000 ops/second, and the price between the two stores is pretty similar. - Nationalism

Bigtable and Datastore are optimized for slightly different use-cases, and offer different tradeoffs. The main ones are:

Data model:

  • Bigtable is a wide-column database -- think HBase and Cassandra
  • Datastore is a document database -- think MongoDB
  • Note that both of these can be used for key-value use cases
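To make the two data models concrete, here is a toy sketch using plain Python dicts. It does not use either product's client library; the field names (`families`, `properties`, and so on) and the sample values are made up for illustration.

```python
# Toy illustration only: plain dicts standing in for the two data models.
# All field names and values here are hypothetical.

# Wide-column (Bigtable-style): one row key, then
# column family -> column qualifier -> list of (timestamp, value) cells.
# Rows are sparse: a row only stores the columns it actually has.
wide_column_row = {
    "row_key": "user#1001",
    "families": {
        "profile": {
            "name": [(1543262400, "Ada")],
            "email": [
                (1540000000, "old@example.com"),
                (1543262400, "ada@example.com"),
            ],
        },
        "stats": {
            "logins": [(1543262400, "42")],
        },
    },
}

# Document (Datastore-style): an entity with a kind, a key, and
# typed, possibly nested properties.
document_entity = {
    "kind": "User",
    "key": 1001,
    "properties": {
        "name": "Ada",
        "email": "ada@example.com",
        "address": {"city": "London", "country": "UK"},
        "tags": ["admin", "beta"],
    },
}


def latest_cell(row, family, qualifier):
    """Return the value of the most recent cell for one column."""
    cells = row["families"][family][qualifier]
    return max(cells, key=lambda cell: cell[0])[1]


# Both shapes still support a plain "look up by key" access pattern.
print(latest_cell(wide_column_row, "profile", "email"))
print(document_entity["properties"]["address"]["city"])
```

The practical difference is what the store understands about the value: the wide-column row is versioned cells addressed by family and qualifier, while the document keeps nested, typed properties.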

Cost model:

  • Bigtable charges per provisioned node
  • Datastore is serverless and charges per operation

In general, Bigtable is a good choice if you need:

  • Fast point-reads and range scans (especially at scale). Bigtable will offer lower latency for key-value lookups, as well as fast scans of contiguous rows - a powerful tool since rows are stored in lexicographic order. If you have simple, predictable query patterns and design your schema well, reading from Bigtable can be incredibly efficient.
  • High throughput writes (again, especially at scale). This is possible in part because Bigtable's replication between clusters is eventually consistent (a single cluster is strongly consistent) - in exchange you can see big wins in price/performance.
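Why sorted row keys make range scans cheap can be sketched with a toy in-memory table. This is not the Bigtable API; `SortedTable` and its methods are hypothetical names, and it compares Python strings where Bigtable compares raw bytes (the two orders agree for ASCII keys).

```python
import bisect


class SortedTable:
    """Toy in-memory table that keeps row keys in lexicographic order."""

    def __init__(self):
        self._keys = []  # row keys, always kept sorted
        self._rows = {}  # row key -> value

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def get(self, key):
        """Point read: a single lookup by exact row key."""
        return self._rows.get(key)

    def scan_prefix(self, prefix):
        """Range scan: jump to the first matching key, then read
        neighbours until the prefix stops matching."""
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for i in range(start, len(self._keys)):
            key = self._keys[i]
            if not key.startswith(prefix):
                break
            result.append((key, self._rows[key]))
        return result


table = SortedTable()
table.put("user2#b", 4)
table.put("user1#a", 1)
table.put("user10#a", 2)
table.put("user1#c", 3)

# Only rows for user1 are touched; user10 and user2 are never read past
# the first non-matching key.
print(table.scan_prefix("user1#"))  # [('user1#a', 1), ('user1#c', 3)]
```

The `#` delimiter matters: scanning for the prefix `user1` without it would also return the `user10` row. That is the kind of schema decision "design your schema well" refers to.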

Example use-cases that are great for Bigtable include time series data (for IoT, monitoring, and more - think extremely write heavy workloads and massive amounts of data generated over x units of time), analytics (think fraud detection, personalization, recommendations), and ad-serving (every microsecond counts).
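For the time series case, most of the design work goes into the row key. Below is a minimal sketch of one common pattern, assuming millisecond timestamps and a hypothetical `device_id#timestamp` layout; `row_key` and `MAX_TS` are names invented for this example.

```python
MAX_TS = 10**13 - 1  # hypothetical upper bound: 13-digit millisecond timestamps


def row_key(device_id: str, ts_millis: int, newest_first: bool = False) -> str:
    """Build a row key whose lexicographic order matches time order.

    The timestamp is zero-padded so string order equals numeric order.
    With newest_first=True the timestamp is reversed, so the most recent
    reading for a device sorts first.
    """
    if not 0 <= ts_millis <= MAX_TS:
        raise ValueError("timestamp out of range")
    value = MAX_TS - ts_millis if newest_first else ts_millis
    return f"{device_id}#{value:013d}"


timestamps = [5, 100, 20]

# Unpadded numbers sort incorrectly as strings: "100" < "20" < "5".
naive = sorted(f"sensor-7#{t}" for t in timestamps)

# Padded keys sort oldest to newest: 5, 20, 100.
oldest_first = sorted(row_key("sensor-7", t) for t in timestamps)

# Reversed keys sort newest to oldest: 100, 20, 5.
newest_first = sorted(row_key("sensor-7", t, newest_first=True) for t in timestamps)
```

Putting the device id first keeps each device's readings contiguous for scans and avoids sending every new write to the same end of the key space, which is what happens when keys start with the timestamp.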

Datastore (or Firestore) is a good choice if you need:

  • Query flexibility: Datastore offers document support and secondary indexes.
  • Strong consistency and/or transactions: Bigtable has eventually consistent replication and does not support multi-row transactions.
  • Mobile SDKs: Datastore and Firestore are incredibly well-integrated with the Firebase ecosystem.
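What "secondary indexes" buy you can be sketched with a toy store that maintains an index per property on every write. This is a simplified model, not how Datastore is actually implemented; `IndexedStore` and `scan_filter` are hypothetical names.

```python
from collections import defaultdict


class IndexedStore:
    """Toy document store that updates one index per property on every write."""

    def __init__(self):
        self._docs = {}
        # property name -> property value -> set of document keys
        self._index = defaultdict(lambda: defaultdict(set))

    def put(self, key, doc):
        old = self._docs.get(key)
        if old is not None:
            # Remove stale index entries before re-indexing.
            for prop, value in old.items():
                self._index[prop][value].discard(key)
        self._docs[key] = dict(doc)
        for prop, value in doc.items():
            self._index[prop][value].add(key)

    def query(self, prop, value):
        """Answer 'prop == value' from the index, without reading every document."""
        return sorted(self._index.get(prop, {}).get(value, set()))


def scan_filter(rows, prop, value):
    """Same question against a key-only store: every row must be examined."""
    return sorted(key for key, doc in rows.items() if doc.get(prop) == value)


users = {
    "u1": {"city": "Paris", "plan": "free"},
    "u2": {"city": "Berlin", "plan": "pro"},
    "u3": {"city": "Paris", "plan": "pro"},
}

store = IndexedStore()
for key, doc in users.items():
    store.put(key, doc)

print(store.query("city", "Paris"))         # ['u1', 'u3']
print(scan_filter(users, "city", "Paris"))  # ['u1', 'u3']
```

Both calls return the same answer. The difference is that the indexed query does work proportional to the number of matches, while the scan does work proportional to the size of the table, and the index has to be paid for on every write.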

Example use-cases include mobile and web applications, game state, user profiles, and product catalogs.

To answer a few of your questions explicitly:

  • Why is Bigtable used for analytics? It's mostly about performance: analytics use-cases are more likely to have large datasets and require high write throughput. It's a lot easier to run into the limits of a database if you're storing clickstream data, as opposed to something like user account information. Fast scans are also important for analytics use-cases: Bigtable allows you to retrieve all of the information you need about a user or a device extremely quickly, which you can process in a batch job or use to create recommendations and analysis on the fly.
  • Is Bigtable strictly worse than Datastore? Datastore definitely provides more built-in functionality like secondary indexes and document support, and if you need those features, Datastore is a fantastic choice. But that functionality comes with tradeoffs. Bigtable provides perhaps lower-level, but incredibly performant APIs that allow users to make those tradeoffs for themselves: If a user values, say, write performance over secondary indexes, Bigtable is an excellent option. You can think of it as an extremely versatile and powerful infrastructural building block. I actually like the wheel/car analogy: sometimes you don't want the car -- if what you really need is a dirt bike, a set of solid wheels is much more useful :)
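As a rough sketch of "making those tradeoffs yourself": on a key-sorted store you can hand-maintain an index as extra rows, and you decide per workload whether the extra write is worth it. The key layout (`user#...`, `idx_city#...`) and function names below are hypothetical, and a plain dict stands in for the table.

```python
def put_user(table: dict, user_id: str, city: str, index_by_city: bool) -> int:
    """Write one user row, optionally plus a hand-maintained index row.

    Returns the number of row writes performed. Note: this sketch does not
    clean up the old index row when a user changes city.
    """
    table[f"user#{user_id}"] = {"city": city}
    writes = 1
    if index_by_city:
        # The index row carries its information in the key itself.
        table[f"idx_city#{city}#{user_id}"] = {}
        writes += 1
    return writes


def users_in_city(table: dict, city: str) -> list:
    """Look up users by city via the index rows (a prefix match on the key)."""
    prefix = f"idx_city#{city}#"
    return [key[len(prefix):] for key in sorted(table) if key.startswith(prefix)]


fast_writes = {}
indexed = {}

writes_without_index = sum(
    put_user(fast_writes, uid, city, index_by_city=False)
    for uid, city in [("u1", "Paris"), ("u2", "Berlin"), ("u3", "Paris")]
)
writes_with_index = sum(
    put_user(indexed, uid, city, index_by_city=True)
    for uid, city in [("u1", "Paris"), ("u2", "Berlin"), ("u3", "Paris")]
)

print(writes_without_index, writes_with_index)  # 3 6
print(users_in_city(indexed, "Paris"))          # ['u1', 'u3']
```

A write-heavy workload that never queries by city can skip the index and halve its writes; a workload that needs the lookup pays for it explicitly. With a store that indexes automatically, that choice is made for you.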
Conceptacle answered 27/11, 2018 at 19:41 Comment(2)
"Bigtable is eventually consistent" -- This is not true. Bigtable is strongly consistent unless cluster replication is used. - Alfieri
Fair enough! Bigtable is strongly consistent within a zone and eventually consistent between zones, but most of the time users are comparing consistency models when replication is enabled! - Conceptacle
