Google's Bigtable vs. A Relational Database [duplicate]

I don't know much about Google's Bigtable but am wondering what the difference between Google's Bigtable and relational databases like MySQL is. What are the limitations of both?

Cytoplasm answered 23/4, 2009 at 18:16 Comment(1)
Several dups on the "Related" sidebar there. https://mcmap.net/q/341225/-choosing-a-database-type-closed Everywhere
34

Bigtable is a storage system Google built to handle the massive amounts of information the company regularly works with. A Bigtable dataset can grow to immense size (many petabytes), with storage distributed across a large number of servers. The systems using Bigtable include projects like Google's web index and Google Earth.

According to Google's whitepaper on the subject:

A Bigtable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.
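To make that definition concrete, here is a minimal in-memory sketch in Python of a sparse, sorted map keyed by (row key, column key, timestamp). It is only a toy that illustrates the data model from the quote. The class name `ToySortedMap` and its methods are made up for this example and are not part of any Bigtable API, and nothing here is distributed or persistent.

```python
class ToySortedMap:
    """Toy model of the quoted data model: (row, column, timestamp) -> bytes.

    Hypothetical illustration only; it is neither distributed nor persistent.
    """

    def __init__(self):
        # Sparse: only cells that were actually written take up space.
        self._cells = {}

    def put(self, row, column, timestamp, value):
        # Values are uninterpreted bytes; the map never inspects them.
        self._cells[(row, column, timestamp)] = bytes(value)

    def get(self, row, column):
        """Return the most recent version of a cell, or None if never written."""
        versions = [
            (ts, value)
            for (r, c, ts), value in self._cells.items()
            if r == row and c == column
        ]
        if not versions:
            return None
        return max(versions, key=lambda pair: pair[0])[1]

    def scan(self, start_row, end_row):
        """Yield (key, value) pairs in sorted key order for start_row <= row < end_row."""
        for key in sorted(self._cells):
            if start_row <= key[0] < end_row:
                yield key, self._cells[key]


table = ToySortedMap()
table.put("com.example.www", "contents:", 1, b"<html>old</html>")
table.put("com.example.www", "contents:", 2, b"<html>new</html>")
table.put("com.example.blog", "contents:", 1, b"<html>blog</html>")

latest = table.get("com.example.www", "contents:")
rows_in_range = [key[0] for key, _ in table.scan("com.example.a", "com.example.c")]
```

Because keys are kept in sorted order, rows with nearby keys can be read together with a range scan, which is why the choice of row key matters so much in this kind of store.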

The internal mechanics of Bigtable versus, say, MySQL are so dissimilar as to make comparison difficult, and the intended goals don't overlap much either. But you can think of Bigtable a bit like a single-table database. Imagine, for example, the difficulties you would run into if you tried to implement Google's entire web search system with a MySQL database -- Bigtable was built around solving those problems.

Bigtable datasets can be queried from services like AppEngine using a language called GQL ("gee-kwal"), which is based on a subset of SQL. Conspicuously missing from GQL is any sort of JOIN command. Because of the distributed nature of a Bigtable database, performing a join between two tables would be terribly inefficient. Instead, the programmer has to implement such logic in the application, or design the application so that it does not need it.
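As a rough illustration of what "implementing the join in the application" can look like, the Python sketch below uses plain dictionaries as stand-ins for two kinds of stored entities. The names `get_by_keys` and `posts_with_authors` and the sample data are hypothetical and do not correspond to any AppEngine or GQL API. The idea is to run one query, collect the referenced keys, fetch them in a single batch, and stitch the results together in memory.

```python
# Stand-ins for two separately stored entity kinds (hypothetical data).
author_store = {
    "a1": {"name": "Ada"},
    "a2": {"name": "Grace"},
}
post_rows = [
    {"title": "First post", "author_key": "a1"},
    {"title": "Second post", "author_key": "a2"},
    {"title": "Third post", "author_key": "a1"},
]


def get_by_keys(store, keys):
    """Simulate one batched lookup by key; missing keys are simply skipped."""
    return {key: store[key] for key in keys if key in store}


def posts_with_authors(posts, authors):
    """Application-side 'join': attach each post's author record to the post."""
    wanted = {post["author_key"] for post in posts}   # de-duplicate the keys
    found = get_by_keys(authors, wanted)              # one batch fetch, not one per post
    return [{**post, "author": found.get(post["author_key"])} for post in posts]


joined = posts_with_authors(post_rows, author_store)
```

The cost of this approach is an extra round trip and some bookkeeping in application code, which is one reason such schemas often copy the needed fields into the row instead.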

Mismanage answered 23/4, 2009 at 18:40 Comment(3)
"Imagine, for example, the difficulties you would run into if you tried to implement Google's entire web search system with a MySQL database" What would the difficulties be? Bagehot
@Amoeba table size, index size, redundancy, massively parallel simultaneous access, and many others. MySQL does well up to a point, but scaling up and sharding out to multiple machines becomes a coordination nightmare. Mismanage
I was discussing BigTable over dinner with a Google engineer the other night. A very key point, for me, was that non-relational databases are much better when you have multiple servers storing data simultaneously. Therefore, you don't end up with duplicate references, and merging and querying data is more seamless. I can definitely see the benefits of non-relational databases, especially when it comes to massive scalability. Kazan
16

Google's BigTable and other similar projects (e.g., CouchDB, HBase) are database systems designed around data that is mostly denormalized (i.e., duplicated and grouped).

The main advantages are:
- Join operations are less costly because of the denormalization.
- Replication/distribution of data is less costly because of data independence (i.e., if you want to distribute data across two nodes, you probably won't have the problem of having an entity on one node and a related entity on another node, because similar data is grouped).
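The trade-off behind denormalization can be sketched with a small Python example. Everything below (the dictionaries, `read_normalized`, `read_denormalized`, `rename_author`) is hypothetical and only models the idea, not how any of these systems stores data internally. Copying the author's name into each post makes a read a single lookup, but a change to that name then has to be written to every copy.

```python
# Normalized layout: the author is stored once and referenced by key.
authors = {"a1": {"name": "Ada"}}
posts_normalized = {
    "p1": {"title": "Hello", "author_key": "a1"},
    "p2": {"title": "World", "author_key": "a1"},
}

# Denormalized layout: the author's name is duplicated into every post.
posts_denormalized = {
    "p1": {"title": "Hello", "author_key": "a1", "author_name": "Ada"},
    "p2": {"title": "World", "author_key": "a1", "author_name": "Ada"},
}


def read_normalized(post_key):
    """Two lookups: the post, then the author it points to (a small join)."""
    post = posts_normalized[post_key]
    author = authors[post["author_key"]]
    return post["title"], author["name"]


def read_denormalized(post_key):
    """One lookup: everything needed is already in the post record."""
    post = posts_denormalized[post_key]
    return post["title"], post["author_name"]


def rename_author(author_key, new_name):
    """Update the author and every duplicated copy; return how many copies changed."""
    authors[author_key]["name"] = new_name
    touched = 0
    for post in posts_denormalized.values():
        if post["author_key"] == author_key:
            post["author_name"] = new_name
            touched += 1
    return touched


before = (read_normalized("p1"), read_denormalized("p1"))
copies_updated = rename_author("a1", "Ada L.")
after = read_denormalized("p2")
```

Reads get cheaper and writes get more expensive, so this layout tends to suit data that is read far more often than it is changed.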

These kinds of systems are suited to applications that need to scale as close to linearly as possible (i.e., you add more nodes to the system and performance increases proportionally). In an RDBMS like MySQL or Oracle, once you start adding more nodes, joining two tables that are not on the same node costs more. This becomes important when you are dealing with high volumes.

RDBMSs are nice because of the richness of the storage model (tables, joins, foreign keys). Distributed databases are nice because of the ease of scaling.

Elana answered 23/4, 2009 at 18:38 Comment(2)
But if the data is not normalized, updates would be more difficult, as you might need to reflect the same information in multiple places, and even worse if they are on different nodes. How do denormalized databases deal with that? Bagehot
Erm... Aren't you confusing the term ORM with RDBMS? Aesthesia
