Can relational database scale horizontally
Asked Answered
T

6

66

After some googling I have found:

Note from mysql docs:

MySQL Cluster automatically shards (partitions) tables across nodes, enabling databases to scale horizontally on low cost, commodity hardware to serve read and write-intensive workloads, accessed both from SQL and directly via NoSQL APIs.

Can relational database be horizontal scaling? Will it be somehow based on NoSQL database?

Do someone have any real world example?

How can I manage sql requests, transactions, and so on in such database?

Telex answered 26/11, 2014 at 19:20 Comment(4)
Have a look at JetPants and see what you think. This question is too broad for the StackOverflow format.Erastus
Ultimately, all databases are NoSQL until you add SQL on top of them. I'm joking but serious. All a dbms does is organize files into quickly accessible key-value pairs.Hagride
CAP theorem is an answer.Telex
github.com/pingcap/tidb horizontally scalable relational database. It is SQL over noSQL.Twopence
S
68

It is possible but takes lots of maintenance efforts, Explanation -

Vertical Scaling of data (synonymous to Normalisation in SQL databases) is referred as splitting data column wise into multiple tables in order to reduce space redundancy. Example of user table -

enter image description here

Horizontal Scaling of data (synonymous to sharding) is referred as splitting row wise into multiple tables in order to reduce time taken to fetch data. Example of user table -

enter image description here

Key point to note here is as we can see tables in SQL databases are Normalised into multiple tables of related data. In order to shard data of such table on multiple machines, you would need to shard related normalised data accordingly which in turn would increase maintenance efforts. Like in the example presented above of SQL database,

Customer table which is related as one to many relation with Order table

If you move some rows of customer data onto other machine (referred as sharding) you would also need to move its related order data onto the same machine which would be troublesome task in case of multiple related tables.

Its convenient for NOSQL databases to shard out as they follow flat table structure (data is stored in aggregated form rather than normalised form).

Spoilfive answered 21/6, 2018 at 17:23 Comment(6)
Lets say if you have tera bytes of data. And you want different clients to stay on different database instances on different machines? How would application server come to know of it and query.(Assuming that there is never a need to query across client)Stomachic
@Jailbroken Algorithms for mapping records with machine instances are provided by database client itself which you use. For example, in NOSQL databases hashing algorithms are used (like consistent hashing), which maps record to particular machine while storing data and same algorithm is used while retrieving data. Similarly for SQL databases database client itself provides algorithm for storing and retrieving data.Spoilfive
@Spoilfive Where can I deepen the topic of sharding and mapping records/machine when it comes to RDBMS and NoSQL databases? Algorithms for mapping records with machine instances are provided by database client itself which you use... Similarly for SQL databases database client itself provides algorithm for storing and retrieving data. As an application developer, am I the one who needs to develop such mapping algorithms or does the client (e.g. MySQL DB client) offers out-of-the-box functionality? If yes, what is it? How will my query look? Will they change when introducing sharding? Thank you!Meliorism
@Meliorism I cannot say for every rdbms out there, but when I worked with shards in mysql some 7 years ago, we had to figure out the shard ID in the application code.Liberticide
@Meliorism If you wanted to join that table to another, you had to do it in code, so it involved several trips to the db. So even for the single shared table things get complicated. Now imagine subsets of the db spread accross multiple shards and all the related issues. Sharding is a last resort when other scaling options (beefing up hardware, replication) are exhausted or your table is just way too big for a single server.Liberticide
What about scaling is adding more resource to the database system: vertical scaling is adding new cores, memory to server and horizontal scale is adding new servers to the database cluster? As I understand, adding new server for as read replica is horizontal scale.Adrianneadriano
H
35

I think the answer is, unequivocally, yes. You have to keep in mind that SQL is simply a data access language. There is absolutely no reason why it can't be extended across multiple computers and network partitions. Is it a challenging problem? Most certainly, and that's why software that does it is in its infancy.

Now, I think what you are trying to ask is "Can all features that I am familiar with and that arrive in a standard SQL-type relational database management system be developed to work with multiple servers in this manner?" While I admit I haven't studied the problem in depth, there are theorems out there that say "No, it cannot." Consistency-Availability-Partition Theorem posits that we cannot have all three qualities at the same level.

Now, for all practical purposes, "sharding" or "partitioning" or whatever you want to call it is not going away; to the contrary. This means that, given the degree to which CAP theorem holds, we are going to have to shift the way we think about databases, and how we interact with them (at least, to an extent). Many developers have already made the shift necessary to be successful on a No-SQL platform, but many more have not. Ultimately, sufficient maturity of the model and effective enough workarounds will be developed that traditional SQL databases, in the sense you refer, will be more or less practical across multiple machines. This is already starting to pan out, and I would say give it a few more years and we'll be to that point. Or we'll have collectively shifted thinking to the point where it is no longer necessary, and the world will be a better place. :)

Hagride answered 27/11, 2014 at 3:59 Comment(2)
Thanks, I have never heard about CAP, while reading about that I have found usefull link about transactions for NoSQL infoq.com/articles/MarkLogic-NoSQL-with-Transactions.Telex
Probably an over simplicifaction, but I've heard the primary feature of ACID DB's that prevents horizontal scaling is multi-step transactions (eg. deduct 50 from my bank deposit into yours). If your DB system doesn't need to support these, a lot of the remaining features can still work in horizontally scaled system. Or so I heard onceEthelred
C
14

Thanks for the question and answer. I was trying to explain this to someone like this:

In terms of the CAP theorem, you can't have all three. So when a partition (network or server failure) occurs:

  • A relational database gives you C (consistency). So when a P (partition - server/network failure) occurs, you can't have A (availability - db goes down)

  • Some nosql datastores favor A when a P occurs, you can't have C (one or more of your replicated partitions will be out of sync, until the n/w comes back and they all sync up). So it will only be eventually consistent

  • As noted below in the comments by Manish, there are other nosql datastores that favor C when P occurs at the expense of A.

PS: edited #2 & added #3 to complete all scenarios. The intention behind my original answer was to provide an overly simplistic perspective on the trade offs between C, A and P. That is why I omitted #3.

Cymogene answered 21/7, 2017 at 15:37 Comment(4)
This does not seems correct. Firstly, CAP theorem is for distributed systems. So point 1 does not make sense. Point 2 is completely wrong as we have various nosql databases which does guaranty C but not A when P occurs.Rumney
I agree with @ManishBansal. However If you are considering a distributed database you must guarantee partitioning tolerance P, because if some node in the cluster fails the database service cannot be delivered. Thus CA can only be delivered in single node. Concerning that, if eventually a partition fails in P, you have two options: i) conceive availability showing the most recent wrote data that may not be consistent (eventually consistent); or ii) conceive consistency by showing an error if the most recent write cannot be guaranteed.Planking
In real life, consistency vs availability is a spectrum. You can have a totally consistent DB with rich feature set that's not scalable, up to a super fast, horizonatlly scalable key value store like Redis, which has no query features. Everything in between is tradeoffs, which is why there's so many DB products. There are trades in consistency/availablity. For example, DB may be highly available for reads but has write consistency guarantees etc...Ethelred
Statement provided above is wrong. Sql dbs have CA out of cap and for nosqls it depends on which one you are using, but generaĺly theuly are partition tolerant.Thimbleful
J
10

Google Spanner is an example of a relational database that can scale horizontally. Sharding and replication are done automatically so no need to worry about that. For more information please check out this paper.

Joellyn answered 8/11, 2019 at 14:52 Comment(0)
A
3

Yes it can. It is called NewSQL.

NewSQL is a new approach to relational databases that wants to combine transactional ACID (atomicity, consistency, isolation, durability) guarantees of good ol’ RDBMSs and the horizontal scalability of NoSQL. Source

Examples for Databases:

  • User-Shared MySQL Cluster
  • Citus (PostgreSQL extension)
  • CockroachDB
  • Azure Cosmos DB
  • Google Spanner
  • NuoDB
  • Vitess
  • Splice Machine (part of Hadoop ecosystem)
  • MemQSL (in memory store)
  • VoltDB (in memory store)

Examples for Data Warehouses:

  • IBM Netezza
  • Oracle
  • Teradata
  • Hive Engine (part of Hadoop ecosystem)
  • Spark SQL (part of Hadoop ecosystem)
Atthia answered 18/5, 2022 at 15:7 Comment(0)
E
0

Yes, but it need to migrate when storage increased.

Some open source tools can support the feature, for example: Vitess or Apache ShardingSphere.

Entice answered 14/10, 2021 at 2:18 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.