Why NoSQL is better at "scaling out" than RDBMS?
Asked Answered
K

8

91

I have read the following text in a technical blog discussing the advantages and disadvantages of NoSQL:

"For years, in order to improve performance on database servers, database administrators have had to buy bigger servers as the database load increases (scaling up) instead of distributing the database across multiple “hosts” as the load increases (scaling out). RDBMS do not typically scale out easily, but the newer NoSQL databases are actually designed to expand easily to take advantage of new nodes and are usually designed with low-cost commodity hardware in mind."

I became confused about the scalability of RDBMS and NoSQL.

My confusion are:

  1. Why RDBMS are less able to scale out? And the reason of buying bigger servers instead of buying more cheap ones.
  2. Why NoSQL is more able to scale out?
Kenti answered 4/1, 2012 at 15:56 Comment(0)
B
71

RDBMS have ACID ( http://en.wikipedia.org/wiki/ACID ) and supports transactions. Scaling "out" with RDBMS is harder to implement due to these concepts.

NoSQL solutions usually offer record-level atomicity, but cannot guarantee a series of operations will succeed (transaction).

It comes down to: to keep data integrity and support transactions, a multi-server RDBMS would need to have a fast backend communication channel to synchronize all possible transactions and writes, while preventing/handling deadlock.

This is why you usually only see 1 master (writer) and multiple slaves (readers).

Blairblaire answered 4/1, 2012 at 16:1 Comment(7)
RavenDB supports transactions, though not in the traditional sense.Americano
Thanks, it makes sense to me. May I ask if lack of support of transaction is a disadvantage of NoSQL? And is there any case that support of transaction is not that important or little of use so that this lack of support is not that a disadvantage?Kenti
It would be a disavantage if you needed it :( NoSql versus sql is a tradeoff ease of scalability, versus ease of transaction management. So if you say I need transactions and go sql, scalability just got harder, if you go nosql and then want intensive transaction support, life is going to be tuff.Kellner
You can always use an hybrid system. ex: ecommerce. RDBMS for the actual accounting part, mongodb for the rest.Blairblaire
"To keep data integrity and support transactions, a multi-server RDMBS would need to have a fast backend communication channel to synchronize all possible transactions and writes, while preventing/handling deadlock." <-- Exactly, the Clustrix database is has solved this by leveraging Infiniband as the fast interconnect and MVCC to handle how to scale writes and is ACID. Scale Out RDBMS solutions are sometimes referred to as NewSQLSinusoid
@Sinusoid I don't think you're allowed to advertise in stack comments. Apart from that, SQL databases have been scaled out since the beginning of databases, and the only difference is that they were meant to be configured and installed by professionals rather than tourists, as is the goal with the noSQL-type of offerings.Agonize
I would also add that the fact that NoSQL solutions like MongoDB for example, use the Document Data Model instead of the Tabular Data Model used by traditional RDBMS', means that that particular record attributes (I. columns/fields) can be wholly stored "together". In a relational database you have to scavenge multiple tables to put together the entire view of a record. In essence NoSQL stores data vertically as opposed to horizontally. This means that fetch time can be reduced considerable, if records can be hashed to a particular partition. Think Java ConcurrentHashMap but extrapolated.Cytologist
F
103

So I've been trying to figure out the real bottom-line when it comes to NoSQL vs RDBMS myself, and always end up with a response that doesn't quite cut it. In my search there are really 2 primary differences between NoSQL and SQL, with only 1 being a true advantage.

  1. ACID vs BASE - NoSQL typically leaves out some of the ACID features of SQL, sort of 'cheating' it's way to higher performance by leaving this layer of abstraction to the programmer. This has already been covered by previous posters.

  2. Horizontal Scaling - The real advantage of NoSQL is horizontal scaling, aka sharding. Considering NoSQL 'documents' are sort of a 'self-contained' object, objects can be on different servers without worrying about joining rows from multiple servers, as is the case with the relational model.

Let's say we want to return an object like this:

post {
    id: 1
    title: 'My post'
    content: 'The content'
    comments: {
      comment: {
        id: 1
      }
      comment: {
        id: 2
      }
      ...

    views: {
      view: {
        user: 1
      }
      view: {
        user: 2
      }
      ...
    }
}

In NoSQL, that object would basically be stored as is, and therefore can reside on a single server as a sort of self-contained object, without any need to join with data from other tables that could reside on other DB servers.

However, with Relational DBs, the post would need to join with comments from the comments table, as well as views from the views table. This wouldn't be a problem in SQL ~UNTIL~ the DB is broken into shards, in which case 'comment 1' could be on one DB server, while 'comment 2' yet on another DB server. This makes it much more difficult to create the very same object in a RDBMS that has been scaled horizontally than in a NoSQL DB.

Would any DB experts out there confirm or argue these points?

Fete answered 3/2, 2014 at 23:4 Comment(3)
What if there is a single table for holding posts data including comments,views in RDBMS?Ciaracibber
Yes, de-normalizing the database it's a possible workaround for join-performance issues, obviously at the cost of any data-denormalization (redundancy, updates costs, size, etc). Which by the way, it's the hole idea of an aggregate-oriented noSQL solutions like key-value, column oriented, and document.Washday
So if we de-normalize the RDBMS will we get the same performance as NO-SQL?Certificate
B
71

RDBMS have ACID ( http://en.wikipedia.org/wiki/ACID ) and supports transactions. Scaling "out" with RDBMS is harder to implement due to these concepts.

NoSQL solutions usually offer record-level atomicity, but cannot guarantee a series of operations will succeed (transaction).

It comes down to: to keep data integrity and support transactions, a multi-server RDBMS would need to have a fast backend communication channel to synchronize all possible transactions and writes, while preventing/handling deadlock.

This is why you usually only see 1 master (writer) and multiple slaves (readers).

Blairblaire answered 4/1, 2012 at 16:1 Comment(7)
RavenDB supports transactions, though not in the traditional sense.Americano
Thanks, it makes sense to me. May I ask if lack of support of transaction is a disadvantage of NoSQL? And is there any case that support of transaction is not that important or little of use so that this lack of support is not that a disadvantage?Kenti
It would be a disavantage if you needed it :( NoSql versus sql is a tradeoff ease of scalability, versus ease of transaction management. So if you say I need transactions and go sql, scalability just got harder, if you go nosql and then want intensive transaction support, life is going to be tuff.Kellner
You can always use an hybrid system. ex: ecommerce. RDBMS for the actual accounting part, mongodb for the rest.Blairblaire
"To keep data integrity and support transactions, a multi-server RDMBS would need to have a fast backend communication channel to synchronize all possible transactions and writes, while preventing/handling deadlock." <-- Exactly, the Clustrix database is has solved this by leveraging Infiniband as the fast interconnect and MVCC to handle how to scale writes and is ACID. Scale Out RDBMS solutions are sometimes referred to as NewSQLSinusoid
@Sinusoid I don't think you're allowed to advertise in stack comments. Apart from that, SQL databases have been scaled out since the beginning of databases, and the only difference is that they were meant to be configured and installed by professionals rather than tourists, as is the goal with the noSQL-type of offerings.Agonize
I would also add that the fact that NoSQL solutions like MongoDB for example, use the Document Data Model instead of the Tabular Data Model used by traditional RDBMS', means that that particular record attributes (I. columns/fields) can be wholly stored "together". In a relational database you have to scavenge multiple tables to put together the entire view of a record. In essence NoSQL stores data vertically as opposed to horizontally. This means that fetch time can be reduced considerable, if records can be hashed to a particular partition. Think Java ConcurrentHashMap but extrapolated.Cytologist
T
18

Typical RDBMs make strong guaranties about consistency. This requires to some extent communication between nodes for every transaction. This limits the ability to scale out, because more nodes means more communications.

NoSQL systems make different tradeoffs. For example they don't guarantee that a second session will immediately see the data committed by the first session. Thereby decoupling the transaction of storing some data from the process of making that data available for every user. Google "eventually consistent". So a single transaction doesn't need to wait for any (or for much less) inter node communication. Therefore they are able to utilize a large amount of nodes much more easily.

Thorstein answered 4/1, 2012 at 16:3 Comment(1)
Such different trade-offs can also be configured in RDBMS systems, but not many people know this. See: tqdev.com/2016-trading-durability-for-performance-without-nosqlHylozoism
A
15

Why NoSQL databases can be easily horizontally scaled than SQL ones? I have been trying to figure out why people keep saying this. I came across many articles which only confused me with their not-industry familiar terminologies and vague assumptions. I will suggest you read Designing Data-intensive applications by Martin Kleppman. Also, I will share some of my understanding of this subject.

JOINS - in the case of many-to-one or many-to-many relationships there is no way that any database invented till now can keep the data together in one table or document so if the data is sharded(or partitioned), either it is SQL or NoSQL, the latency will be same, the database has to look for both the documents. NoSQL seems to dominate only in the case of one to many relationships. For example:

NoSql

Student

{
  "name": "manvendra",
  "education": [
    {
      "id": 1,
      "Degree": "High School"
    },
    {
      "id": 2,
      "Degree": "B.Tech"
    }
  ]
}

Eduction Institute collection

[
  {
    "id": "1",
    "name": "army public school"
  },
  {
    "id": "2",
    "name": "ABES Engineering College"
  }
]

Sql

Student Table

id | name        
1  | Manvendra

Education Institute

id | Name
1  | Army public school
2  | ABES Engineering college

Studies Table

student  | education institute | degree
1        | 1                   | high school
1        | 2                   | B.tech

Now suppose in the case of NoSql if both collection's data is on different nodes there will some extra time required to resolve the ids of the education institute and this situation is similar in the case of SQL databases so where is the benefit? I can't think of any.

Also, you must be thinking why can't we store the education institute info also in the same student collection, then it will be like:

{
  "name": "manvendra",
  "education": [
    {
      "name": "Army public school",
      "Degree": "High School"
    },
    {
      "name": "ABES Engineering College",
      "Degree": "B.Tech"
    }
  ]
}

which is really a bad design because there is a many-to-many relationship between student and education institute, many students might have studied from the same institute so tomorrow if there is a change in name or any information of the institute it will be really a very difficult challenge to change at all places.

However, in the case of a one-to-many relationship, we can club all the info together for example: Consider a customer and an order relationship

{
  "name": "manvendra",
  "order": [
    {
      "item": "kindle",
      "price": "7999"
    },
    {
      "item":"iphone 12",
      "price":"too much"
    }
  ]
}

Since an order only belongs to one customer it makes sense to store order info in one place however storing item id or name is another choice anyway, if we use SQL database here, there will be two tables with orders and customers which will not give good results to queries if data is not stored in the same node.

So saying joins in an argument as to why the NoSql database is easier to scale horizontally does not make sense.

TRANSACTIONS

Both SQL(Postgres, MySQL, etc) and NoSQL(MongoDB, Amazon's DynamoDB, etc) support transactions so there is nothing left to discuss on that.

ACID

ACID is overused just like CAP actually it is all about showing a single copy of data to the client instead actually there might be multiple copies of data(to enhance availability, fault-tolerance, etc) and what strategies the database uses to do that. For example in Postgres in the case of a master-slave distributed system, one can opt for synchronous or asynchronous replication and the replication is made possible with WAL(Write ahead logs) and same is the case in MongoDB, only in place of WAL it has oplog(Operations Log), both support streaming replication and failovers. Then where is the difference? Actually, I can't find a very strong reason that why NoSql databases can be scaled easily. What I can say is NoSql is the latest so databases come with ready-made support for horizontal scaling for example consider Mongos in MongoDB, they do all the dirty work of sharding documents, routing requests to the specific shard, etc. So tomorrow if Postgres or MySQL come up with some mechanism of intelligently sharding tables so all the related data is mostly kept in one node then it may put an end to this debate because there is nothing intrinsic in a relational database that prevents it from horizontal scaling.

On an optimistic note, I believe in the near future it will all be about the strategies. How you are planning to scale and those strategies will be independent of how you are storing data either in tables or documents. For example in Amazon's DocumentDB, there is a concept of auto-scaling in and out but if you want to achieve this with sharding it will be a burden to copy data each time you are scaling in and out. In DocumentDB this is taken care of as a shared cluster volume(data storage is separated from computing) which is nothing but a shared disk to all the instances(primary or secondary) and to escape from the risk of the shared disk failure DocumentDB replicates data of the shared disk to six other disks in different availability zones. So point to be noted here is DocumentDB mixed the concept of the shared disk and standard replication strategy to achieve its goal. So it is all about the strategy you are using in your database which is what matters

Aqaba answered 18/2, 2021 at 18:41 Comment(2)
In this case, you really do not understand about "denormalization" or "preSQL" what you called "bad design" is actually a good design in many cases in No SQL worldEarley
@gaesae I called it bad design for the fact that it is not easy to change anything in that type of design. One single change has to be incorporated at multiple places if normalisation is not done properly as per the example I mentionedAqaba
M
3

In RDBMS when the data becomes huge then it may happen that tables are spread across multiple systems and in that case performing operations like JOIN are very slow.

In case of NoSQL in general related data are stored together on same machine (either in single document - in document oriented databases or in case of Wide column datastore the related columns are on same machine). Hence its easy to scale out on a number of low end machines, obviously in this case there will be duplicate data in multiple places which is not the case in RDBMS

Maltha answered 19/2, 2019 at 11:0 Comment(0)
H
0

@jessedrelick mentioned sharding in their answer, and I want to dive into that a bit – because that's the piece ultimately made this click for me. (And MongoDB has a page that explains this really well.)

Let's take an extreme example and oversimplify it, because I think that makes it easier to understand. Imagine you're designing a database that holds every post on Instagram.

In a relational database, you'd have a posts table. Works pretty well to start! But as more and more people use Instagram, the number of posts grows... quickly. You end up needing to scale up to a bigger server in order to have space for all those new posts and accommodate all the traffic trying to access them. (Scaling's costly, especially when you consider that your traffic load won't always be the same, and you don't want to pay for resources you're not using 95% of the time.)

And beyond that, these days, many millions (or even billions) of posts are created every day on Instagram. You'll reach a point where you'll need some strategy to distribute both the data and requests (scaling out).

That's where something like ranged sharding comes into play, where your system would chunk data onto different individual nodes and have a lookup table that points you to the right ones based on your needs. And since smaller ("commodity") nodes are cheaper than huge, enterprise-level instances, it can have some cost advantages too.

If you're looking to harness some of the big advantages of relational databases, though – like joining across tables – it can get, let's say, tricky to work with data that spans different servers.

Now, I don't really think there's anything stopping you from implementing a ranged sharding lookup system like that with an RDBMS. But at that point, you've lost some key advantages that come with the relational structure (again, like joining). And it might be worth re-analyzing whether that changes whether or not you'd choose RDBMS or NoSQL.

Hamal answered 3/8, 2023 at 17:7 Comment(0)
S
0

Some RDBMS had the ability 'scale out' way before the NoSQL hype. There is Oracle's Real Applications Cluster and IBM's PureScale.

Seel answered 6/6 at 10:0 Comment(0)
M
-1

For a NO SQL, 1.All the child related to a collection is at the same place and so on same server and there is no join operation to lookup data from another server .

2.There is no schema so no Locks needed on any server and the transaction handling is left to the clients .

The above 2 saves a lot of overhead of scaling in NO-SQL.

Macswan answered 12/7, 2016 at 10:31 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.