Relational vs Columnar and Document Databases - aren't they one in the same?

Asked 8/3, 2013 at 21:9 Answered 11/3, 2013 at 2:42

mongodb cassandra document-oriented-db column-oriented nosql

I understand that document-oriented NoSQL DBs are "extensions" of the KV model in that they allow you to query more than just a single lookup key. But once something is a "document", I feel like it already has a relational model baked into it:

"myJson": {
    "fizz": 4,
    "buzz": "true",
    "widget" : {
        ...etc.
    }
}

To me, I don't see the difference between this JSON, and a json_objects table with a fizz and buzz field, and a foreign key relationship to a second widgets table.

And "columnar" DB's like Cassandra just sound like straight-up relational/table DBs.

So I ask: what is so different about document- and column-oriented DBs, and so distinguishing (from RDBMSes) about them? What problems are they best suited to solve that render them superior to relational DBs under certain circumstances? Thanks in advance!

Stringfellow answered 8/3, 2013 at 21:9 Comment(0)

Firstly I'd like to say that you are very correct in saying that NoSql is different from Relational Databases and so its hard to make a comparison. With that being said there are many big distinctions between the two that can be compared.

Scaling
Although you can shard a MySql database there are issues with sharding and enforcing ACID properties when a RDMS is on multiple machines will be very challenging, NoSql solutions like Cassandra are famous for their ability to grow without problems with some cases managing 400 nodes in a cluster without a problem. Not only is it easy to grow a Cassandra database, but performance does not take a hit.

Schema(less) model.
NoSQL database systems are developed to manage large volumes of data that don't follow a fixed schema. This means that for example you wish to add a new column to an existing column family in Cassandra you don't need to go back and amend the column family so no need for this:

ALTER TABLE table_name ALTER COLUMN column_name datatype;

We can instead just add new columns as we go, and might end up with the following 'table':

 key         | follower1  | follower2   | follower2          
-------------+------------+-------------+-----------
 lyubent     | joeb       | chuckn      | gordonf     
 chuckn      | joeb       | gordonf                   
 gordonf     | chuckn                                 
 joeb        | chuckn     | lyubent     | joeb

This allows data models to be flexible and easily extended but in doing so data becomes less structured.

Speed
NoSql databases are optimized for high write speeds while the RDBMs' aim for high read speeds. But even with that in mind NoSql solutions still tend to outperform RDBMs systems when it comes to reads. This is because the NoSql databases don't implement many of the functions that slow down read/write/update operations in the Relational Model like for example ACID properties and transactions.

When should it be used?

Your application/website will need to grow rapidly but you want to start off small.
You're more concerned with writing data than reading it back. (Lots of tweets are posted but not all of them are read)
Availability of your system is more important that data being 100% updated. (So if you are a bank, you don't want NoSql but if you are a website that needs 100% uptime it could be a good choice)
If the data being written needs to succeed 100% of the time, but eventual consistency isn't a problem.

Just for a visual illustration, this helped me out a lot in understanding where the different sql solutions fit into the database world and how each fits a purpose.

Database Triad - Availability, Consistency and Partition Tolerance

Fusion answered 8/3, 2013 at 22:45 Comment(5)

That diagram is completely wrong, you cannot have CA db. It cannot have A if it is not partition tolerant. That diagram was made by someone who misunderstood CAP theorem. You cannot pick 2, you need to choose between C or A. codahale.com/you-cant-sacrifice-partition-tolerance That link was twitted by Brewer (author of CAP theorem). Just think about it, what CAP property does distributed MySql (sharded( have that HBase doesn't have? Show me a scenario where MySql has availability and HBase does not. – Erectile 10/3, 2013 at 2:43

RDBMS systems guarantee consistency and sharding makes the system tolerant to partitioning. From the theorem follows that the system can therefor not guarantee availability so RDBMS systems are CP. – Erectile 10/3, 2013 at 2:49

@Erectile Critique is always appreciated, however you say the diagram is completely wrong because of where HBase and MySql stand on the diagram. You overlooked the rest. The image has been used in a number of answers accross SO and please have a read through this article justifying why MySql was placed as CA, or if you dont want to... they are there for a comparison, it's a guide to NoSql databases, not RDBMs'. – Fusion 10/3, 2013 at 13:14

I know about that article which is wrong. It says that traditional RDBMSs like Postgres, MySQL are CA which is not true. They don't have A property. I also know that in many answers people use this wrong diagram. There are many blog posts on the internet which interpret CAP theorem as "pick any 2" which is wrong and because of that Brewer said that he will write a new paper to clarify why that is not true. He never said in his paper "pick any two", that was just a wrong interpretation. – Erectile 10/3, 2013 at 23:8

Also this was just a friendly critique which aims to clarify a minor point in your post. I gave you +1 for everything else that you wrote. – Erectile 10/3, 2013 at 23:14

In no schema db you don't have fixed columns and types.

For example product 'Jeans' can have attributes 'price', 'length' and 'model' (M/W) but for product book you have attributes 'price', 'authors' and 'title'. For mobile phones you will have 'screen type', 'operating system' etc.

It is very difficult to model that in RDBMS because you are not flexible and user cannot insert arbitrary attributes so it is easier to use a document database which are optimized for this kind of data so that you can easily search and filter by value on arbitrary attributes (eg. all products with length>30 and model=w).

Erectile answered 11/3, 2013 at 2:42 Comment(0)

Recommended topics

Hot tags