The more I read about NoSQL, the more it begins to sound like a column oriented database to me.
What's the difference between NoSQL (e.g. CouchDB, Cassandra, MongoDB) and a column oriented database (e.g. Vertica, MonetDB)?
The more I read about NoSQL, the more it begins to sound like a column oriented database to me.
What's the difference between NoSQL (e.g. CouchDB, Cassandra, MongoDB) and a column oriented database (e.g. Vertica, MonetDB)?
NoSQL is term used for Not Only SQL, which covers four major categories - Key-Value, Document, Column Family and Graph databases.
Key-value databases are well-suited to applications that have frequent small reads and writes along with simple data models. These records are stored and retrieved using a key that uniquely identifies the record, and is used to quickly find the data within the database.
e.g. Redis, Riak etc.
Document databases have ability to store varying attributes along with large amounts of data
e.g. MongoDB , CouchDB etc.
Column family databases are designed for large volumes of data, read and write performance, and high availability
e.g Cassandra, HBase etc.
Graph database is a database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data
e.g Neo4j, InfiniteGraph etc.
Before understanding NoSQL, you have to understand some key concepts.
Consistency – All the servers in the system will have the same data
Availability – The system will always respond to a request (Data may be stale or outdated. But it will be corrected eventually) .
Partition Tolerance – The system continues to operate as a whole even if individual servers fail or can't be reached.
Most of the times, only two out above three properties will be satisfied by NoSQL databases.
From your question,
CouchDB : AP ( Availability & Partition) & Document database
Cassandra : AP ( Availability & Partition) & Column family database
MongoDB : CP ( Consistency & Partition) & Document database
Vertica : CA ( Consistency & Availability) & Column family database
MonetDB : ACID (Atomicity Consistency Isolation Durability) & Relational database
Some NoSQL databases are column-oriented databases, and some SQL databases are column-oriented as well. Whether the database is column or row-oriented is a physical storage implementation detail of the database and can be true of both relational and non-relational (NoSQL) databases.
Vertica, for example, is a column-oriented relational database so it wouldn't actually qualify as a NoSQL datastore.
A "NoSQL movement" datastore is better defined as being non-relational, shared-nothing, horizontally scalable database without (necessarily) ACID guarantees. Some column-oriented databases can be characterized this way. Besides column stores, NoSQL implementations also include document stores, object stores, tuple stores, and graph stores.
A NoSQL Database is a different paradigm from traditional schema based databases. They are designed to scale and hold documents like json data. Obviously they have a way of querying information, but you should expect syntax like eval("person = * and age > 10) for retrieving data. Even if they support standard SQL interface, they are intended for something else, so if you like SQL you should stick to traditional databases.
A column-oriented database is different from traditional row-oriented databases because of how they store data. By storing a whole column together instead of a row, you can minimize disk access when selecting a few columns from a row containing many columns. In row-oriented databases there's no difference if you select just one or all fields from a row.
You have to pay for a more expensive insert though. Inserting a new row will cause many disk operations, depending on the number of columns.
But there's no difference with traditional databases in terms of SQL, ACID, foreign keys and stuff like that.
I would suggest reading the taxonomy section of the NoSQL wikipedia entry to get a feel for just how different NoSQL databases are from a traditional schema-oriented database. Being column-oriented implies rows and columns, which implies a (two dimensional) schema, while NoSQL databases tend to be schema-less (key-value stores) or have structured contents but without a formal schema (document stores).
For document stores, the structure and contents of each "document" are independent of other documents in the same "collection". Adding a field is usually a code change rather than a database change: new documents get an entry for the new field, while older documents are considered to have a null value for the non-existent field. Similarly, "removing" a field could mean that you simply stop referring to it in your code rather than going to the trouble of deleting it from each document (unless space is at a premium, and then you have the option of removing only those with the largest contents). Contrast this to how an entire table must be changed to add or remove a column in a traditional row/column database.
Documents can also hold lists as well as other nested documents. Here's a sample document from MongoDB (a post from a blog or other forum), represented as JSON:
{
_id : ObjectId("4e77bb3b8a3e000000004f7a"),
when : Date("2011-09-19T02:10:11.3Z"),
author : "alex",
title : "No Free Lunch",
text : "This is the text of the post. It could be very long.",
tags : [ "business", "ramblings" ],
votes : 5,
voters : [ "jane", "joe", "spencer", "phyllis", "li" ],
comments : [
{ who : "jane", when : Date("2011-09-19T04:00:10.112Z"),
comment : "I agree." },
{ who : "meghan", when : Date("2011-09-20T14:36:06.958Z"),
comment : "You must be joking. etc etc ..." }
]
}
Note how "comments" is a list of nested documents with their own independent structure. Queries can "reach into" these documents from the outer document, for example to find posts that have comments by Jane, or posts with comments from a certain date range.
So in short, two of the major differences typical of NoSQL databases are the lack of a (formal) schema and contents that go beyond the two dimensional orientation of a traditional row/column database.
Here is how I see it: Column Oriented databases are dealing with the way data is physically stored on disk. As the name suggests, the each column is stored in its own separate space/file. This allows for 2 important things:
NoSQL on the other hand are a whole new breed of databases that define "logical" aggregate levels to explain the data. Some treat the data as having hierachical relationship (aggregate being a "node"), while the other treat the data as documents (which is the aggregate level). They do not dictate the physical storage strategy (some may do, but abstracted away from the end user).
Also, the whole NoSQL movement is more to do with unstructured data, or rather data sets whose schema cannot be predefined, or in unknown beforehand, and therefore cannot conform to the strict relational model.
Column Oriented databases still deal with relational data, although eliminate the need for index etc.
© 2022 - 2024 — McMap. All rights reserved.