Example of a task that a NoSQL database can't handle (if any)
Asked Answered
A

6

14

I would like to test the NoSQL world. This is just curiosity, not an absolute need (yet). I have read a few things about the differences between SQL and NoSQL databases. I'm convinced about the potential advantages, but I'm a little worried about cases where NoSQL is not applicable. If I understand NoSQL databases essentially miss ACID properties.

Can someone give an example of some real world operation (for example an e-commerce site, or a scientific application, or...) that an ACID relational database can handle but where a NoSQL database could fail miserably, either systematically with some kind of race condition or because of a power outage, etc ?

The perfect example will be something where there can't be any workaround without modifying the database engine. Examples where a NoSQL database just performs poorly will eventually be another question, but here I would like to see when theoretically we just can't use such technology.

Maybe finding such an example is database specific. If this is the case, let's take MongoDB to represent the NoSQL world.

Edit: to clarify this question I don't want a debate about which kind of database is better for certain cases. I want to know if this technology can be an absolute dead-end in some cases because no matter how hard we try some kind of features that a SQL database provide cannot be implemented on top of nosql stores. Since there are many nosql stores available I can accept to pick an existing nosql store as a support but what interest me most is the minimum subset of features a store should provide to be able to implement higher level features (like can transactions be implemented with a store that don't provide X...).

Apocalyptic answered 25/3, 2011 at 21:50 Comment(2)
Try writing something like a general ledger (book keeping) app in any of the NoSQL crowd... not well suited.Crosspiece
This question was about database theory, with concrete applications (to know when there it's not worth trying to find a solution to a real-world problem if it's proven that there is no solution). It's definitely about facts, and not a subjective debate about which technology is betterApocalyptic
H
19

This question is a bit like asking what kind of program cannot be written in an imperative/functional language. Any Turing-complete language and express every program that can be solved by a Turing Maching. The question is do you as a programmer really want to write a accounting system for a fortune 500 company in non-portable machine instructions.

In the end, NoSQL can do anything SQL based engines can, the difference is you as a programmer may be responsible for logic in something Like Redis that MySQL gives you for free. SQL databases take a very conservative view of data integrity. The NoSQL movement relaxes those standards to gain better scalability, and to make tasks that are common to Web Applications easier.

MongoDB (my current preference) makes replication and sharding (horizontal scaling) easy, inserts very fast and drops the requirement for a strict scheme. In exchange users of MongoDB must code around slower queries when an index is not present, implement transactional logic in the app (perhaps with three phase commits), and we take a hit on storage efficiency.

CouchDB has similar trade-offs but also sacrifices ad-hoc queries for the ability to work with data off-line then sync with a server.

Redis and other key value stores require the programmer to write much of the index and join logic that is built in to SQL databases. In exchange an application can leverage domain knowledge about its data to make indexes and joins more efficient then the general solution the SQL would require. Redis also require all data to fit in RAM but in exchange gives performance on par with Memcache.

In the end you really can do everything MySQL or Postgres do with nothing more then the OS file system commands (after all that is how the people that wrote these database engines did it). It all comes down to what you want the data store to do for you and what you are willing to give up in return.

Howund answered 26/3, 2011 at 5:47 Comment(4)
Well said, John. That's a point I didn't clearly call out. Data integrity, strong-typing, relational integrity, security and query interface are all things that have to be done in the app layer in most cases, if you need them.Treenatreenail
@Jonh F. Miller: considering the three-phase commit you mentioned, can applications implement this though the database (ie write someting in some table to indicate a new transaction, etc.) or do they need to communicate independently (for example with tcp, without using MongoDB) before talking to the database engine ?Apocalyptic
@ascobol: It depends on the transaction, but in general, no communication is needed outside of the database.Howund
@Jonh F. Miller: thanks. I have found here an example of a two phase commit. Indeed no comminication is needed between apps. mongodb.org/display/DOCS/two-phase+commit Do you have links for a three phase commit ?Apocalyptic
T
11

Good question. First a clarification. While the field of relational stores is held together by a rather solid foundation of principles, with each vendor choosing to add value in features or pricing, the non-relational (nosql) field is far more heterogeneous.

There are document stores (MongoDB, CouchDB) which are great for content management and similar situations where you have a flat set of variable attributes that you want to build around a topic. Take site-customization. Using a document store to manage custom attributes that define the way a user wants to see his/her page is well suited to the platform. Despite their marketing hype, these stores don't tend to scale into terabytes that well. It can be done, but it's not ideal. MongoDB has a lot of features found in relational databases, such as dynamic indexes (up to 40 per collection/table). CouchDB is built to be absolutely recoverable in the event of failure.

There are key/value stores (Cassandra, HBase...) that are great for highly-distributed storage. Cassandra for low-latency, HBase for higher-latency. The trick with these is that you have to define your query needs before you start putting data in. They're not efficient for dynamic queries against any attribute. For instance, if you are building a customer event logging service, you'd want to set your key on the customer's unique attribute. From there, you could push various log structures into your store and retrieve all logs by customer key on demand. It would be far more expensive, however, to try to go through the logs looking for log events where the type was "failure" unless you decided to make that your secondary key. One other thing: The last time I looked at Cassandra, you couldn't run regexp inside the M/R query. Means that, if you wanted to look for patterns in a field, you'd have to pull all instances of that field and then run it through a regexp to find the tuples you wanted.

Graph databases are very different from the two above. Relations between items(objects, tuples, elements) are fluid. They don't scale into terabytes, but that's not what they are designed for. They are great for asking questions like "hey, how many of my users lik the color green? Of those, how many live in California?" With a relational database, you would have a static structure. With a graph database (I'm oversimplifying, of course), you have attributes and objects. You connect them as makes sense, without schema enforcement.

I wouldn't put anything critical into a non-relational store. Commerce, for instance, where you want guarantees that a transaction is complete before delivering the product. You want guaranteed integrity (or at least the best chance of guaranteed integrity). If a user loses his/her site-customization settings, no big deal. If you lose a commerce transation, big deal. There may be some who disagree.

I also wouldn't put complex structures into any of the above non-relational stores. They don't do joins well at-scale. And, that's okay because it's not the way they're supposed to work. Where you might put an identity for address_type into a customer_address table in a relational system, you would want to embed the address_type information in a customer tuple stored in a document or key/value. Data efficiency is not the domain of the document or key/value store. The point is distribution and pure speed. The sacrifice is footprint.

There are other subtypes of the family of stores labeled as "nosql" that I haven't covered here. There are a ton (122 at last count) different projects focused on non-relational solutions to data problems of various types. Riak is yet another one that I keep hearing about and can't wait to try out.

And here's the trick. The big-dollar relational vendors have been watching and chances are, they're all building or planning to build their own non-relational solutions to tie in with their products. Over the next couple years, if not sooner, we'll see the movement mature, large companies buy up the best of breed and relational vendors start offering integrated solutions, for those that haven't already.

It's an extremely exciting time to work in the field of data management. You should try a few of these out. You can download Couch or Mongo and have them up and running in minutes. HBase is a bit harder.

In any case, I hope I've informed without confusing, that I have enlightened without significant bias or error.

Treenatreenail answered 26/3, 2011 at 6:23 Comment(0)
L
9

RDBMSes are good at joins, NoSQL engines usually aren't. NoSQL engines is good at distributed scalability, RDBMSes usually aren't.

RDBMSes are good at data validation coinstraints, NoSQL engines usually aren't. NoSQL engines are good at flexible and schema-less approaches, RDBMSes usually aren't.

Both approaches can solve either set of problems; the difference is in efficiency.

Loculus answered 26/3, 2011 at 7:50 Comment(0)
N
2

Probably answer to your question is that mongodb can handle any task (and sql too). But in some cases better to choose mongodb, in others sql database. About advantages and disadvantages you can read here.

Also as @Dmitry said mongodb open door for easy horizontal and vertical scaling with replication & sharding.

Nevernever answered 25/3, 2011 at 23:15 Comment(0)
M
1

RDBMS enforce strong consistency while most no-sql are eventual consistent. So at a given point in time when data is read from a no-sql DB it might not represent the most up-to-date copy of that data.

A common example is a bank transaction, when a user withdraw money, node A is updated with this event, if at the same time node B is queried for this user's balance, it can return an outdated balance. This can't happen in RDBMS as the consistency attribute guarantees that data is updated before it can be read.

Merlynmermaid answered 25/3, 2011 at 23:31 Comment(1)
You are correct for a lot of NoSQL technologies but MongoDB in particular offers strong consistency semantics if you want it to. You can also use MongoDB in a cluster setup for eventual consistency. The user has full control, and can use it either way depending on use-case.Expanded
S
1

RDBMs are really good for quickly aggregating sums, averages, etc. from tables. e.g. SELECT SUM(x) FROM y WHERE z. It's something that is surprisingly hard to do in most NoSQL databases, if you want an answer at once. Some NoSQL stores provide map/reduce as a way of solving the same thing, but it is not real time in the same way it is in the SQL world.

Salem answered 26/3, 2011 at 5:53 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.