Why should I use document based database instead of relational database?
Asked Answered
P

9

215

Why should I use document based database like CouchDB instead of using relational database. Are there any typical kinds of applications or domains where the document based database is more suitable than the relational database?

Pterosaur answered 14/1, 2009 at 0:21 Comment(1)
Perhaps a document-oriented database might be similar in some ways to a "entity-attribute-value" (EAV) database.Elocution
O
194

Probably you shouldn't :-)

The second most obvious answer is you should use it if your data isn't relational. This usually manifests itself in having no easy way to describe your data as a set of columns. A good example is a database where you actually store paper documents, e.g. by scanning office mail. The data is the scanned PDF and you have some meta data which always exists (scanned at, scanned by, type of document) and lots of possible metadata fields which exists sometime (customer number, supplier number, order number, keep on file until, OCRed fulltext, etc). Usually you do not know in advance which metadata fields you will add within the next two years. Things like CouchDB work much nicer for that kind of data than relational databases.

I also personally love the fact that I don't need any client libraries for CouchDB except an HTTP client, which is nowadays included in nearly every programming language.

The probably least obvious answer: If you feel no pain using a RDBMS, stay with it. If you always have to work around your RDBMS to get your job done, a document oriented database might be worth a look.

For a more elaborate list check this posting of Richard Jones.

Oster answered 20/1, 2009 at 22:21 Comment(2)
I have never seen any database schema in two years time resemble the original schema we started with... so everything equal (which it isn't...), you should always use a schemaless database = a document-oriented one; which i think is a rather misleading name...Pygidium
@int3 If you can't describe your data as a set of columns how are you supposed to write intelligent queries on said data?Untangle
W
47

From CouchDB documentation (https://web.archive.org/web/20090122111651/http://couchdb.apache.org/docs/overview.html):

  • "A document database server, accessible via a RESTful JSON API." Generally, relational databases aren't simply accessed via REST services, but require a much more complex SQL API. Often these API's (JDBC, ODBC, etc.) are quite complex. REST is quite simple.

  • Ad-hoc and schema-free with a flat address space. Relational databases have complex, fixed schema. You define tables, columns, indexes, sequences, views and other stuff. Couch doesn't require this level of complex, expensive, fragile advanced planning.

  • Distributed, featuring robust, incremental replication with bi-directional conflict detection and management. Some SQL commercial products offer this. Because of the SQL API and the fixed schemas, this is complex, difficult and expensive. For Couch, it appears simple and inexpensive.

  • Query-able and index-able, featuring a table oriented reporting engine that uses Javascript as a query language. So does SQL and relational databases. Nothing new here.

So. Why CouchDB?

  • REST is simpler than JDBC or ODBC.
  • No Schema is simpler than Schema.
  • Distributed in a way that appears simple and inexpensive.
Woosley answered 14/1, 2009 at 0:36 Comment(6)
While I am a big fan of NoSQL databases, the first claim (REST is simpler than JDBC) is very dubious.Pygidium
The REST protocol seems pretty simple to me, since it's just HTTP: stateless, few methods, etc., etc. Perhaps JDBC is (under the hood) simple; it doesn't seem to be simpler, based merely on being stateful.Woosley
@Woosley Shouldn't the answer be more "generic" instead of geared towards CouchDb only?Chick
"fragile advanced planning" vs what? In my experience the alternative is no-planning which leads to spaghetti data structures that are modified on a whim.Ruphina
@Woosley good example why it's not so clear that it is simpler, cause people always forget half of what it entails and everyone has a slightly different understanding of what REST actually is... ;)Fez
That interface comparison (JDBC vs REST) also seems like a somewhat arbitrary focus - if one works with a SQL database, one might never actually deal with the JDBC layer details... and both SQL and REST are pretty common interfaces. With SQL being tailored to data and REST being more generic, to me it also feels like SQL is way more natural for data querying than having to deal with an intermediary transport layer...Fez
M
33

For stupidly storing and serving other-servers-data.

In the last couple of weeks I've been playing with a lifestream app that polls my feeds (delicious, flickr, github, twitter...) and stores them in couchdb. The beauty of couchdb is that it lets me keep the original data in its original structure with no overhead. I added a 'class' field to each document, storing the source server, and wrote a javascript render class for each source.

Generalizing, whenever your server communicates with another server a schema-less storage is best as you have no control over the schema. As a bonus, couchdb uses the native protocols of servers and clients - JSON for representation and HTTP REST for transport.

Millepede answered 15/2, 2009 at 9:47 Comment(6)
Why not just store them in a file, or a file per feed?Perseid
because couchdb also lets you create interesting views using map/reduce. For example, I can create a view based on the data source, or I can calculate the totals for each source.Millepede
That's a brilliant point... if you're consuming data and have no control over the inbound data schema -- use a document store.Minatory
This is the first really convincing argument I've heard for the value of NoSQL databasesAccelerator
This is the single best argument for document based databases I've heard thus far.Dedra
I think that can be generalized more to: 3rd party data, be it user provided data that is not predetermined or data imported to make it searchable. Whenever the DB is not managing primary application data (data that the application itself needs to function) and that can vary in type and form, a no-schema database might be the best choice as you will find it hard to squeeze it into a RDBMS and/or will loose the main benefits such a DB offers.Fez
T
24

Rapid application development comes to mind.

When I am constantly evolving my schema, I am constantly frustrated by having to maintain the schema in MySQL/SQLite. While I've not done too much with CouchDB yet, I do like how simple it is to evolve the schema during the RAD process.

A case where you might not want to use a non-relational database is when you have a lot of many-to-many relationships; I've yet to get my head around how to create good MapReduce functions around these kinds of relationships, particularly if you need to have metadata in the joining relationship. I'm not sure, but I don't think CouchDB Map functions can call their own queries on the database, since that could potentially cause infinite loops.

Trepidation answered 22/1, 2009 at 3:13 Comment(1)
Excellent point. Document (and other schemaless) datastores are great for rapid early stage development. However, for the same reasons they are great for early stage prototyping, they are problematic for robust production applications.Ruphina
S
9

Use a document-based database when you do not need to store data in tables with uniform sized fields for each record. Instead, you have a need to store each record as a document that has certain characteristics. Any number of fields of any length can be dynamically added to a document at any time without the need to "modify the table" first. Fields in document-based can also contain multiple pieces of data.

Stilla answered 27/1, 2010 at 2:49 Comment(0)
C
4

It depends.

Yes, it is a use case thing. Yes, it is also a developer experience thing. Yes, the nature of the data to be input matters (highly predictable, orthogonal, rational, and easy to normalize, OR unlikely to be normalized/organized in any meaningful way). Yes, the relationships (if any) of one record/object to another matter. Yes, how you need to analyze the data matters. Yes, the nature of the application being supported matters (how the data is to be used in the application).

Yes, it matters if the structure (schema) of a record/document must change rapidly, or if fields themselves must not be mandatory for each record/document

Yes, it matters if you have an extremely large amount of data to store and you want to reduce retrieval times. Normalized data (data in lots of separate, distinct tables) tends to require being put together (joins, subqueries, etc ...) in certain ways to return useful results. Those same results might be returned faster by just returning a few documents or collections (with some filtering).

Oh, yeah, and to make the new world order feel acknowledged ... yes, those who learned JavaScript or Python as their first programming languages are happy to not be burdened with SQL. For example, MongoDB stores data as BSON, which effectively seems like JSON to someone who only cares about getting the data they want--no schemas, just store/get the data and move on to the next thing.

Frankly, it matters which one you learned first. If you learned SQL first, then there is a place for everything and everything in its place. You do not mind defining/altering a schema because it makes know your data very well. In fact, some people prefer SQL because the enjoy the feeling of control. The do not mind knowing another domain specific language because of the power it gives to the user. Since SQL has been around since the 70s, it is basically the old school business way of doing things.

The costs of using a SQL RDBMS are time to plan and modify schemas (partitioning when necessary), time to plan table sizes and scalability (clustering), learning to interfacing with the database and translating records into programming language data structures (ORM, or other).

However, SQL it is very effective when it comes to analyzing data and asking complex questions. If you have more than simple storage and retrieval needs (with minor analytics), then SQL puts you way ahead of the game.

However, a normalized, SQL database as a monolith for an application is not necessarily great for all the data requirements of an application. There may be aspects of an application (web, or otherwise) that are not CENTRAL AND CORE to the going concern of the business.

If you want a tried and true ACID compliant transactional (with rollback) record system for your financial records (payment, purchase, etc ...)---like if you are a bank--then I am going with SQL no matter how good document databases get. However, some do-hicky widget in the UI might not even touch customer records / business transactions. Why have a schema just for that?

Effectively, that is the perspective of the core UI web developer set. They can justify document databases to make their development life simple, but not to make your business transactions ACID compliant. The more experience they gain, the more they will come to see that the convenience of document databases is just that--a convenience.

I am sure that even as I type this, someone is saying that XX document DB now has ACID compliant transactions, but does it have SQL? Eventually, those who want document DBs for everything will find a way to make it happen--it'll probably mean (among other things) that the collections and documents will have more constraints, and it begins to turn into a--GASP--form of a schema.

Look, with things like REST and GraphQL APIs, you never know where you might be getting data from. You cannot predict or plan the form of all data ahead of time. In cases like these (say, interfacing with the Amazon Web Services APIs), then a document database makes good sense. You do not want to normalize that much data. You just want to access, filter it, and do basic stuff to satisfy the needs of your application. Dumping this data into an SQL database could be a waste of time. Every time AWS updates a service with new data, you might have to change your code and schema to accommodate it. ACKKK! Just store it in collections and documents already!

The AWS API example above involves no transactions. There's no need for a bunch of tables if you need to retain some of the API information. Unfortunately, SOME PEOPLE try to make every scenario fit this use case and they would be WRONG!

Going further, given the amount of data one might ingest from the AWS API, sharding and clustering data stored in collections and documents is MUCH simpler, compared to partitioning and clustering SQL databases. If you work in operations, then document databases are easier to administer, ultimately.

So, while I like lots of answers here, many seem to put up a defense of their camp and/or only slightly explain scenarios where document databases might be more appropriate than schema based, orthogonal, SQL databases.

Rules of thumb:

  1. If it is CORE and CRITICAL to your business operations and going concern (CRUD, ACID, transactions), go SQL.
  2. If it is just for handling massive amounts of data for processing in applications and UI, document / NoSQL databases.
Cuspidate answered 29/10, 2022 at 17:14 Comment(2)
I would add that you also need to look at limits that get imposed. I've hit the max of reads/second on s3, the max record size for a DynamoDB local secondary index, etc. Always check your limits before picking something-- it may not sway you, but it's worth checking out. Then compare that with your worst case scenario. I do think it's funny that you pointed out that a lot of people end up with schemas in a nosql db-- I've seen the same. They are great for high volume throughput but schema changes can cause issues between apps, as well, if they aren't updated together.Haulm
Scaling out also seems to be an advantage of something like MongoDB.Cuspidate
F
2

To elaborate on smdelfin: flexibility. You can store data in any structure (being unstructured and all) and every document could be completely different. CouchDB specifically is useful because with their "view" indexes, you can filter out specific documents and query just that view when you want those subsets of your database.

My biggest winning point of document databases that store data in JSON format: this is the native format for JavaScript. Therefore, JavaScript web applications work incredibly-well with CouchDB. I recently made a web app that utilizes CouchDB and it's rocket fast while also able to handle a constantly-varying data structure.

Fattish answered 31/10, 2019 at 18:55 Comment(0)
U
2

Document based databases have a big advantage over relational databases as they do not require defining a schema upfront- before being able to enter any data.

Also, you should use a document database if your data is not relational and cannot be stored in a table but rather is a set of images, or for example newspaper articles.

Another advantage is the easiness to use document-based databases in web development. For more in-depth NoSQL database models comparison check this source: https://arxiv.org/ftp/arxiv/papers/1509/1509.08035.pdf

Unstrained answered 3/5, 2020 at 16:22 Comment(0)
A
1

One reason is to provide fast full-text search on JSON (or other self-describing format) documents that do not necessarily have the same structure/schema.

Arcane answered 7/4, 2022 at 21:5 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.