Why should I use a document-based database like CouchDB instead of a relational database? Are there any typical kinds of applications or domains where a document-based database is more suitable than a relational database?
Probably you shouldn't :-)
The second most obvious answer is you should use it if your data isn't relational. This usually manifests itself in having no easy way to describe your data as a set of columns. A good example is a database where you actually store paper documents, e.g. by scanning office mail. The data is the scanned PDF, and you have some metadata which always exists (scanned at, scanned by, type of document) and lots of possible metadata fields which exist only sometimes (customer number, supplier number, order number, keep on file until, OCRed full text, etc.). Usually you do not know in advance which metadata fields you will add within the next two years. Things like CouchDB work much nicer for that kind of data than relational databases.
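To make the scanned-mail case concrete, here is a minimal sketch in plain Python. The field names and values are made up for illustration; the point is only that three fields are always present while the others show up when they apply, and nothing has to be declared up front.

```python
# Hypothetical scanned-mail documents: three fields always exist,
# the remaining ones appear only on the documents they apply to.
documents = [
    {"scanned_at": "2009-01-12", "scanned_by": "alice", "type": "invoice",
     "supplier_number": "S-1001", "order_number": "O-77"},
    {"scanned_at": "2009-01-13", "scanned_by": "bob", "type": "letter",
     "customer_number": "C-42"},
    {"scanned_at": "2009-01-14", "scanned_by": "alice", "type": "contract",
     "customer_number": "C-42", "keep_on_file_until": "2019-01-14"},
]


def with_field(docs, field):
    """Return the documents that carry the given optional field."""
    return [doc for doc in docs if field in doc]


def all_fields(docs):
    """Collect every field name seen so far, in sorted order."""
    return sorted({key for doc in docs for key in doc})
```

Adding a new metadata field two years from now means writing it on the new documents; the old ones stay as they are.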
I also personally love the fact that I don't need any client libraries for CouchDB except an HTTP client, which is nowadays included in nearly every programming language.
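As a rough illustration of the "HTTP client only" point, the sketch below uses nothing but Python's standard library. The base URL, database name and document ID are assumptions (a local CouchDB on its default port, a database called `mail`), and a real server will usually also want credentials. The requests are only built here, not sent, so the snippet runs without a server.

```python
import json
import urllib.request

# Assumption: a CouchDB instance listening locally on its default port.
BASE_URL = "http://localhost:5984"


def put_document_request(database, doc_id, document):
    """Build (but do not send) the HTTP request that stores a document."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{database}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def get_document_request(database, doc_id):
    """Build (but do not send) the HTTP request that fetches a document."""
    return urllib.request.Request(f"{BASE_URL}/{database}/{doc_id}", method="GET")


# Hypothetical database name and document ID.
request = put_document_request("mail", "scan-0001", {"type": "invoice"})

# To actually send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```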
The probably least obvious answer: If you feel no pain using an RDBMS, stay with it. If you always have to work around your RDBMS to get your job done, a document-oriented database might be worth a look.
For a more elaborate list check this posting of Richard Jones.
From CouchDB documentation (https://web.archive.org/web/20090122111651/http://couchdb.apache.org/docs/overview.html):
"A document database server, accessible via a RESTful JSON API." Generally, relational databases aren't accessed via REST services but through a SQL API such as JDBC or ODBC, and these APIs are often quite complex. REST is comparatively simple.
Ad-hoc and schema-free with a flat address space. Relational databases have a complex, fixed schema. You define tables, columns, indexes, sequences, views and other stuff. Couch doesn't require this level of complex, expensive, fragile advance planning.
Distributed, featuring robust, incremental replication with bi-directional conflict detection and management. Some commercial SQL products offer this. Because of the SQL API and the fixed schemas, this is complex, difficult and expensive. For Couch, it appears simple and inexpensive.
Query-able and index-able, featuring a table oriented reporting engine that uses JavaScript as a query language. So do SQL and relational databases. Nothing new here.
So. Why CouchDB?
- REST is simpler than JDBC or ODBC.
- No Schema is simpler than Schema.
- Distributed in a way that appears simple and inexpensive.
For stupidly storing and serving other servers' data.
In the last couple of weeks I've been playing with a lifestream app that polls my feeds (delicious, flickr, github, twitter...) and stores them in CouchDB. The beauty of CouchDB is that it lets me keep the original data in its original structure with no overhead. I added a 'class' field to each document, storing the source server, and wrote a JavaScript render class for each source.
Generalizing, whenever your server communicates with another server, schema-less storage is best, as you have no control over the other server's schema. As a bonus, CouchDB uses the native protocols of servers and clients: JSON for representation and HTTP REST for transport.
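The 'class' field idea can be sketched roughly like this. It is written in Python rather than the JavaScript the app used, and the payload fields (`text`, `title`) and renderer names are hypothetical stand-ins for whatever each source really sends.

```python
def render_twitter(doc):
    """Render a document that came from the (hypothetical) twitter feed."""
    return f"tweet: {doc['text']}"


def render_flickr(doc):
    """Render a document that came from the (hypothetical) flickr feed."""
    return f"photo: {doc['title']}"


# One renderer per source, looked up by the 'class' field.
RENDERERS = {"twitter": render_twitter, "flickr": render_flickr}


def store(feed_item, source):
    """Keep the item exactly as received and only add the 'class' field."""
    doc = dict(feed_item)
    doc["class"] = source
    return doc


def render(doc):
    """Pick the renderer that matches the document's source."""
    renderer = RENDERERS.get(doc["class"])
    if renderer is None:
        return f"unknown source: {doc['class']}"
    return renderer(doc)
```

A new feed then only needs a new renderer; the stored documents keep whatever shape the remote server chose.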
Rapid application development comes to mind.
When I am constantly evolving my schema, I am constantly frustrated by having to maintain the schema in MySQL/SQLite. While I've not done too much with CouchDB yet, I do like how simple it is to evolve the schema during the RAD process.
A case where you might not want to use a non-relational database is when you have a lot of many-to-many relationships; I've yet to get my head around how to create good MapReduce functions around these kinds of relationships, particularly if you need to have metadata in the joining relationship. I'm not sure, but I don't think CouchDB Map functions can call their own queries on the database, since that could potentially cause infinite loops.
Use a document-based database when you do not need to store data in tables with uniformly sized fields for each record. Instead, you have a need to store each record as a document that has certain characteristics. Any number of fields of any length can be dynamically added to a document at any time without the need to "modify the table" first. Fields in a document-based database can also contain multiple pieces of data.
It depends.
Yes, it is a use case thing. Yes, it is also a developer experience thing. Yes, the nature of the data to be input matters (highly predictable, orthogonal, rational, and easy to normalize, OR unlikely to be normalized/organized in any meaningful way). Yes, the relationships (if any) of one record/object to another matter. Yes, how you need to analyze the data matters. Yes, the nature of the application being supported matters (how the data is to be used in the application).
Yes, it matters if the structure (schema) of a record/document must change rapidly, or if fields themselves must not be mandatory for each record/document.
Yes, it matters if you have an extremely large amount of data to store and you want to reduce retrieval times. Normalized data (data in lots of separate, distinct tables) tends to require being put together (joins, subqueries, etc ...) in certain ways to return useful results. Those same results might be returned faster by just returning a few documents or collections (with some filtering).
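Here is a small sketch of the difference in access pattern, using Python's built-in sqlite3 for the normalized side and a plain dict of JSON strings as a stand-in for a document store. Table names, keys and data are made up, and the example shows only the shape of the two lookups, not any performance numbers.

```python
import json
import sqlite3

# Normalized side: two tables that must be joined to rebuild one customer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 'keyboard'), (11, 1, 'monitor');
""")

rows = conn.execute("""
    SELECT c.name, o.item FROM customers c
    JOIN orders o ON o.customer_id = c.id
    WHERE c.id = ? ORDER BY o.id
""", (1,)).fetchall()
joined = {"name": rows[0][0], "orders": [item for _, item in rows]}

# Document side: the same information stored pre-assembled under one key.
document_store = {
    "customer:1": json.dumps({"name": "Ada", "orders": ["keyboard", "monitor"]}),
}
fetched = json.loads(document_store["customer:1"])
```

Both paths end up with the same structure; the document version simply skips the reassembly step, at the price of duplicating data if orders are also needed elsewhere.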
Oh, yeah, and to make the new world order feel acknowledged ... yes, those who learned JavaScript or Python as their first programming languages are happy to not be burdened with SQL. For example, MongoDB stores data as BSON, which effectively seems like JSON to someone who only cares about getting the data they want--no schemas, just store/get the data and move on to the next thing.
Frankly, it matters which one you learned first. If you learned SQL first, then there is a place for everything and everything in its place. You do not mind defining/altering a schema because it makes you know your data very well. In fact, some people prefer SQL because they enjoy the feeling of control. They do not mind knowing another domain-specific language because of the power it gives to the user. Since SQL has been around since the 70s, it is basically the old school business way of doing things.
The costs of using a SQL RDBMS are time to plan and modify schemas (partitioning when necessary), time to plan table sizes and scalability (clustering), and learning to interface with the database and translate records into programming language data structures (ORM, or other).
However, SQL is very effective when it comes to analyzing data and asking complex questions. If you have more than simple storage and retrieval needs (with minor analytics), then SQL puts you way ahead of the game.
However, a normalized, SQL database as a monolith for an application is not necessarily great for all the data requirements of an application. There may be aspects of an application (web, or otherwise) that are not CENTRAL AND CORE to the going concern of the business.
If you want a tried and true ACID compliant transactional (with rollback) record system for your financial records (payment, purchase, etc.), like if you are a bank, then I am going with SQL no matter how good document databases get. However, some doohickey widget in the UI might not even touch customer records / business transactions. Why have a schema just for that?
Effectively, that is the perspective of the core UI web developer set. They can justify document databases to make their development life simple, but not to make your business transactions ACID compliant. The more experience they gain, the more they will come to see that the convenience of document databases is just that--a convenience.
I am sure that even as I type this, someone is saying that XX document DB now has ACID compliant transactions, but does it have SQL? Eventually, those who want document DBs for everything will find a way to make it happen--it'll probably mean (among other things) that the collections and documents will have more constraints, and it begins to turn into a--GASP--form of a schema.
Look, with things like REST and GraphQL APIs, you never know where you might be getting data from. You cannot predict or plan the form of all data ahead of time. In cases like these (say, interfacing with the Amazon Web Services APIs), a document database makes good sense. You do not want to normalize that much data. You just want to access it, filter it, and do basic stuff to satisfy the needs of your application. Dumping this data into an SQL database could be a waste of time. Every time AWS updates a service with new data, you might have to change your code and schema to accommodate it. ACKKK! Just store it in collections and documents already!
The AWS API example above involves no transactions. There's no need for a bunch of tables if you need to retain some of the API information. Unfortunately, SOME PEOPLE try to make every scenario fit this use case and they would be WRONG!
Going further, given the amount of data one might ingest from the AWS API, sharding and clustering data stored in collections and documents is MUCH simpler, compared to partitioning and clustering SQL databases. If you work in operations, then document databases are easier to administer, ultimately.
So, while I like lots of answers here, many seem to put up a defense of their camp and/or only slightly explain scenarios where document databases might be more appropriate than schema based, orthogonal, SQL databases.
Rules of thumb:
- If it is CORE and CRITICAL to your business operations and going concern (CRUD, ACID, transactions), go SQL.
- If it is just for handling massive amounts of data for processing in applications and UI, go document / NoSQL.
To elaborate on smdelfin: flexibility. You can store data in any structure (being unstructured and all) and every document could be completely different. CouchDB specifically is useful because with their "view" indexes, you can filter out specific documents and query just that view when you want those subsets of your database.
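Roughly, a view is a map function that emits key/value rows for the documents it cares about, with the rows kept sorted by key. The sketch below imitates that in plain Python. Real CouchDB views are written in JavaScript and indexed incrementally by the server, so `build_view` and the sample documents here are illustrative assumptions, not CouchDB's API.

```python
def build_view(docs, map_fn):
    """Run map_fn over every document and sort the emitted rows by key."""
    rows = []
    for doc in docs:
        for key, value in map_fn(doc):
            rows.append({"id": doc["_id"], "key": key, "value": value})
    return sorted(rows, key=lambda row: row["key"])


def invoices_by_customer(doc):
    """Map function: emit only invoice documents, keyed by customer."""
    if doc.get("type") == "invoice":
        yield doc["customer"], doc["total"]


# Hypothetical documents with different shapes in the same database.
docs = [
    {"_id": "a", "type": "invoice", "customer": "C-2", "total": 30},
    {"_id": "b", "type": "letter", "subject": "hello"},
    {"_id": "c", "type": "invoice", "customer": "C-1", "total": 12},
]

view = build_view(docs, invoices_by_customer)
```

The letter never appears in the view, and the two invoices come back ordered by customer, which is the "query just that subset" behaviour described above.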
My biggest winning point of document databases that store data in JSON format: this is the native format for JavaScript. Therefore, JavaScript web applications work incredibly well with CouchDB. I recently made a web app that utilizes CouchDB and it's rocket fast while also able to handle a constantly varying data structure.
Document-based databases have a big advantage over relational databases, as they do not require defining a schema up front before being able to enter any data.
Also, you should use a document database if your data is not relational and cannot be stored in a table but rather is a set of images, or for example newspaper articles.
Another advantage is the ease of using document-based databases in web development. For a more in-depth comparison of NoSQL database models, check this source: https://arxiv.org/ftp/arxiv/papers/1509/1509.08035.pdf
One reason is to provide fast full-text search on JSON (or other self-describing format) documents that do not necessarily have the same structure/schema.
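As a toy illustration of searching documents that share no schema, the sketch below walks every string value in each document, wherever it sits, and builds a simple inverted index. This is only the general idea in plain Python, not how any particular database implements full-text search, and the sample documents are made up.

```python
import re
from collections import defaultdict


def iter_strings(value):
    """Yield every string found anywhere inside a JSON-like value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def build_index(docs):
    """Map each lowercased word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for text in iter_strings(doc):
            for word in re.findall(r"\w+", text.lower()):
                index[word].add(doc_id)
    return index


def search(index, word):
    """Return the sorted IDs of documents that contain the word."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical documents with completely different structures.
docs = {
    "1": {"title": "Quarterly report", "tags": ["finance"]},
    "2": {"body": {"text": "Report on replication"}},
    "3": {"note": "nothing relevant", "count": 3},
}
index = build_index(docs)
```

Searching for "report" finds documents 1 and 2 even though the word lives under different field names at different nesting depths.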