Is it good practice to give each CouchDB user a separate database?

Asked 16/2, 2015 at 9:47 Answered 18/2, 2015 at 9:51

Solved javascript database couchdb couchdb-futon

I have a somewhat conceptual question about how to structure users and their documents.

Is it good practice to give each user in CouchDB their own database that holds their documents?

I have read that CouchDB can handle thousands of databases and that it is not uncommon for each user to have their own database.

Reason:

I am asking because I am trying to create a system where a logged-in user can view only their own documents and cannot view any other user's documents.

Any suggestions?

Thank you in advance.

Val answered 16/2, 2015 at 9:47 Comment(0)

It’s a rather common scenario to create a CouchDB bucket (DB) for each user, although there are some drawbacks:

You must keep ddocs in sync across all user buckets, so deploying ddoc changes to many buckets can become a real adventure.
If docs are shared between users in some way, you get duplicate docs and view indexes in each bucket.
You must block _info requests to avoid leaking the user list (or you must name buckets using hashes).
In any case, you need a proxy in front of Couch to create and prepare a new bucket on user registration.
You had better protect Couch from running out of capacity when it receives too many requests; this also requires a proxy.
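
The provisioning step such a proxy would perform on registration might look roughly like this. It is a minimal sketch, not the answerer's implementation: the `userdb-` prefix, the SHA-256 naming scheme, and the helper names `dbNameFor` and `provisioningRequests` are assumptions made for illustration. It only builds the two HTTP requests (create the database, then restrict its members through `_security`); actually sending them with admin credentials is left to the proxy.

```javascript
const crypto = require("crypto");

// Derive a database name that does not reveal the username.
// CouchDB database names must start with a lowercase letter,
// hence the (assumed) "userdb-" prefix in front of the hex digest.
function dbNameFor(username) {
  const digest = crypto.createHash("sha256").update(username, "utf8").digest("hex");
  return "userdb-" + digest;
}

// Describe the requests a proxy would send, as an admin, when a user registers.
function provisioningRequests(username) {
  const db = dbNameFor(username);
  return [
    // 1. Create the per-user database.
    { method: "PUT", path: "/" + db },
    // 2. Allow only this user to read and write it.
    {
      method: "PUT",
      path: "/" + db + "/_security",
      body: {
        admins: { names: [], roles: [] },
        members: { names: [username], roles: [] },
      },
    },
  ];
}

console.log(provisioningRequests("alice"));
```

A real setup would also have to push the current ddocs into the new bucket at this point, which is where the sync drawback listed above comes in.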

Per-doc read ACL can be implemented using _list functions, but this approach has some drawbacks and it also requires a proxy, at least a web-server, in front of CouchDB. See CouchDb read authentication using lists for more details.
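
As a rough illustration of the _list approach, the function below drops every row that does not belong to the requesting user. It is only a sketch under assumptions: it expects a hypothetical view (say `by_owner`) that emits the owner's name as the key and the document as the value, and the function name `listOwnDocs` is made up. `start`, `getRow` and `send` are the helpers CouchDB's query server provides to list functions.

```javascript
// A CouchDB _list function, as it could be stored (as source text) under
// `lists` in a design document. `start`, `getRow` and `send` are supplied
// by the query server when CouchDB runs the function.
function listOwnDocs(head, req) {
  start({ headers: { "Content-Type": "application/json" } });
  var user = req.userCtx.name; // null for anonymous requests
  var first = true;
  var row;
  send("[");
  while ((row = getRow())) {
    // Assumption: the view emits the owner's name as the row key.
    if (user !== null && row.key === user) {
      send((first ? "" : ",") + JSON.stringify(row.value));
      first = false;
    }
  }
  send("]");
}
```

This only helps if users cannot reach the underlying view or the documents directly, which is why a proxy or web server in front of CouchDB is still needed.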

You can also try CoverCouch, which implements a full per-doc read ACL while keeping the original CouchDB API untouched, but it’s in very early beta.

Renarenado answered 17/2, 2015 at 0:23 Comment(2)

The cons of this approach outweigh the pros: you will end up completely losing control over your website and data. Even finding out how many users have registered recently can be a nightmare, if not impossible, with this approach! – Dilan 18/2, 2015 at 12:44

@mlorini: Which specific approach are you talking about? This answer touches on several options. – Midst 30/8, 2015 at 16:51

This is quite a common use case, especially in mobile environments, where the data for each user is synchronized to the device using one of the Android, iOS or JavaScript (PouchDB) libraries.

So in concept, this is fine but I would still recommend testing thoroughly before going into production.

Note that one downside of multiple databases is that you can't write queries that span multiple databases. There are some workarounds, though; for more information see Cloudant: Searching across databases.
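
One possible workaround (not necessarily the one the linked article describes) is to replicate every per-user database into a single aggregate database and run cross-user queries there. The sketch below builds documents for CouchDB's `_replicator` database; the database names, the `all_users` target, the server URL and the helper name `replicationDocs` are all assumptions.

```javascript
// Build one _replicator document per user database.
// Each document asks CouchDB to continuously copy a user database
// into a shared aggregate database.
function replicationDocs(serverUrl, userDbs, targetDb) {
  return userDbs.map(function (db) {
    return {
      _id: "aggregate-" + db,
      source: serverUrl + "/" + db,
      target: serverUrl + "/" + targetDb,
      continuous: true,
    };
  });
}

const docs = replicationDocs(
  "http://localhost:5984",
  ["userdb-a1", "userdb-b2"],
  "all_users"
);
console.log(JSON.stringify(docs, null, 2));
```

An admin would save each of these documents in `_replicator`. The aggregate database must stay closed to ordinary users, and document IDs need to be unique across user databases, otherwise the copies conflict in the target.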

Update 17 March 2017:

Please take a look at Cloudant Envoy for more information on this approach.

Database-per-user is a common pattern with CouchDB when there is a requirement for each application user to have their own set of documents which can be synced (e.g. to a mobile device or browser). On the surface, this is a good solution - Cloudant handles a large number of databases within a single installation very well. However ...

Source: https://github.com/cloudant-labs/envoy

Mutiny answered 16/2, 2015 at 11:44 Comment(1)

Thank you so much for the reply. Is there another way to prevent users from viewing other users' documents within the database? Can this be achieved by sorting the data through a "view" map/reduce function within the design document of the database? – Val 16/2, 2015 at 14:41

The solution is as old as web applications: if you think of a MySQL database, there is nothing in the database to stop user B from viewing records belonging to user A; it is all coded in the application layer.

In CouchDB there is likewise no completely secure way to prevent user B from accessing documents written by user A. You would need to code this in your application layer just as before.

Provided you have a web application between CouchDB and the users you have no problem. The issue comes when you allow CouchDB to serve requests directly.
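
A minimal sketch of what "coded in the application layer" means here. The in-memory `store`, the `owner` field and the `readDoc` helper are hypothetical stand-ins for a shared CouchDB database and the web application's read path:

```javascript
// In-memory stand-in for documents kept in one shared CouchDB database.
const store = new Map([
  ["doc1", { _id: "doc1", owner: "alice", text: "A's note" }],
  ["doc2", { _id: "doc2", owner: "bob", text: "B's note" }],
]);

// Application-layer read: the app, not the database, decides who may see what.
function readDoc(sessionUser, id) {
  const doc = store.get(id);
  // Answer "not found" for other users' documents too,
  // so that their existence is not leaked.
  if (!doc || doc.owner !== sessionUser) {
    return { status: 404, body: { error: "not_found" } };
  }
  return { status: 200, body: doc };
}

console.log(readDoc("alice", "doc1").status); // 200
console.log(readDoc("alice", "doc2").status); // 404
```

The check is only effective as long as every request goes through the application and CouchDB itself is not reachable by users.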

Boabdil answered 18/2, 2015 at 2:56 Comment(0)

Using multiple databases for multiple users has some important drawbacks:

Queries over data in different databases are not possible with the native CouchDB API. Analysing the overall status of your website becomes nearly impossible!
Maintenance soon becomes very hard: think of replicating or compacting thousands of databases each time you want to perform a backup.

It depends on your use case, but I think that a nice approach can be:

1. Allow access only through a virtual host. This can be achieved using a proxy or, much more simply, by using a CouchDB hosting provider that lets you fine-tune your "domains->path" mapping.
2. Use design docs / couchapps, instead of the direct document CRUD API, for read/write operations:

2.1. using the _rewrite handler to allow only valid requests: in this way you can instantly block access to sensitive handlers like _all_docs, _all_dbs and others

2.2. using _list and _view handlers for read doc/role based ACLs as described in CouchDb read authentication using lists

2.3. using _update handlers for write doc/role based ACLs

2.4. using authenticated rewriting rules for read/write role based ACL.

2.5. A filtered _changes handler is another way of retrieving all of a user's data with read doc/role based ACL. Depending on your use case, this can simplify your read API considerably, letting you concentrate on your update API.
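
Some of these pieces could be sketched as below. This is an illustration under assumptions, not the answerer's code: the design document layout, the `owner` field, the route names and the function names `saveOwnDoc` and `mineFilter` are all hypothetical. In a real design document the functions are stored as source strings; they are plain functions here so they can be called directly.

```javascript
// 2.1: routes exposed through _rewrite; with access limited to the vhost,
// anything not listed here is not reachable.
const rewrites = [
  { from: "/docs", to: "_list/mine/by_owner", method: "GET" },
  { from: "/docs/:id", to: "_update/save/:id", method: "PUT" },
];

// 2.3: an _update handler that lets users write only their own documents.
// It returns [documentToSave, response]; a null document saves nothing.
function saveOwnDoc(doc, req) {
  var user = req.userCtx.name;
  if (!user) {
    return [null, { code: 401, body: "login required" }];
  }
  if (doc && doc.owner !== user) {
    return [null, { code: 403, body: "forbidden" }];
  }
  var next = JSON.parse(req.body);
  next._id = doc ? doc._id : (req.id || req.uuid);
  if (doc) {
    next._rev = doc._rev;
  }
  next.owner = user; // never trust an owner sent by the client
  return [next, { code: 201, body: "saved" }];
}

// 2.5: a _changes filter that passes only the requesting user's documents.
function mineFilter(doc, req) {
  return doc.owner === req.userCtx.name;
}

console.log(rewrites.length, typeof saveOwnDoc, typeof mineFilter);
```

Note that a filter only narrows the feed a client asks for; it protects nothing unless the unfiltered _changes endpoint is itself blocked, for example by the rewrite rules.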

Dilan answered 18/2, 2015 at 9:51 Comment(11)

This approach isn’t nice, because a) list transformations are slow by their nature, and b) it’s impractical to implement access control for attachments this way: a _list that parses base64 JSON attachments on every request is a CPU and RAM hog. – Renarenado 18/2, 2015 at 11:43

You can use a dedicated database for static content (attachments) and unguessable IDs to implement security. After all, this is the same approach most websites (e.g. Facebook) use to implement privacy for static assets: delivering images with unguessable IDs from their content delivery networks. – Dilan 18/2, 2015 at 12:28

a) Slow with respect to what? The classic approach based on PHP + MySQL + ACL + JSON encoding surely can't be faster than list+view. – Dilan 18/2, 2015 at 12:30

It’s surely faster than view+list, because we perform JSON serialization only once. Let me explain how view+list works and how it differs from a direct view request. A view returns data in chunks: if you have 10 rows in the reply, you receive 12 (or more) chunks. Note that you receive the first chunk very quickly. View+list first collects the entire serialized view response, then parses it again, then executes the list function, and then serializes again. Surely, it is inevitably slow. – Renarenado 18/2, 2015 at 12:56

About security using unguessable IDs: please read this paper, papers.ssrn.com/sol3/papers.cfm?abstract_id=842228. Security by obscurity is an illusion. – Renarenado 18/2, 2015 at 13:06

Of course a database per user is faster than using a view+list, but you are totally losing control over your data. So you can compare the view+list approach only with similar approaches, and compared to the classic old-style SQL approach, list+view is much faster and more efficient. – Dilan 18/2, 2015 at 13:06

Theory aside, today most of the biggest websites (e.g. Google, Twitter, Facebook, just to name a few) use unguessable IDs and CDNs to provide privacy and fast delivery of static assets. – Dilan 18/2, 2015 at 13:13

It’s neither fast nor efficient. I have several systems that use this approach, with two years of stats. With this architecture you build a server that spends most of its RAM and CPU resolving ACLs. It’s reasonable for small systems, say dozens of users, because even a minimal VPS is more than enough. But this approach becomes too wasteful even for systems of medium capacity. – Renarenado 18/2, 2015 at 13:20

I don't agree. It's only a matter of structuring your data well for your own use cases and creating views to get maximum performance out of your docs. Remember that views are incrementally updated and stored, and they are very fast compared to traditional MySQL views. – Dilan 18/2, 2015 at 13:51

Dear @mlorini, certainly views are fast. But chains like view+list are generally much, much slower. Whether you agree or not does not matter, because this can easily be tested and measured. So if you want efficient per-doc read ACL for medium and large systems, implement it outside CouchDB. And please, let’s stop here. – Renarenado 18/2, 2015 at 14:49

Ok, let's stop here. It can be an endless conversation – Dilan 18/2, 2015 at 15:18
