What is the recommended approach towards multi-tenant databases in MongoDB?

6

125

I'm thinking of creating a multi-tenant app using MongoDB. I can't yet estimate how many tenants I'll have, but I would like to be able to scale into the thousands.

I can think of three strategies:

  1. All tenants in the same collection, using tenant-specific fields for security
  2. One collection per tenant in a single shared DB
  3. One database per tenant
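
To make the three strategies concrete, here is a minimal sketch of how each one could map a tenant to a storage location. The names `Target`, `locate`, the `app` database, the `tenant_id` field and the `orders` entity are all made up for illustration; nothing here comes from MongoDB itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Target:
    """Where a tenant's documents live, plus a filter every query must carry."""
    database: str
    collection: str
    base_filter: dict = field(default_factory=dict)


def locate(strategy: int, tenant_id: str, entity: str = "orders") -> Target:
    """Map a tenant to its storage location under each of the three strategies."""
    if strategy == 1:  # shared collection, tenant identified by a field
        return Target("app", entity, {"tenant_id": tenant_id})
    if strategy == 2:  # one collection per tenant in a shared database
        return Target("app", f"{tenant_id}_{entity}")
    if strategy == 3:  # one database per tenant
        return Target(f"tenant_{tenant_id}", entity)
    raise ValueError(f"unknown strategy: {strategy}")
```

Only strategy 1 needs a filter on every query; strategies 2 and 3 isolate tenants by name instead.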

The voice in my head is suggesting that I go with option 2.

Thoughts and implications, anyone?

Fruity answered 1/5, 2010 at 3:55 Comment(8)
Dear @Braintapper, we are in the same situation right now with our application, which needs to support multi-tenancy. Do you have any experience to share? That would be great, thank you.Zadoc
For my app, I ended up going with PostgreSQL (we get the benefit of a relational database with some NoSQL-like functionality via the hstore extension) instead of MongoDB and handling multi-tenancy in Rails with scoping. We use a similar approach to the one used in this Railscast: railscasts.com/episodes/388-multitenancy-with-scopesFruity
I know an answer has already been picked for this question, but anyone else should refer to this official document on the MongoHQ site: support.mongohq.com/use-cases/multi-tenant.html. It clearly advocates against @Fruity's solution below.Argon
Answer updated. The information in your link was not readily available in May 2010.Fruity
@Fruity are you using the PostgreSQL solution (based on railscasts.com) right now? I want to use it, but I am not sure whether it adds security or how many tenants it can support. I would appreciate your feedback on this experience. Thanks.Recluse
@medBo Yes, we are using our own customized version of the PostgreSQL solution. You can have as many tenants as you want, you just need to make sure your database design is tuned for your requirements. In terms of security, you will have to do some work there. There are some canned gems and libraries that you can find to help manage that.Fruity
@Fruity thank you for the reply. I was thinking about Postgres schema separation, but it seems to have its limitations, and maybe it's better to use the "scopes solution". Can you give me an idea of what kind of work I have to do on the security side?Recluse
@medBo You need to make sure that your controllers and models pick up the current tenant id so that all queries are filtered against that tenant id. Once again, a canned gem will likely do it for you. Anything I tell you about how I did it is likely way out of date.Fruity
90

I have the same problem to solve and am also considering the variants. As I have years of experience creating multi-tenant SaaS applications, I was also going to select the second option, based on my previous experience with relational databases.

While doing my research I found this article on the MongoHQ support site (archived link, since the original is gone): https://web.archive.org/web/20140812091703/http://support.mongohq.com/use-cases/multi-tenant.html

The authors advise avoiding the 2nd option at any cost, which, as I understand it, is not specific to MongoDB. My impression is that this applies to most of the NoSQL databases I researched (CouchDB, Cassandra, Couchbase Server, etc.) due to the specifics of their design.

Collections (or buckets, or whatever they are called in different databases) are not the same thing as security schemas in an RDBMS. Although they behave as containers for documents, they are useless for enforcing good tenant separation. I couldn't find a NoSQL database that can apply security restrictions based on collections.

Of course, you can use MongoDB role-based security to restrict access at the database/server level (http://docs.mongodb.org/manual/core/authorization/).
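
As a rough illustration of database-level roles, the sketch below builds a `createUser` command document whose only role is `readWrite` on a single tenant database. The helper name `tenant_user_command` and the example database and user names are assumptions for illustration; the command shape follows MongoDB's `createUser` database command.

```python
def tenant_user_command(tenant_db: str, username: str, password: str) -> dict:
    """Build a createUser command limited to readWrite on one tenant database."""
    return {
        "createUser": username,  # the command name must be the first key
        "pwd": password,
        "roles": [{"role": "readWrite", "db": tenant_db}],
    }


# Possible usage with PyMongo against a real server (not executed here):
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://localhost:27017")
#   client["tenant_acme"].command(tenant_user_command("tenant_acme", "acme_app", "s3cret"))
```

A user created this way can read and write only in its own tenant database, which is the kind of separation that collections alone do not give you.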

I would recommend the 1st option when:

  • You have enough time and resources to deal with the complexity of designing, implementing and testing this scenario.
  • The database structure and functionality will not differ much between tenants.
  • Your application design allows tenants to make only minimal customizations at runtime.
  • You want to optimize space and minimize the use of hardware resources.
  • You are going to have thousands of tenants.
  • You want to scale out fast and at a good cost.
  • You are NOT going to back up data per tenant (keep separate backups for each tenant). It is possible even in this scenario, but the effort will be huge.
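
Much of the design and testing complexity of the 1st option comes from making sure every query carries the tenant filter. A minimal sketch of one way to centralize that is below; the field name `tenant_id` and the helpers `scoped` and `tenant_index` are hypothetical, not part of any driver.

```python
from typing import Any, Mapping, Optional

TENANT_FIELD = "tenant_id"  # hypothetical field stored on every document


def scoped(tenant_id: str, query: Optional[Mapping[str, Any]] = None) -> dict:
    """Return a copy of `query` that is always restricted to one tenant."""
    result = dict(query or {})
    if TENANT_FIELD in result and result[TENANT_FIELD] != tenant_id:
        # Refuse to let a caller smuggle in another tenant's id.
        raise PermissionError("query targets a different tenant")
    result[TENANT_FIELD] = tenant_id
    return result


def tenant_index(*fields: tuple) -> list:
    """Compound index spec with the tenant field as the leading key."""
    return [(TENANT_FIELD, 1), *fields]


# Possible usage with a PyMongo collection (not executed here):
#   collection.find(scoped("acme", {"status": "open"}))
#   collection.create_index(tenant_index(("created_at", -1)))
```

Putting the tenant field first in compound indexes lets queries that always filter by tenant use the index prefix.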

I would go for variant 3 if:

  • You are going to have a small number of tenants (several hundred).
  • The specifics of the business require you to support big differences in database structure between tenants (e.g. integration with 3rd-party systems, import/export of data).
  • Your application design allows customers (tenants) to make significant changes to the application at runtime (adding modules, customizing fields, etc.).
  • You have enough resources to scale out with new hardware nodes quickly.
  • You are required to keep versions/backups of data per tenant. Restores will also be easy.
  • There are legal/regulatory restrictions that force you to keep different tenants in different databases (or even data centers).
  • You want to fully utilize the out-of-the-box security features of MongoDB, such as roles.
  • There are big differences in size between tenants (many small tenants and a few very large ones).
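
To show why per-tenant backups are simple with variant 3, here is a small sketch that derives a database name from a tenant id and builds a `mongodump` invocation for just that database. The naming rule (lowercase letters, digits and underscores, under 64 characters) is my own conservative choice, and `tenant_db_name`, `dump_command`, the host and the paths are assumptions; `--host`, `--db` and `--out` are standard `mongodump` options.

```python
import re


def tenant_db_name(tenant_id: str) -> str:
    """Derive a conservative database name for a tenant."""
    slug = re.sub(r"[^a-z0-9_]", "_", tenant_id.lower())
    name = f"tenant_{slug}"
    if len(name) >= 64:  # keep well within MongoDB's database name length limit
        raise ValueError(f"database name too long: {name}")
    return name


def dump_command(tenant_id: str, out_dir: str, host: str = "localhost:27017") -> list:
    """Build the argv for dumping a single tenant's database."""
    return [
        "mongodump",
        f"--host={host}",
        f"--db={tenant_db_name(tenant_id)}",
        f"--out={out_dir}",
    ]


# Possible usage (not executed here):
#   import subprocess
#   subprocess.run(dump_command("acme", "/backups/acme"), check=True)
```

Note that different tenant ids can collapse to the same slug under this rule, so a real system would need to check for collisions.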

If you post additional details about your application, perhaps I can give you more detailed advice.

Laxation answered 21/4, 2014 at 17:54 Comment(4)
I guess the original link is dead, went for archived one: web.archive.org/web/20140812091703/http://support.mongohq.com/…Mesmerism
Hello, how can we create a new db from the current db using MongoDB?Damson
@Russian How are we going to handle indexing if we go for option 1?Turnstile
The official MongoDB docs now have this covered: mongodb.com/docs/atlas/build-multi-tenant-archAbednego
10

I found a good answer in the comments in this link:

http://blog.boxedice.com/2010/02/28/notes-from-a-production-mongodb-deployment/

Basically option #2 seems to be the best way to go.

Quote from David Mytton's comment:

We decided not to have a database per customer because of the way MongoDB allocates its data files. Each database uses its own set of files:

The first file for a database is dbname.0, then dbname.1, etc. dbname.0 will be 64MB, dbname.1 128MB, etc., up to 2GB. Once the files reach 2GB in size, each successive file is also 2GB.

Thus if the last datafile present is say, 1GB, that file might be 90% empty if it was recently reached.

from the manual.

As users sign up to the trial and give things a go, we’d get more and more databases that were at least 2GB in size, even if the whole of the data file wasn’t used. We found this used a massive amount of disk space compared to having several databases for all customers where the disk space can be used to maximum efficiency.

Sharding will be on a per collection basis as standard which presents a problem where the collection never reaches the minimum size to start sharding, as is the case for quite a few of ours (e.g. collections just storing user login details). However, we have requested that this will also be able to be done on a per database level. See http://jira.mongodb.org/browse/SHARDING-41

There are no performance tradeoffs using lots of collections. See http://www.mongodb.org/display/DOCS/Using+a+Large+Number+of+Collections
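
To put numbers on the data-file allocation rule quoted above (64MB, 128MB, and so on, doubling up to 2GB), here is a small calculation sketch. It models only the doubling rule as quoted; it ignores any preallocation of the next file and other per-database files, and the function names are made up for illustration.

```python
def data_file_sizes_mb(count: int, first_mb: int = 64, cap_mb: int = 2048) -> list:
    """Sizes of the first `count` data files: doubling from 64MB, capped at 2GB."""
    sizes, size = [], first_mb
    for _ in range(count):
        sizes.append(size)
        size = min(size * 2, cap_mb)
    return sizes


def allocated_mb(data_mb: int, first_mb: int = 64, cap_mb: int = 2048) -> int:
    """Disk allocated by one database holding `data_mb` of data (at least one file)."""
    total, size = 0, first_mb
    while total < data_mb or total == 0:
        total += size
        size = min(size * 2, cap_mb)
    return total


# 1,000 tenant databases holding 100MB each:
per_db = allocated_mb(100)        # 64 + 128 = 192MB allocated for 100MB of data
total_allocated = 1000 * per_db   # 192,000MB allocated for 100,000MB of data
```

Under this simplified model, a database-per-tenant layout with many small tenants allocates close to twice the disk space the data actually needs.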

Fruity answered 1/5, 2010 at 17:7 Comment(2)
As suggested in other answers, #2 is not a good approach. Please consider changing the accepted answer, because this could mislead other developers.Redistrict
Changed accepted answer, as things have changed significantly since 2010, when the question was first asked.Fruity
3

There is a reasonable article on MSDN about multi-tenant data architecture which you might wish to refer to. Some key topics touched on by this article:

  • Economic considerations
  • Security
  • Tenant considerations
  • Regulatory (legal)
  • Skill set concerns

Also touched upon are some patterns for Software as a Service (SaaS) configuration.

Additionally, worth a gander is an interesting write-up from the SQL Anywhere guys.

My own personal take: unless you are certain of enforced security / trust, I would go with option 3 or, if scalability concerns prohibit that, fall back to option 2 at a minimum. That said... I'm no pro with MongoDB. I get pretty nervous using a shared "schema", but I will happily defer to more experienced practitioners.

Aragats answered 1/5, 2010 at 4:2 Comment(1)
I'm familiar with that MSDN article, as my original plan was to use a relational database. My data is quite unstructured, however, which now has me investigating NoSQL dbs like MongoDB. It doesn't seem that MongoDB has ACL support the way Lotus Domino does, and I don't really want to reinvent the wheel, which makes me also think 2 or 3 are the way to go. I also don't know if there are limits that I may encounter in terms of # of collections or dbs allowed in MongoDB though.Fruity
3

I would go for option 2.

However, you could set the mongod.exe command-line option --smallfiles. This means that the largest data file will be 0.5 gigabytes instead of 2 gigabytes. I tested this with mongo 1.42. So option 3 is not impossible.

Hillegass answered 2/5, 2010 at 5:47 Comment(1)
Just in case it helps, in retrospect: http://yazezo.com/2013/10/how-to-setup-saas-cloud-multi-tenant.htmlAssassinate
0

According to my research in "MongoDB. Trucos y consejos. Aplicaciones multitenant.", option two is not recommended if you do not know how many tenants you will have: there could be thousands, which complicates sharding, and imagine having thousands of collections in a single database. So in your case, option one is recommended. If you are going to have a limited number of users, it is a different matter, and you could use option two as you planned.
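
For option one, sharding usually means one shared collection with a shard key that starts with the tenant field. The sketch below builds the two admin commands involved. The helper name, the `app.events` namespace, the `tenant_id` field and the choice of `_id` as the second key are assumptions for illustration, not a recommendation from the article above.

```python
def shard_commands(database: str, collection: str, tenant_field: str = "tenant_id") -> list:
    """Admin commands to shard one shared collection on a tenant-prefixed key."""
    namespace = f"{database}.{collection}"
    return [
        {"enableSharding": database},
        {"shardCollection": namespace, "key": {tenant_field: 1, "_id": 1}},
    ]


# Possible usage with PyMongo connected to a mongos router (not executed here):
#   for command in shard_commands("app", "events"):
#       client.admin.command(command)
```

With thousands of per-tenant collections, each one would have to be sharded separately, which is part of why option two gets complicated.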

Meris answered 11/5, 2018 at 13:30 Comment(0)
-4

While the discussion here is on NoSQL and primarily MongoDB, we at Citus are using PostgreSQL and building a distributed/sharded multi-tenant database.

Our use-case guide walks through an example app, covering the schema and various multi-tenant specific features.

For more unstructured data, we use PostgreSQL's JSONB column type to store it along with tenant-specific data.

Phage answered 1/8, 2017 at 22:4 Comment(0)
