Row/Column Based Multi-Tenancy in SQLAlchemy using a Shared Schema

Asked 1/2, 2021 at 12:50 Answered 21/2, 2021 at 14:16

sqlalchemy flask-sqlalchemy multi-tenant

I'd like to build a multi-tenancy application using Flask + SQLAlchemy. The official SQLAlchemy docs suggest that, to use multi-tenancy the tables should be distributed over 1 scheme per tenant and handle the different tenants on engine level.

Maintaining multiple schemes seems a bit bloated to me and I'd like to know if the following approach using the same tables for all tenants is feasible if designed correctly and if not, why not:

Those tables with tenant-owned records have a non-nullable column tenant_id that indicates which tenant "owns" the row.
Create/Update statements (INSERT and UPDATE) automatically set the value of this column to the current tenant.
Reading/Deleting queries (SELECT or DELETE) queries automatically add a WHERE tenant_id=:current tenant clause/filter to the SQL.
The value of tenant_id could be derived from a JSON Web Token.
This automatically set value/where clause could be managed in the background, e.g. by a custom session like in this answer or using a pre-commit hook.

I barely find any information on this approach (excecpt for this one package called MultiAlchemy which seems to work similar to my description but has been archived and not updated for 7 years). My gut feeling says, there's a reason for it.

tl;dr: Why to not use shared schemes for multi-tenancy in SQLAlchemy?

Wailoo answered 1/2, 2021 at 12:50 Comment(0)

As you might have already found (including the other answers to the question), there are multiple approaches to Multi-Tenancy implementation with their pros and cons without a clear winner. What is Multi-Tenant Architecture is just another, but a very good overview of several options, which I will not copy from, as reading the whole article makes sense.

With all pros and cons of all approaches, it comes down to the personal preference and experience of the team.

Personally I would very much like to have an extension to sqlalchemy implementing the A Single, Shared Database Schema multi-tenancy, which MultiAlchemy. It is not clear why the project was archived, but I would be interested to create a new one to support this feature based on:

SQLAlchemy versions 1.4, 2.0 only in order not bother supporting soon to be legacy versions
ideas from MultiAlchemy
ideas from FilteredQuery recipe

Nitrite answered 21/2, 2021 at 14:16 Comment(1)

@konstantin: if you decide to implement this (and have a real test case for it), i might be happy to support in developing such extension, as it will be of interest for me in one of my next projects – Nitrite 21/2, 2021 at 14:17

If you're using multiple schemas to separate the data of different tenants you're basically improving:

Security because separate schemas allow for having separate database users for each tenant. If the access to data of a different tenant is already impossible on database level you don't have to rely on the software using the database to only access data that the tenant should have access to. This significantly reduces the possibility for bugs where the programmer could forget about the WHERE clause/SQLAlchemy filter that restricts the access to the current tenants data.
Scalability because you can easily distribute already separated schemas over mulitple servers but not single tables. So if you're using separate schemas for separate tenants you're already preparing for future growth.

Both aspects are not given if you're modeling multiple tenants and data ownership using only a dedicated foreign key column.

Wailoo answered 20/2, 2021 at 15:22 Comment(0)

Those are two different approaches for multi-tenancy:

1-scheme-per-tenant will look like public.companies, company_a.users, company_a.posts, company_b.users, company_b.posts, ... and inside these tables there will be no references to actual tenant (company)
- it works really good as a distribution of data between multiple databases
- it works better when adding/removing tenant is a very rare event, because it is a bit complicated process (create new schema etc)
- it requires extra efforts when it comes to schema/databases management
- cross-tenant requests are difficult to implement
shared schema will look like a fixed set of tables public.companies, public.users, public.posts and tables will contain a reference to actual tenant (company_id for example)
- adding/removing a tenant is a very simple process
- single schema requires less administration
- cross-tenant requests are simple
- this approach is difficult to scale or distribute between databases

I'd say it depends mostly on the number of tenants and data size. A few tenants, but with lots of data, then 1st approach for sure is better. Dynamic number of tenants, then 2nd approach is better.

I haven't seen before a general implementation of 2nd approach, like MultiAlchemy, I personally would give it a try.

Paint answered 12/2, 2021 at 17:18 Comment(0)

Recommended topics

Hot tags