PostgreSQL's schemas for multi-tenant applications

I'm learning about multi-tenant applications and how PostgreSQL's schemas can be used for this.

While researching the subject, I found an article in which the author describes a poor experience using PostgreSQL schemas in a multi-tenant application. The main problems were slow migrations and high consumption of database resources.

It seems that having only one schema (sharing the tables among all tenants) leads to better performance than having a separate schema for each tenant. That feels strange to me; I would expect the opposite, since indexes on smaller tables tend to be lighter than indexes on larger tables.

Why would performance be worse with data separated into many small tables (across multiple schemas) than with data consolidated into a few huge tables (in a single schema)?

Company answered 13/6, 2017 at 14:27 Comment(1)
I think the article says more about Rails developers than it says about PostgreSQL. But in the absence of any code, this will probably be closed. – Aleuromancy

Performance isn't worse, necessarily. As the article explains, there are specific conditions which make the schema approach better or worse depending on your application design and workload. Let me explain the tradeoffs of the "tenant-schema" vs. "shared-table" approaches:

tenant-schema is best when you have a relatively small number of fairly large tenants. An example of this would be an accounting application, with only paid subscription users. Things which make it the better-performing option for you include (a setup sketch follows the list):

  • a small number of tenants with a lot of data each
  • a relatively simple schema without a lot of tables per tenant
  • a need to customize the schemas of some tenants
  • ability to make use of database roles per tenant
  • requirement to migrate a tenant's data from one server to another
  • ability to spin up a dedicated appserver in your cloud for each tenant
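
A minimal sketch of the tenant-per-schema layout; the tenant_42, tenant_42_owner, and invoices names are placeholders, not anything prescribed:

    -- One schema (and optionally one role) per tenant; names here are
    -- hypothetical and would be generated per tenant by the application.
    CREATE ROLE tenant_42_owner NOLOGIN;
    CREATE SCHEMA tenant_42 AUTHORIZATION tenant_42_owner;

    -- Each tenant gets its own copy of the application tables.
    CREATE TABLE tenant_42.invoices (
        id         bigserial PRIMARY KEY,
        amount     numeric(12,2) NOT NULL,
        created_at timestamptz NOT NULL DEFAULT now()
    );

    -- At connection time, point the session at the tenant's schema so
    -- application queries need no tenant qualifier at all.
    SET search_path TO tenant_42;
    SELECT count(*) FROM invoices;  -- resolves to tenant_42.invoices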

Things which make it a poor-performing option include:

  • lots of tenants with very little data each
  • stateless approach to connections where each request could be any tenant
  • client library or orm which caches metadata for all tables (like ActiveRecord)
  • a requirement for efficient, high-performance connection pooling and/or caching
  • problems with VACUUM and other PostgreSQL administrative operations which scale poorly across 1000's of tables.

Whether tenant-schema is bad for migrations/schema changes really depends on how you're doing them. It's bad for rolling out a universal schema change quickly, but good for deploying schema changes as a gradual rollout across tenants.
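
As an illustration of the gradual-rollout style, a change can be applied schema by schema. This is only a sketch; the 'tenant_%' naming convention and the invoices table are assumptions, not a standard:

    -- Apply a schema change one tenant schema at a time, assuming
    -- tenant schemas follow a 'tenant_%' naming convention.
    DO $$
    DECLARE
        s text;
    BEGIN
        FOR s IN
            SELECT schema_name
            FROM information_schema.schemata
            WHERE schema_name LIKE 'tenant\_%' ESCAPE '\'
        LOOP
            EXECUTE format(
                'ALTER TABLE %I.invoices ADD COLUMN IF NOT EXISTS due_date date',
                s);
        END LOOP;
    END $$;

Note that a DO block runs as a single transaction; for a genuinely gradual rollout you would issue each ALTER from a driver script in its own transaction, so you can pause or stop if one tenant fails.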

shared-table works better for situations where you have a lot of tenants and a lot of your tenants have very little data. An example of this would be a social media mobile application which permits free accounts and thus has thousands of abandoned accounts. Other things which make the shared-table model beneficial (sketched after the list) are:

  • better for connection pooling, as all connections can use the same pool
  • better for PostgreSQL administration, because of fewer tables total
  • better for migrations and schema changes, since there's only one "set" of tables
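
A minimal sketch of the shared-table layout; the invoices table and the tenant_id discriminator column are hypothetical names:

    -- One set of tables for all tenants; every row carries a tenant
    -- discriminator column.
    CREATE TABLE invoices (
        tenant_id  bigint NOT NULL,
        id         bigserial,
        amount     numeric(12,2) NOT NULL,
        created_at timestamptz NOT NULL DEFAULT now(),
        PRIMARY KEY (tenant_id, id)
    );

    -- Leading each index with tenant_id keeps per-tenant scans cheap
    -- even though the table holds every tenant's rows.
    CREATE INDEX invoices_tenant_created_idx
        ON invoices (tenant_id, created_at);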

The main drawback of shared-table is the need to append the tenant filter condition to every single query in the application layer (illustrated after the list below). It's also problematic because:

  • queries which join many tables may perform poorly because the tenant filter throws off query planning
  • tables which grow to hundreds of millions of rows can cause specific performance and maintenance issues
  • no way to do tenant-specific application changes or schema upgrades
  • more expensive to migrate tenants between servers
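
To make the filter drawback concrete, here is a sketch (the invoices and payments tables are hypothetical): the tenant condition has to be repeated on every query and every join, and the planner's row estimates then hinge on how selective tenant_id is:

    -- The tenant filter must appear on every query the application
    -- issues; omitting it on even one query leaks data across tenants.
    SELECT i.id, i.amount, p.paid_at
    FROM invoices i
    JOIN payments p
      ON p.tenant_id = i.tenant_id   -- filter repeated on the join
     AND p.invoice_id = i.id
    WHERE i.tenant_id = 42;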

So which model "performs better" really depends on which tradeoffs hurt you the worst.

There's also a hybrid model, "tenant-view", where the actual data is stored in shared tables, but each application connection uses security barrier views to view the data. This has some of the tradeoffs of each model. Primarily, it has the security benefits of the tenant-schema model with some of the performance drawbacks of both models.
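
A hedged sketch of the tenant-view idea, layered over the shared invoices table sketched above; the app.tenant_id session setting is an assumed application convention, not a built-in:

    -- Data stays in the shared table; each session reads it through a
    -- security_barrier view filtered by a per-session tenant setting.
    CREATE VIEW my_invoices WITH (security_barrier) AS
        SELECT id, amount, created_at
        FROM invoices
        WHERE tenant_id = current_setting('app.tenant_id')::bigint;

    -- The application sets the tenant once per connection or request...
    SET app.tenant_id = '42';
    -- ...and then queries the view with no explicit tenant filter.
    SELECT count(*) FROM my_invoices;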

Chante answered 13/6, 2017 at 20:7 Comment(7)
It's the best information I've ever read about the reasons for using (or not) PostgreSQL schemas in a multi-tenant application. I still don't know why the performance of having lots of small tables (and schemas) is worse than having few huge tables. But, for sure, now I can decide which setting is the ideal one for my situation. Thank you very much! – Company
The article you linked to pretty clearly explained that their performance issues were with the application, not queries on individual tables. – Chante
I don't agree. He even says: "My guess is there’s an upper limit — if not a hard one, then at least a soft, recommended one — on the number of tables/indexes etc that you store within one postgres db". The problem seems to be that Postgres does not do well with a large number of tables. – Company
There are some specific issues with large numbers of tables, namely administrative tasks like backup and VACUUM. However, by "large numbers of tables" we're talking tens or hundreds of thousands, not a few hundred. And those issues don't affect SELECT queries. If you read the blog post, the main issue with having many tables was on the ActiveRecord side (just FYI, I've been doing PostgreSQL performance for 19 years). – Chante
When a PostgreSQL geek talks about "many tables", they mean a lot. For more, see the slides/video from The Billion Tables Project: pgcon.org/2013/schedule/events/595.en.html – Chante
True, it'd be nice if @Chante mentioned a ballpark for each of the "number of" parts: number of tenants, number of tables, etc. – Alita
KBN: there's no ballpark, because (a) it's relative to your system resources, and (b) it's a combination of multiple interdependent factors. For example, if you're running your database on a 2-core cloud vserver, "a lot" is going to be much, much smaller than if you're running it on an IBM zSeries. – Chante
