What is the recommended approach towards multi-tenant databases in Cassandra?

I'm thinking of creating a multi-tenant app using Apache Cassandra.

I can think of three strategies:

  1. All tenants in the same keyspace using tenant-specific fields for security
  2. Table per tenant in a single shared DB
  3. Keyspace per tenant

The voice in my head is suggesting that I go with option 3.

Thoughts and implications, anyone?

Consummate answered 21/11, 2018 at 7:33 Comment(4)
Not sure why Spring-Data-Cassandra is tagged here, as the question has nothing to do with it. But I'll say that you should really use the DataStax Java Driver. The Spring-Data-Cassandra driver uses large batches and unbound queries to mimic some of the functionality from the relational world. So Spring-Data-Cassandra is a definite no in my book; especially in a multi-tenant cluster.Reassure
Agreed on not using spring-data-cassandra :-)Ambrosio
How many tenants?Activate
We are expecting 40+ tenants.Consummate
A
7

There are several considerations that you need to take into account:

Option 1: In pure Cassandra, this option works only if all access to the database goes through a "proxy" (an API, for example) that enforces filtering on the tenant field. Otherwise, if you provide direct CQL access, everybody can read all the data. In this case you also need to design the data model carefully, making the tenant part of a composite partition key. DataStax Enterprise (DSE) has additional functionality called row-level access control (RLAC) that allows permissions to be granted on individual rows within a table.
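As a rough illustration of the "proxy" idea for Option 1, here is a minimal Python sketch. The keyspace `shop`, the table `orders`, its columns, and the `TenantScope` helper are all hypothetical names chosen for the example; the only points it tries to show are that the tenant is part of a composite partition key and that the API layer, not the caller, supplies the tenant value.

```python
from dataclasses import dataclass

# Hypothetical schema: tenant_id is the first component of a composite
# partition key, so every partition belongs to exactly one tenant.
CREATE_ORDERS = """
CREATE TABLE IF NOT EXISTS shop.orders (
    tenant_id   text,
    customer_id text,
    order_id    timeuuid,
    total       decimal,
    PRIMARY KEY ((tenant_id, customer_id), order_id)
)
"""


@dataclass(frozen=True)
class TenantScope:
    """Proxy-side helper (an assumption, not a driver API).

    The tenant id should come from the authenticated session,
    never from user-supplied request parameters.
    """
    tenant_id: str

    def orders_for_customer(self, customer_id: str):
        # Returns (CQL text, bind values) suitable for a prepared statement.
        cql = ("SELECT order_id, total FROM shop.orders "
               "WHERE tenant_id = ? AND customer_id = ?")
        return cql, (self.tenant_id, customer_id)


scope = TenantScope("acme")
cql, params = scope.orders_for_customer("c-42")
```

Because every query built this way binds `tenant_id`, a caller cannot read another tenant's partitions through the API. This protects nothing if tenants are also given direct CQL access.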

Options 2 & 3 are quite similar, except that with a keyspace per tenant you have the flexibility to set up a different replication strategy for each one. This could be useful for storing customers' data in different data centers bound to different geographic regions. But in both cases there are limits on the number of tables in the cluster: a reasonable number is around 200, with a "hard stop" at more than 500. The reason is that every table needs additional resources, such as memory, to keep auxiliary data structures (bloom filters, etc.), and this consumes both heap and off-heap memory.
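To make the keyspace-per-tenant option and the table limits more concrete, here is a small Python sketch. The tenant names, the data center names (`dc_eu`, `dc_us`), and both helper functions are hypothetical. The 200 and 500 figures are simply the ones quoted above, and the count ignores Cassandra's own system tables.

```python
import re

SOFT_TABLE_LIMIT = 200  # "reasonable" figure quoted above
HARD_TABLE_LIMIT = 500  # "hard stop" figure quoted above


def create_keyspace_cql(tenant: str, replicas_per_dc: dict) -> str:
    """Build a CREATE KEYSPACE statement for one tenant.

    replicas_per_dc maps a data center name to its replication factor;
    the names must match the data centers defined in the cluster.
    """
    # Conservative check so the tenant name is safe to use unquoted.
    if not re.fullmatch(r"[a-z][a-z0-9_]{0,47}", tenant):
        raise ValueError(f"unsafe keyspace name: {tenant!r}")
    dcs = ", ".join(f"'{dc}': {rf}" for dc, rf in replicas_per_dc.items())
    return (f"CREATE KEYSPACE IF NOT EXISTS {tenant} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', {dcs}}}")


def table_budget(tenants: int, tables_per_tenant: int):
    """Return (total tables, verdict) against the limits quoted above."""
    total = tenants * tables_per_tenant
    if total > HARD_TABLE_LIMIT:
        return total, "over hard limit"
    if total > SOFT_TABLE_LIMIT:
        return total, "over soft limit"
    return total, "ok"


# A European tenant kept only in a (hypothetical) EU data center.
print(create_keyspace_cql("tenant_eu", {"dc_eu": 3}))
# 40 tenants with 6 tables each already exceed the "reasonable" figure.
print(table_budget(40, 6))
```

With the 40+ tenants mentioned in the comments, the budget leaves room for only about five tables per tenant before passing 200, which applies to Option 2 and Option 3 alike.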

Ambrosio answered 21/11, 2018 at 8:6 Comment(2)
Thanks for the suggestion. Please share any working examples of multi-tenancy with Cassandra using Option 3, if you have them.Consummate
It's just standard Cassandra functionality: you create a keyspace and configure the data centers.Ambrosio
R
6

I've done this for a few years now at large scale in the retail space. My belief is that the recommended way to handle multi-tenancy in Cassandra is not to. No matter how you do it, the tenants will be hit by the "noisy neighbor" problem. Just wait until one tenant runs a BATCH update with 60k writes batched to the same table, and everyone else's performance falls off.

But the bigger problem is that there's no way you can guarantee that each tenant will even have a similar ratio of reads to writes. In fact, they will likely be quite different. That's going to be a problem for options #1 and #2, as disk IOPS will be going to the same directory.

Option #3 is really the only way it realistically works. But again, all it takes is one ill-considered BATCH write to crush everyone. Also, want to upgrade your cluster? Now you have to coordinate it with multiple teams, instead of just one. Using SSL? Make sure multiple teams get the right certificate, instead of just one.

When we have new teams use Cassandra, each team gets their own cluster. That way, they can't hurt anyone else, and we can support them with fewer question marks about who is doing what.

Reassure answered 21/11, 2018 at 14:6 Comment(4)
Hello - I read your post and I am not sure I understand what you are suggesting. Are you recommending not to use Cassandra to store multi-tenant data?Jedidiah
@costa At a high level, yes, I am suggesting that. The idea is that multi-tenant workloads will compete with each other for resources, so the thought here is to architect the solution with as much separation between tenants as possible. Option #3 does a good job of achieving that separation, although I've seen the best results from building each tenant its own cluster.Reassure
Interesting. I guess it depends on the number of tenants as well and requirements. In my case I need to deal with end-users. Not sure I would call it multi-tenancy, but it is a form of co-existence.Jedidiah
@costa Ahh, I understand. I'm considering "tenants" as multiple applications. If you have different end users, that should be ok. For larger scale projects, I have heard of some SaaS companies putting each customer in their own keyspace, but for smaller scale, mixing user activity shouldn't be too much of an issue.Reassure
