Multi-tenancy in Elasticsearch

3

41

We are planning to introduce Elasticsearch (on AWS) for our multi-tenant application. We have the following options:

  1. Using One Index Per Tenant
  2. Using One Type Per Tenant
  3. All Tenants Share One Index with Custom routing

According to this blog post, https://www.elastic.co/blog/found-multi-tenancy, the first option can cause memory issues. It is not clear to me how the other options compare.

It seems that with the third option there is no data segregation, and I am not sure about security.

I believe the second option would be better, as the data would be segregated.

Please help me identify the best option for multi-tenancy with Elasticsearch.

Please note that we will be using AWS infrastructure.

Thoron answered 26/1, 2017 at 6:40 Comment(10)
What is a tenant in your context? - Josejosee
Each client is considered a tenant. - Thoron
Then the answer depends on how many tenants/clients we are talking about (1-10, 10-100, 100-1000, ?) and the growth factor you're expecting, i.e. is the number of clients stable, or do you expect an x% increase within the next N months? When deciding which strategy to take, you need to think of tomorrow, not today. - Josejosee
There is a 4th option that you haven't mentioned: all tenants share one time-based index with custom routing. That's the most flexible option when your client count will increase over time. - Josejosee
Is there any difference between the third option and the fourth option you are mentioning? Assume 10-1000 clients. - Thoron
Yes, because you can control the size of your indices. If you have a single index, then you'll have to live with it forever, and it will have to store everything for all your new clients. Whereas if you decide to have one index per month/year/you-name-it, then you can ensure that your indices will not grow beyond a manageable size. - Josejosee
I also have one more problem: each client would have different custom fields, and the field types would differ too. So I'm still deciding between a TYPE per client and an INDEX per client. - Thoron
If fields with the same names can have different types depending on the client, then yes, you'd need to store those clients in different indices, since two types in the same index cannot have fields with the same name and different types... - Josejosee
Hello @SelvakumarPonnusamy, I'd like to know which approach you chose. We have the same questions and are looking for past experience. I would appreciate it if you could share yours. Thanks. - Scab
I wonder if the memory issue is still relevant, since this question and answer are 5 years old and I've read that in version 8.x of Elasticsearch the memory overhead per shard has been significantly reduced. - Reiff
38

We are considering the same question right now, and the following set of articles by Elasticsearch was very helpful.

Start here: https://www.elastic.co/guide/en/elasticsearch/guide/current/scale.html

And read through each subsequent article until you hit this one: https://www.elastic.co/guide/en/elasticsearch/guide/current/finite-scale.html

The following two were very eye-opening for me:

https://www.elastic.co/guide/en/elasticsearch/guide/current/faking-it.html

https://www.elastic.co/guide/en/elasticsearch/guide/current/one-big-user.html

The basic takeaway:

  • Alias per customer
  • Shard routing
  • Now you can have indexes for big customers, shared indexes for little customers, and they all appear to be separate indexes
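
As a rough sketch of what that takeaway could look like in practice, the snippet below builds a request body for the Elasticsearch `_aliases` API: one alias per customer, with a term filter and a routing value for customers in a shared index, and a plain alias for a customer with a dedicated index. The alias naming scheme (`tenant_<id>`), the `tenant_id` field, and the index names are assumptions made up for illustration; the body would be sent with `POST /_aliases` using whatever client you prefer.

```python
import json


def alias_action(tenant_id, index, dedicated=False, tenant_field="tenant_id"):
    """Build one "add" action for the Elasticsearch _aliases API.

    The alias name and the tenant field are hypothetical conventions.
    """
    action = {"index": index, "alias": f"tenant_{tenant_id}"}
    if not dedicated:
        # Shared index: the filter hides other tenants' documents, and the
        # routing value sends this tenant's documents to a single shard.
        # Assumes tenant_field is mapped as an exact-value (keyword) field.
        action["filter"] = {"term": {tenant_field: tenant_id}}
        action["routing"] = tenant_id
    return {"add": action}


def build_alias_request(assignments):
    """assignments: iterable of (tenant_id, index, dedicated) tuples."""
    return {"actions": [alias_action(t, i, d) for t, i, d in assignments]}


request_body = build_alias_request([
    ("acme", "shared_small_tenants", False),
    ("globex", "shared_small_tenants", False),
    ("bigcorp", "bigcorp_dedicated", True),
])
print(json.dumps(request_body, indent=2))
```

The application then reads and writes through `tenant_acme`, `tenant_globex`, and `tenant_bigcorp` without needing to know which physical index sits behind each alias.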
Augean answered 6/3, 2017 at 20:43 Comment(3)
Is there any way to automatically manage which customers get a dedicated index, based on size? - Creight
You can take a look at Curator. I'm not sure about the specific use case, but I've used it in the past to do several maintenance-type tasks. Also, the Elasticsearch API is pretty sophisticated. That said, the process of moving a customer from a shared index to a dedicated index with zero downtime is time-consuming - I'm not certain that I'd jump into having it be automated (unless I'm misunderstanding what you mean). - Augean
Indeed, there is no one-size-fits-all in the real world. We found similar advice here: pulse.support/blog/… - Korman
12

This link is too important not to mention here: http://www.bigeng.io/elasticsearch-scaling-multitenant/

Good architecture dilemmas, and great performance analysis / reasoning.

TL;DR: they had index groups built around shard allocation filtering to segregate load across nodes in the cluster.
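
For readers unfamiliar with shard allocation filtering, here is a minimal sketch of the index setting involved. It is not taken from the linked article: the node attribute name `tenant_group` and the value `group_a` are hypothetical, and the nodes would need a matching custom attribute (for example `node.attr.tenant_group: group_a` in their configuration). The resulting body would be sent with `PUT /<index>/_settings`.

```python
import json


def allocation_filter_settings(node_group, attribute="tenant_group"):
    """Build index settings that restrict an index's shards to nodes whose
    custom attribute matches node_group. The attribute name is hypothetical."""
    return {f"index.routing.allocation.include.{attribute}": node_group}


settings_body = allocation_filter_settings("group_a")
print(json.dumps(settings_body))
```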

Countermeasure answered 25/10, 2017 at 2:13 Comment(1)
HTTP 500 today. - Quartic
2

To summarize all the answers and articles:

  1. Use a shared index with custom routing, accessed through a per-tenant alias.

    1.1) Special case: a big client can have a dedicated index, but only if needed.
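
A minimal sketch of the special case, assuming the tenant's documents have already been copied into the dedicated index by some separate step: the `_aliases` API applies all actions in one request atomically, so the tenant's alias can be repointed from the shared index to the dedicated one without a window where it resolves to nothing. The alias naming scheme and index names below are hypothetical.

```python
import json


def move_tenant_alias(tenant_id, shared_index, dedicated_index):
    """Build an _aliases request body that repoints a tenant's alias from the
    shared index to a dedicated index in a single atomic request."""
    alias = f"tenant_{tenant_id}"  # hypothetical naming convention
    return {
        "actions": [
            {"remove": {"index": shared_index, "alias": alias}},
            # No filter or routing needed: the dedicated index holds one tenant.
            {"add": {"index": dedicated_index, "alias": alias}},
        ]
    }


swap_body = move_tenant_alias("bigcorp", "shared_small_tenants", "bigcorp_dedicated")
print(json.dumps(swap_body, indent=2))
```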

Reference:

Use cases => https://www.elastic.co/blog/found-multi-tenancy

How to do => https://www.elastic.co/guide/en/elasticsearch/guide/current/faking-it.html

Altocumulus answered 25/7, 2022 at 9:42 Comment(0)
