How to implement ACL on an ElasticSearch-based system?

I have a RESTful system built with NodeJS and Elasticsearch that implements an RBAC authorization policy. An authorization server sits in front of the other APIs and tests each request against the routes authorized for the user's roles (a bearer token authenticates the user).

I like this design because the other APIs don't need to know about the authorization/authentication service. It's also very fast, because it uses an in-memory cache of the policies instead of querying Elasticsearch every time a request has to be authorized.
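To make the setup concrete, here is a minimal sketch of the kind of route check described above. The policy shape (`rolePolicies`), the `:id` placeholder syntax, and the function names are assumptions for illustration, not the actual auth server code.

```javascript
// Hypothetical in-memory policy: role -> set of "METHOD:/route/pattern" entries.
const rolePolicies = new Map([
  ["admin", new Set(["GET:/user/:id", "POST:/user/:id"])],
  ["viewer", new Set(["GET:/user/:id"])],
]);

// Turn a route pattern such as "/user/:id" into an anchored regular expression.
function patternToRegExp(pattern) {
  const source = pattern
    .split("/")
    .map((part) =>
      part.startsWith(":") ? "[^/]+" : part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
    )
    .join("/");
  return new RegExp(`^${source}$`);
}

// Return true when any of the user's roles allows the method on the path.
function isAuthorized(roles, method, path) {
  for (const role of roles) {
    for (const entry of rolePolicies.get(role) ?? []) {
      const sep = entry.indexOf(":");
      const allowedMethod = entry.slice(0, sep);
      const pattern = entry.slice(sep + 1);
      if (allowedMethod === method && patternToRegExp(pattern).test(path)) {
        return true;
      }
    }
  }
  return false; // default deny
}
```

Because the policy is keyed by role rather than by user or resource, it stays small enough to keep in process memory, which is what makes this check cheap.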

But now I need to implement ACLs to provide more granular authorization control. From a REST point of view, the policy will be applied at the resource level. Example: "POST:/user/123" is authorized only for user A.

I've surveyed the clients, and 85% will only use allow policies; by default the ACL control will deny everything. So I have all the information I need to develop this control, but I don't see the best way to implement it.

My first thought was:

  1. The most important quality of the system is scalability;

  2. An in-memory cache is not feasible: I ran simulations with 100k users and 1 million resources (which can be a real scenario) and the amount of memory is huge, so this feature would be very costly if cached;

  3. The authentication service can't handle ACLs because it can't filter searches. The auth service doesn't intercept results; it only validates headers and routes against roles;

  4. So, with all these points in mind, what if each document in Elasticsearch had a new field named "acl_allow_method_user", holding an array of method + user ID entries authorized to use the resource? It would end up looking something like this:

"acl_allow_method_user":["POST:123434"]

I'll also have to create a common package, used by all APIs, to validate this policy on each interaction with Elasticsearch, but I don't see any problem with this.
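As a rough idea of what that common package could do, the sketch below wraps any query in a `bool` filter on the proposed field. It only builds the request body; `withAclFilter` is a hypothetical helper name, and it assumes "acl_allow_method_user" is mapped as a `keyword` field so that `term` matches the exact "METHOD:userId" string.

```javascript
// Wrap an Elasticsearch query so that only documents listing
// "<METHOD>:<userId>" in acl_allow_method_user can match.
// Assumption: the field is mapped as `keyword` (exact, non-analyzed values).
function withAclFilter(query, method, userId) {
  return {
    bool: {
      must: [query],
      filter: [
        { term: { acl_allow_method_user: `${method}:${userId}` } },
      ],
    },
  };
}

// Example: a full-text query restricted to what user 123434 may POST to.
const body = {
  query: withAclFilter({ match: { title: "report" } }, "POST", "123434"),
};
```

Since the restriction runs inside Elasticsearch as a filter, the same wrapper would apply to search and count requests alike, without fetching documents first.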

  1. For anyone with experience with ACLs: is this a good design?

  2. Does Elasticsearch have a limit on the size of array fields?

  3. What about performance? Will this approach have an impact?

Montcalm answered 27/5, 2018 at 10:45 Comment(3)
Do you really need per-document access control? - Intercommunion
Unfortunately, yes, it was a request from a relevant customer. - Stableboy
Have you given XACML any thought? - Memphis

I would suggest having a separate Elasticsearch index for the ACLs, which should be much smaller than your main document index. This will allow you to tune the ACL index settings appropriately, e.g. (1) use fewer shards than your main document index, (2) set auto_expand_replicas to 0-all in case you'd like to use a terms query (example: load all documents owned by a user), and (3) enforce different retention/GDPR policies.
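A possible index-creation body for such an ACL index is sketched below. The single shard, the field names, and the typeless mapping layout (Elasticsearch 7 or later) are assumptions for illustration; only `number_of_shards` and `auto_expand_replicas` come from the suggestion above.

```javascript
// Hypothetical request body for creating the ACL index (e.g. PUT /acl).
const aclIndexBody = {
  settings: {
    index: {
      number_of_shards: 1,           // fewer shards than the main document index
      auto_expand_replicas: "0-all", // keep a copy on every data node
    },
  },
  mappings: {
    properties: {
      userId: { type: "keyword" },
      docId: { type: "keyword" },
      opType: { type: "keyword" },
    },
  },
};
```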

The ACL index can then contain a document for each ACL rule, e.g. userId=1,docId=123,opType=POST. Note that this approach will allow you to define ACL rules for other types of principals and resources in the future. Moreover, it can support ACLs that match new documents dynamically, e.g. userId=1,opType=POST,pattern="*" will allow the user with userId=1 to post any document, effectively making them a sysadmin. Decoupling ACLs from the documents/users will allow you to update ACLs without having to update the corresponding documents. That performs better in Elasticsearch, which doesn't update documents in place but deletes and re-creates them. You'd also be able to replace (PUT) an entire document without worrying about preserving the associated ACLs. However, you may want to clean up ACLs when documents or users are deleted, which can be done during the deletion or as a separate scheduled cleanup process.
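The rule shape above could be evaluated roughly as follows. This is a sketch under assumptions: IDs are strings, a rule carries either `docId` or `pattern`, and "*" in a pattern matches any sequence of characters. The function names are hypothetical.

```javascript
// Hypothetical rule shape: { userId, opType, docId } for a single document,
// or { userId, opType, pattern } where "*" matches any sequence of characters.
function ruleMatches(rule, userId, opType, docId) {
  if (rule.userId !== userId || rule.opType !== opType) return false;
  if (rule.docId !== undefined) return rule.docId === docId;
  if (rule.pattern === undefined) return false;
  // Escape regex metacharacters in the literal parts, then join them with ".*".
  const source = rule.pattern
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${source}$`).test(docId);
}

// Default deny: access is granted only when at least one rule matches.
function isAllowed(rules, userId, opType, docId) {
  return rules.some((rule) => ruleMatches(rule, userId, opType, docId));
}

const rules = [
  { userId: "1", docId: "123", opType: "POST" },
  { userId: "2", opType: "POST", pattern: "*" },
];
```

With these rules, user "1" may POST only to document "123", while user "2" may POST to any document, including ones created later.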

Now that the ACLs are separate from the documents themselves, they can be cached in memcached or Redis cluster without requiring too much memory. In a typical OLTP system only a small subset of users is active at any point in time, so you can configure your LRU cache appropriately to increase the hit rate. It's hard to provide further recommendations without knowing what kind of access patterns are characteristic of your system.
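To illustrate the LRU idea, here is a small in-process stand-in for what memcached or Redis would do with an LRU eviction policy. It is not a client for either system; the class name and the per-user key layout are assumptions.

```javascript
// Minimal LRU cache relying on Map's insertion order:
// the first key in the Map is always the least recently used one.
class LruCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = new Map();
  }

  get(key) {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key);
    // Re-insert to mark the key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key, value) {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
  }
}

// Hypothetical usage: cache the ACL rules of active users only.
const aclCache = new LruCache(2);
aclCache.set("user:1", [{ docId: "123", opType: "POST" }]);
aclCache.set("user:2", []);
aclCache.get("user:1");     // user 1 is active again
aclCache.set("user:3", []); // evicts user 2, the least recently used
```

Sizing the cache to the number of concurrently active users, rather than to all users, is what keeps memory bounded.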

One last point to consider is what generates the ACLs. If some ACLs are generated automatically, e.g. based on some pattern, then maybe you could use this pattern in your system to avoid having an ACL rule per user per document. For example, if some ACLs are generated from a directory service, then you might be able to cache (and periodically refresh) LDAP rules in your ACL management system.

Rating answered 27/5, 2018 at 15:31 Comment(6)
Thanks for the answer! This really makes sense, and I can use the auth service to manage this through the cache already implemented for RBAC. We use an in-process cache for the RBAC policies, and it works very well. But there is one more problem, a big one: what about searches? Searches must be filtered using ACLs, and I don't see how that can be done this way. - Stableboy
Then you need to perform your search and then filter the results according to the ACLs in memory. - Intercommunion
@Intercommunion sorry, but you must first understand my question... Filter results after the search? This is completely wrong. Think about a simple "count" operation. You can't retrieve 1 million documents and filter them against ACL policies. - Stableboy
@VictorFrança per-document ACL is a very restrictive requirement; you just don't have much choice there. Also, when you store ACLs in a separate index you will not be able to easily join it inside ES to filter the query. The whole concept of ES is about search; don't push your business logic, like ACLs, into it. - Intercommunion
@Intercommunion I agree with you, but how do we deal with ACLs? We are planning to create the ACL field in each document; it does not seem like a bad idea. In the end, if I create another index just for this, its size will be almost the same as the ACL field in each document. Some documents will be public, that is, they will have no ACL policies; since these would otherwise need a giant array of user permissions, we will simply let the document's owner disable the ACL. It looks like an elegant solution; our only doubt is about the size of the array that Elasticsearch supports. - Stableboy
@VictorFrança, to avoid filtering in memory on the client you can use a terms query, which will allow you to restrict the set of documents you are searching based on the ACLs. However, you would have to structure the index that provides the terms appropriately, e.g. userId:{opType: docIds}. Then, when you search, you could use a terms query with "path":"user1/READ", where user1 is the ID of the user who needs READ permission on the doc. You may have to increase index.max_terms_count though. - Rating
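A sketch of the terms-lookup idea from the last comment follows. The layout is an assumption and differs slightly from the "user1/READ" path mentioned there: one ACL document per user, with the user ID as the document ID and one array field of document IDs per operation, e.g. `{ "READ": ["123", "456"] }`. The helper names and the `docIdField` parameter are hypothetical. The `index`, `id`, and `path` lookup parameters follow the Elasticsearch terms query; older versions may also require a `type`.

```javascript
// Build a terms-lookup filter: Elasticsearch fetches the list of allowed
// document IDs from the ACL document and uses it as the terms to match.
function aclTermsLookupFilter(docIdField, aclIndex, userId, opType) {
  return {
    terms: {
      [docIdField]: { index: aclIndex, id: userId, path: opType },
    },
  };
}

// Restrict an arbitrary query to documents the user may access.
function searchBodyWithAcl(query, docIdField, aclIndex, userId, opType) {
  return {
    query: {
      bool: {
        must: [query],
        filter: [aclTermsLookupFilter(docIdField, aclIndex, userId, opType)],
      },
    },
  };
}

// Example: count or search only documents that user1 may READ.
const aclSearchBody = searchBodyWithAcl(
  { match: { title: "report" } }, "docId", "acl", "user1", "READ"
);
```

The number of IDs a single lookup can return is capped by `index.max_terms_count`, which is why the comment suggests raising it for users with many grants.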

For anyone going through the same problem, here is the conclusion we drew: ACLs in REST microservices that are granular down to the resource level present challenges similar to those of a multi-tenant system.

They are business logic, and each service knows "how" someone owns a resource (and what the possible privileges are). Standardizing how the data for these rules is stored goes against exactly that service-specific knowledge.

What we can standardize is the ACL endpoints of each microservice (routes that share the same contract and signature). And if you really want to keep ACLs isolated in the private environment of the APIs (services), since we have a microservice responsible for user control and privileges, the entire architecture can be moved to event sourcing.

Example without ACL's private API isolation:

  1. We have 3 services: "S (A)" which is responsible for the control of users and privileges, "S (B)" and "S (C)" that do any ordinary task.

  2. The frontend application will have to understand the endpoint of S (A), S (B) and S (C) and make individual requests to control ACL policies of each service.

Example with private API isolation and event-sourcing:

  1. The same microservices are present.

  2. The frontend application makes a request to S (A) applying some ACL policy to S (B) and S (C).

  3. S (A) records the policy change request and triggers an event in a broker notifying the policy change.

  4. S (B) and S (C) capture the event and apply the policies in their logic.

  5. S (B) and S (C) publish results of policy implementation (grant or revocation).

  6. S (A) captures the result event of applying policies and records this result.
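The six steps above can be sketched with Node's built-in `EventEmitter` standing in for the broker. The event names, payload fields, and in-memory stores are assumptions for illustration; a real system would use an actual message broker and each service's own storage and logic.

```javascript
const { EventEmitter } = require("events");

// In-process stand-in for the message broker.
const broker = new EventEmitter();

// S(A): records each policy change request and the results reported back.
const policyLog = [];
broker.on("policy.applied", ({ changeId, service, action }) => {
  const entry = policyLog.find((item) => item.change.changeId === changeId);
  if (entry) entry.results[service] = action; // step 6
});

function requestPolicyChange(change) {
  policyLog.push({ change, results: {} });    // step 3: record the request
  broker.emit("policy.changed", change);      // step 3: notify via the broker
}

// S(B), S(C): each service applies the policy in its own store (step 4)
// and publishes the outcome (step 5).
function registerService(name) {
  const grants = new Set();
  broker.on("policy.changed", (change) => {
    if (!change.targets.includes(name)) return;
    const key = `${change.userId}|${change.resource}`;
    if (change.action === "grant") grants.add(key);
    else grants.delete(key);
    broker.emit("policy.applied", {
      changeId: change.changeId,
      service: name,
      action: change.action,
    });
  });
  return grants;
}

const serviceB = registerService("B");
const serviceC = registerService("C");

// Step 2: the frontend asks S(A) to grant user 123 access in S(B) only.
requestPolicyChange({
  changeId: "c1",
  targets: ["B"],
  action: "grant",
  userId: "123",
  resource: "POST:/user/123",
});
```

`EventEmitter` delivers events synchronously, so the log is updated before `requestPolicyChange` returns; with a real broker the result would arrive later, and S(A) would have to treat the change as pending until then.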

I'll choose the answer from @alecswan as the correct one as it was a "starting point" to come to that conclusion.

Thanks also to @xeye, who alerted us to the business logic part.

Montcalm answered 1/6, 2018 at 11:23 Comment(0)