How to optimize Row Level Security in PostgreSQL
I have a Postgres (13.2)-based API with RLS enabled (I use PostGraphile) and it's extremely slow. The user sends a JWT from Google OAuth. Access to tables is based on roles (there are two: person and admin) plus RLS. I have two tables for user auth, person and person_groups:

CREATE TABLE IF NOT EXISTS myschema.person_groups (
    id serial PRIMARY KEY,
    person_id citext NOT NULL REFERENCES myschema.person (id),
    google_id text NOT NULL REFERENCES myschema_private.person_account (google_id),
    group_id serial NOT NULL REFERENCES myschema.groups (id),
    updated_at timestamp DEFAULT now(),
    CONSTRAINT unq_person_id_group_id UNIQUE (person_id, group_id)
);

CREATE INDEX persongroups_google_group_idx ON myschema.person_groups (google_id, group_id);

For the RLS check I have a function defined as:

CREATE OR REPLACE FUNCTION myschema.is_in_group (group_id int[])
  RETURNS boolean
  AS $$
  SELECT
    CASE WHEN current_setting('role', FALSE) = 'admin' THEN
      TRUE
    WHEN EXISTS (
      SELECT
        1
      FROM
        myschema.person_groups
      WHERE
        person_groups.group_id = ANY ($1) AND person_groups.google_id = current_setting('user.sub', TRUE)) THEN
      TRUE
    ELSE
      FALSE
    END
$$
LANGUAGE SQL
STABLE
STRICT
SECURITY DEFINER;

I have a table, gate_enterlog, which the user wants to access. The RLS policy for this table is:

CREATE POLICY select_gate_enterlog ON myschema.gate_enterlog
  FOR SELECT TO person
    USING (myschema.is_in_group (ARRAY[6, 1]));

If I run the following:

BEGIN;
SET local ROLE person;
SET local "user.sub" TO 'yyy';
EXPLAIN ANALYZE VERBOSE
SELECT COUNT(id) FROM myschema.gate_enterlog;
COMMIT;

I end up with:

Aggregate  (cost=23369.00..23369.01 rows=1 width=8) (actual time=2897.487..2897.487 rows=1 loops=1)
  Output: count(id)
  ->  Seq Scan on myschema.gate_enterlog  (cost=0.00..23297.08 rows=28769 width=4) (actual time=2897.484..2897.484 rows=0 loops=1)
        Output: id, person_id, checkpoint_time, direction, place
        Filter: is_in_group('{6,1}'::integer[])
        Rows Removed by Filter: 86308
Planning Time: 0.626 ms
Execution Time: 2897.567 ms

If I replace the RLS policy with a trivial USING (TRUE):

CREATE POLICY select_gate_enterlog ON myschema.gate_enterlog FOR SELECT TO person USING (TRUE);

Aggregate  (cost=1935.85..1935.86 rows=1 width=8) (actual time=17.671..17.672 rows=1 loops=1)
  Output: count(id)
  ->  Seq Scan on myschema.gate_enterlog  (cost=0.00..1720.08 rows=86308 width=4) (actual time=0.008..7.364 rows=86308 loops=1)
        Output: id, person_id, checkpoint_time, direction, place
Planning Time: 0.594 ms
Execution Time: 17.737 ms

Do you have any thoughts on how I can optimize RLS so that Postgres would "remember" that the user has privileges to access the table? My only idea is to fall back to USING (TRUE) for SELECT and grant access once before calling the query, but before going that way I hope somebody can give me a hint about what I did wrong.

Juni answered 25/7, 2021 at 12:2 Comment(5)
Hm, it's certainly weird that postgres doesn't realise is_in_group('{6,1}'::integer[]) is a constant expression that evaluates to FALSE, then skips scanning the table…Telluric
I don't know what kind of authentication mechanism you use, but for whole-table permissions (where the policy does not depend on values from the individual roles) normal user/group permissions seem to be more suited than RLS. Can you switch the database role depending on whether the user is part of groups 1 or 6? Then give only that role access (GRANT SELECT …) to the table, and deny it for normal users altogether.Telluric
I'm using this approach for security: graphile.org/postgraphile/security . Basically the user fetches a Google JWT token and sends it to a NodeJS app; the app checks the JWT's validity and, if valid, sends the user_id to Postgres, which is supposed to check is_in_group. With this approach an admin user can add groups and add users to groups via the API. With the role approach, as far as I understand, an admin can't create 'roles' via the API (since roles aren't a normal table). Anyway, I'm very curious what I'm doing wrong that after the first run of is_in_group('{6,1}'::integer[]) the value isn't cached for the other rows.Juni
I mean PostGraphile doesn't only send the user_id to the database; it also gets the role from the JWT and uses that for the session. I don't know how you configure this with Google, but adding a user to a group would then need to be done via that (Google) API that issues the tokens. I don't see how adding new groups is relevant here, since the membership check is hardcoded anyway in your case. But yeah, this approach gets a bit more complicated if you also need dynamic groups.Telluric
I had a more dynamic approach that checked whether user.sub is in a group the admin assigned, but while trying to narrow down the performance issue I created hard-coded group permissions, and I'm out of ideas.Juni
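The whole-table-permissions alternative suggested in the comments can be sketched roughly as follows. This is only an illustration: the role name gate_reader is hypothetical, and in practice the role would come from the JWT that PostGraphile sets for the session.

```sql
-- Sketch of the role-based alternative (hypothetical role name).
-- Instead of an RLS policy that checks group membership per row,
-- create a dedicated role and grant it plain table-level SELECT:
CREATE ROLE gate_reader;
GRANT USAGE ON SCHEMA myschema TO gate_reader;
GRANT SELECT ON myschema.gate_enterlog TO gate_reader;

-- The application then switches to that role for users in groups 1 or 6,
-- and the per-row RLS check disappears entirely:
BEGIN;
SET LOCAL ROLE gate_reader;
SELECT count(id) FROM myschema.gate_enterlog;
COMMIT;
```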
I figured it out, somehow. It seems that for some reason boolean functions aren't optimized. I changed my auth function to:

CREATE OR REPLACE FUNCTION myschema.auth_group (group_id int[])
  RETURNS SETOF int
  AS $$
BEGIN
  IF current_setting('role', FALSE)  = 'admin' THEN
    RETURN QUERY SELECT 1;
  ELSIF EXISTS (SELECT 1 FROM myschema.person_groups
      WHERE person_groups.google_id = current_setting('user.sub', TRUE) AND person_groups.group_id = ANY ($1)) THEN
    RETURN QUERY SELECT 1;
  END IF;
END;
$$
LANGUAGE plpgsql
STABLE STRICT
SECURITY DEFINER;
CREATE POLICY select_gate_enterlog ON myschema.gate_enterlog
  FOR SELECT TO person USING (EXISTS (SELECT myschema.auth_group (ARRAY[6, 1])));

With this function the planner is efficient:

Aggregate  (cost=1827.97..1827.98 rows=1 width=8) (actual time=6.005..6.006 rows=1 loops=1)
  Output: count(gate_enterlog.id)
  InitPlan 1 (returns $0)
    ->  ProjectSet  (cost=0.00..5.27 rows=1000 width=4) (actual time=0.158..0.159 rows=0 loops=1)
          Output: auth_group(current_setting('role'::text, false), current_setting('user.sub'::text, true), '{6,1}'::integer[])
          ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1)
  ->  Seq Scan on myschema.gate_enterlog  (cost=0.00..1720.08 rows=43154 width=4) (actual time=6.002..6.002 rows=0 loops=1)
        Output: gate_enterlog.id, gate_enterlog.person_id, gate_enterlog.checkpoint_time, gate_enterlog.direction, gate_enterlog.place
        Filter: $0
        Rows Removed by Filter: 86308
Planning Time: 0.500 ms
Execution Time: 6.100 ms

The cost is pretty much the same as with USING (TRUE) in the RLS policy.
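Note that the decisive part appears to be the sub-select in the USING clause rather than the SETOF return type: a scalar sub-select is planned once as an InitPlan, so the function runs once per query instead of once per row. A minimal, untested sketch of the same pattern applied to the original boolean function from the question:

```sql
-- Hypothetical: keep the original boolean is_in_group() function, but wrap
-- the call in a scalar sub-select so the planner evaluates it once as an
-- InitPlan instead of invoking it for every row:
CREATE POLICY select_gate_enterlog ON myschema.gate_enterlog
  FOR SELECT TO person
    USING ((SELECT myschema.is_in_group(ARRAY[6, 1])));
```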

Juni answered 3/8, 2021 at 9:33 Comment(0)
The accepted answer technically works, but the SETOF return type is not the actual fix here. Unless the contents of the function need optimization, you can keep the return type as is. The part that actually boosts performance is wrapping the function call in a SELECT, which makes the subquery get evaluated and cached once instead of being invoked for every row. To give an example, take the following functions used with RLS:

create or replace function utils.my_rls_check() returns boolean
  LANGUAGE plpgsql
  as $$
BEGIN
   return true;
END;
$$;

CREATE OR REPLACE FUNCTION utils.my_other_rls_check() RETURNS UUID AS $$
  SELECT '00000000-0000-0000-0000-000000000000'::UUID;
$$ LANGUAGE sql STABLE;

This would be slow:

select count(*) from app_public.entities 
WHERE utils.my_rls_check();

select count(*) from app_public.entities 
WHERE check_id = utils.my_other_rls_check();

And this is the fix that boosts performance:

select count(*) from app_public.entities 
WHERE (SELECT utils.my_rls_check());

select count(*) from app_public.entities 
WHERE check_id = (SELECT utils.my_other_rls_check());

The example functions are simplistic, but they are enough to test the difference in performance, as long as the table being tested has a reasonable amount of data, e.g. one million rows or more.
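One way to see the difference is to compare the plans (assuming a populated app_public.entities table): the unwrapped call shows up directly in the plan's Filter line and is executed per row, while the wrapped call becomes an InitPlan whose cached result ($0) is reused for every row.

```sql
-- Per-row evaluation: the function appears in the Filter line of the plan.
EXPLAIN (ANALYZE, VERBOSE)
SELECT count(*) FROM app_public.entities
WHERE utils.my_rls_check();

-- Once-per-query evaluation: the sub-select becomes an InitPlan and the
-- Filter line reduces to "$0".
EXPLAIN (ANALYZE, VERBOSE)
SELECT count(*) FROM app_public.entities
WHERE (SELECT utils.my_rls_check());
```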

Mahratta answered 13/1, 2023 at 5:56 Comment(1)
Interesting. Do you happen to know if this optimization is documented somewhere? And is this the recommended way to improve performance of RLS rules? (as opposed to, say, using a view that implements equivalents of the RLS checks in a more batchable way)Halitosis