Why is Row Level Security (RLS) not using indexes?
Asked Answered
A

6

10

I've got an application with Patients and Therapists. They're all in the same users table. Patients should be able to see their Therapist and Therapists should be able to see their Patients.

I've set up a materialized view (user_access_pairs) with pairs of users IDs, if two users have a row in the view then it means that hey should have access to each other.

database> \d user_access_pairs
+----------+---------+-------------+
| Column   | Type    | Modifiers   |
|----------+---------+-------------|
| id1      | integer |             |
| id2      | integer |             |
+----------+---------+-------------+
Indexes:
    "index_user_access_pairs" UNIQUE, btree (id1, id2)

Here's the definition of the users table, it has a bunch of more columns which shouldn't be relevant for this question.

database> \d users
+-----------------------------+-----------------------------+-----------------------------------------------------+
| Column                      | Type                        | Modifiers                                           |
|-----------------------------+-----------------------------+-----------------------------------------------------|
| id                          | integer                     |  not null default nextval('users_id_seq'::regclass) |
| first_name                  | character varying(255)      |                                                     |
| last_name                   | character varying(255)      |                                                     |
+-----------------------------+-----------------------------+-----------------------------------------------------+
Indexes:
    "users_pkey" PRIMARY KEY, btree (id)

I've created an RLS policy that limits which users can be read by whom using a jwt token.

create policy select_users_policy
  on public.users
  for select using (
    (current_setting('jwt.claims.user_id'::text, true)::integer, id) in (
      select id1, id2 from user_access_pairs
    )
  );

This seems to work logically but I'm getting awful performance. The query planner does a sequential scan on user_access_pairs despite there being an index there.

database> set jwt.claims.user_id to '2222';
database> explain analyze verbose
    select first_name, last_name
    from users
+------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                         |
|------------------------------------------------------------------------------------------------------------------------------------|
| Seq Scan on public.users  (cost=231.84..547.19 rows=2386 width=14) (actual time=5.481..6.418 rows=2 loops=1)                       |
|   Output: users.first_name, users.last_name                                                                                        |
|   Filter: (hashed SubPlan 1)                                                                                                       |
|   Rows Removed by Filter: 4769                                                                                                     |
|   SubPlan 1                                                                                                                        |
|     ->  Seq Scan on public.user_access_pairs  (cost=0.00..197.67 rows=13667 width=8) (actual time=0.005..1.107 rows=13667 loops=1) |
|           Output: user_access_pairs.id1, user_access_pairs.id2                                                                     |
| Planning Time: 0.072 ms                                                                                                            |
| Execution Time: 6.521 ms                                                                                                           |
+------------------------------------------------------------------------------------------------------------------------------------+

However, if I switch to a super user role that bypasses RLS and apply the same filter manually I get much better performance. Shouldn't it be the same thing?

database> set jwt.claims.user_id to '2222';
database> explain analyze verbose
   select first_name, last_name
   from users
   where (current_setting('jwt.claims.user_id'::text, true)::integer, id) in (
     select id1, id2 from user_access_pairs
   )
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| QUERY PLAN
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Nested Loop  (cost=4.59..27.86 rows=2 width=14) (actual time=0.041..0.057 rows=2 loops=1)
|   Output: users.first_name, users.last_name
|   Inner Unique: true
|   ->  Bitmap Heap Scan on public.user_access_pairs  (cost=4.31..11.26 rows=2 width=4) (actual time=0.029..0.036 rows=2 loops=1)
|         Output: user_access_pairs.id1, user_access_pairs.id2
|         Filter: ((current_setting('jwt.claims.user_id'::text, true))::integer = user_access_pairs.id1)
|         Heap Blocks: exact=2
|         ->  Bitmap Index Scan on index_user_access_pairs  (cost=0.00..4.31 rows=2 width=0) (actual time=0.018..0.018 rows=2 loops=1)
|               Index Cond: (user_access_pairs.id1 = (current_setting('jwt.claims.user_id'::text, true))::integer)
|   ->  Index Scan using users_pkey on public.users  (cost=0.28..8.30 rows=1 width=18) (actual time=0.008..0.008 rows=1 loops=2)
|         Output: users.id, users.email, users.encrypted_password, users.first_name, users.last_name, users.roles_mask, users.reset_password_token, users.reset_password_sent_at, users.remember_created_at, users.sign_in_count, users.current_sign_in_at, users.last_sign_in_at,
|         Index Cond: (users.id = user_access_pairs.id2)
| Planning Time: 0.526 ms
| Execution Time: 0.116 ms
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Why isn't RLS using the index when making the query?

PS I'm using PostgreSQL version 12.4

database> select version()
+-------------------------------------------------------------------------------------------------------------------------------+
| version                                                                                                                       |
|-------------------------------------------------------------------------------------------------------------------------------|
| PostgreSQL 12.4 (Ubuntu 12.4-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0, 64-bit |
+-------------------------------------------------------------------------------------------------------------------------------+

EDIT

Thanks for the response Laurenz. It improved performance a lot. But I'm still getting some seq scans.

Here's the updated policy as Laurenz suggested.

create policy select_users_policy
  on public.users
  for select using (
    exists (
      select 1
      from user_access_pairs
      where
        id1 = current_setting('jwt.claims.user_id'::text, true)::integer
        and id2 = users.id
    )
  );

Querying this table with RLS still gives me a seq scan on the users table even though the exists query in the policy is using indexes.

database> set jwt.claims.user_id to '2222';
database> explain analyze verbose
  select first_name, last_name
  from users
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                                            |
|-------------------------------------------------------------------------------------------------------------------------------------------------------|
| Seq Scan on public.users  (cost=0.00..40048.81 rows=2394 width=14) (actual time=0.637..1.216 rows=2 loops=1)                                          |
|   Output: users.first_name, users.last_name                                                                                                           |
|   Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)                                                                                               |
|   Rows Removed by Filter: 4785                                                                                                                        |
|   SubPlan 1                                                                                                                                           |
|     ->  Index Only Scan using index_user_access_pairs on public.user_access_pairs  (cost=0.29..8.31 rows=1 width=0) (never executed)                  |
|           Index Cond: ((user_access_pairs.id1 = (current_setting('jwt.claims.user_id'::text, true))::integer) AND (user_access_pairs.id2 = users.id)) |
|           Heap Fetches: 0                                                                                                                             |
|   SubPlan 2                                                                                                                                           |
|     ->  Bitmap Heap Scan on public.user_access_pairs user_access_pairs_1  (cost=4.31..11.26 rows=2 width=4) (actual time=0.075..0.083 rows=2 loops=1) |
|           Output: user_access_pairs_1.id2                                                                                                             |
|           Recheck Cond: (user_access_pairs_1.id1 = (current_setting('jwt.claims.user_id'::text, true))::integer)                                      |
|           Heap Blocks: exact=2                                                                                                                        |
|           ->  Bitmap Index Scan on index_user_access_pairs_on_id1  (cost=0.00..4.31 rows=2 width=0) (actual time=0.064..0.064 rows=2 loops=1)         |
|                 Index Cond: (user_access_pairs_1.id1 = (current_setting('jwt.claims.user_id'::text, true))::integer)                                  |
| Planning Time: 0.572 ms                                                                                                                               |
| Execution Time: 1.295 ms                                                                                                                              |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+

Here's the same query done "manually" without RLS for comparison. This time there's no seq scan and performance is significantly better (especially when run on bigger datasets)

database> set jwt.claims.user_id to '2222';
database> explain analyze verbose
    select first_name, last_name
    from users
    where exists (
       select 1
       from user_access_pairs
       where
         id1 = current_setting('jwt.claims.user_id'::text, true)::integer
         and id2 = users.id
     )

+---------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN                                                                                                                                  |
|---------------------------------------------------------------------------------------------------------------------------------------------|
| Nested Loop  (cost=4.59..27.86 rows=2 width=14) (actual time=0.020..0.033 rows=2 loops=1)                                                   |
|   Output: users.first_name, users.last_name                                                                                                 |
|   Inner Unique: true                                                                                                                        |
|   ->  Bitmap Heap Scan on public.user_access_pairs  (cost=4.31..11.26 rows=2 width=4) (actual time=0.013..0.016 rows=2 loops=1)             |
|         Output: user_access_pairs.id1, user_access_pairs.id2                                                                                |
|         Recheck Cond: (user_access_pairs.id1 = (current_setting('jwt.claims.user_id'::text, true))::integer)                                |
|         Heap Blocks: exact=2                                                                                                                |
|         ->  Bitmap Index Scan on index_user_access_pairs_on_id1  (cost=0.00..4.31 rows=2 width=0) (actual time=0.010..0.010 rows=2 loops=1) |
|               Index Cond: (user_access_pairs.id1 = (current_setting('jwt.claims.user_id'::text, true))::integer)                            |
|   ->  Index Scan using users_pkey on public.users  (cost=0.28..8.30 rows=1 width=18) (actual time=0.006..0.006 rows=1 loops=2)              |
|         Output: users.id, users.email, users.encrypted_password, users.first_name, users.last_name, users.roles_mask                        |
|         Index Cond: (users.id = user_access_pairs.id2)                                                                                      |
| Planning Time: 0.464 ms                                                                                                                     |
| Execution Time: 0.075 ms                                                                                                                    |
+---------------------------------------------------------------------------------------------------------------------------------------------+

I would have guessed that the query planner would treat these two queries the same. Why do they differ and what can be done to avoid the seq scan?

Atomizer answered 27/8, 2020 at 14:7 Comment(3)
You're getting a full scan on USERS because that's what you've told it to do. There's nothing in your policy which limits the rows it needs to look at in USERS - you've said "Take current_setting('jwt.claims.user_id'...), pair it with every ID in the USERS table, and now see which of those pairs exist in USER_ACCESS_PAIRS". The policy uses every row in USERS so of course it's a full scan. I expect that at that point the database says, "Screw it - I'm reading the whole USERS table so I might just as well do a full scan on USER_ACCESS_PAIRS".Amorphous
The query planner is clearly able to do it without a seq scan in the last example in the question. Here it's 17 times faster. It was thousands of times quicker in a bigger data set. I'm curious why the planner doesn't think that the two queries are equal when they're constructed from the same parts.Atomizer
Did you ever manage to find a solution or workaround for this issue?Chair
H
6

The reason why you are not seeing the same plan as the seemingly equivalent query without the RLS policy is that subquery pullup is happening before RLS policies are taken into account. This is a planner quirk.

To summarize, RLS policies in combination with subqueries are unfortunately not each other friends performance-wise.

For your information, a similar manifestation can be seen when comparing the following two queries:

SELECT ... FROM my_table WHERE                     EXISTS(SELECT ...);
SELECT ... FROM my_table WHERE CASE WHEN true THEN EXISTS(SELECT ...) END;

Here, while both queries are equivalent, the second query results in a (hashed) subplan for the subquery, because the folding of the unnecessary CASE WHEN true is done after subquery pullup.

Disclaimer: I got this information from RhodiumToad on IRC #postgresql, but explained/simplified it in my own words.

Hasseman answered 8/9, 2020 at 17:45 Comment(0)
M
3

I cannot put my finger on the difference, but I think you should get a better plan with a smarter policy:

CREATE POLICY select_users_policy ON public.users
  FOR SELECT
  USING (
     EXISTS (SELECT 1 FROM user_access_pairs
             WHERE id1 = current_setting('jwt.claims.user_id'::text, true)
               AND id2 = users.id)
  );

I'd like to mention that basing row level security on a placeholder variable that the user can change any time is questionable security.

Maw answered 27/8, 2020 at 14:25 Comment(2)
Thanks a lot for the response! it did improve performance a lot but it's still not as good as I would have expected. I've updated the question to include your suggestion.Atomizer
@Laurenz Albe in this case the "user" does not have access to the database (thus can not change the variable), this policy is used in the context of PostgRESTPedicel
O
2

The author of this comment came up (by trial and error) with the solution of casting a subquery to ARRAY. Not at all sure it's applicable in your case but just goes to show that pretty unexpected tricks can apparently scare the optimizer into doing its job.

So you could try:

create policy select_users_policy
on public.users
for select using (
  users.id = any (
    array(
        select id1
        from user_access_pairs
        where 
            id1 = current_setting('jwt.claims.user_id'::text, true)::integer
            and id2 = users.id
        )
    )
);

Pretty awkward, but who knows...

Onionskin answered 28/9, 2020 at 19:58 Comment(0)
O
1

It's not stated in the question, but I am assuming that the read from public.users is triggered from another, API-facing schema (let's call it api).

One person over on the subZero Slack shared:

I ran into the same problem and defined the RLS based on my api views which solved the seq scan problem. But it's a bit of a pain to maintain when making changes to these views, because for the migration I have to first drop the RLS policies, change the view and then re-create the policies. ... When there's subqueries involved in the RLS I use the api views.

So, they are using the exact same rule but referencing api.foo and api.bar views instead of public.foo and public.bar tables.

In your case, you could try:

create policy select_users_policy
  on public.users
  for select using (
    exists (
      select 1
      from api.user_access_pairs
      where
        id1 = current_setting('jwt.claims.user_id'::text, true)::integer
        and id2 = api.users.id
    )
  );

So this is assuming you had a users view in the api schema mirroring public.users, and moving user_access_pairs to api as well (or create a view referencing it).

It's not clear to me whether this works because the query is triggered from a view/function in the api schema in the first place, and so referencing views in that schema is somehow less confusing for the query optimizer, or if this is just a trick to make the optimizer kick in, regardless of how the query originated. (The latter seems slightly more likely in my view, but who knows.)

Onionskin answered 6/10, 2020 at 13:18 Comment(0)
O
1

Another user on the subZero Slack shared a solution based on wrapping the look-up of the current users permissions in a function. In your case, something like:

create policy select_users_policy
  on public.users
  for select using (
    id IN (
      select * from current_user_read_users()
   )
  );

And you would create a current_user_read_users() function which looks up the user_id from the jwt and returns the set of users the current user may read, based on user_access_pairs.

It may or may not be important that this function has the same owner as the user_access_pairs view, or that the function is declared with SECURITY DEFINER (so that it bypasses RLS). It could be that the important part is merely to pull the subquery out into a function (somehow helping the optimizer), but the other things are reported to help with other performance problems.

And lastly, you might want to experiment with putting this in the api view, as in the other solution I reported.

One caveat:

there is a circular dependency issue on the permission table itself, so there I had to do one special case policy. That one didn’t have any performance issues, though, so it was fine.

(Note that in their case, the permissions were kept in a table, editable by admin users, not generated as in your case.)

Onionskin answered 6/10, 2020 at 13:30 Comment(0)
O
1

One solution (based on this post, which has several other good suggestions and benchmarks) is to not use RLS at all, but build the filtering into a view:

create view api.allowed_users
with (security_barrier)
as
  select id, first_name, last_name, favorite_color
  from public.users
  join user_access_pairs uap
    on uap.id1 = current_setting('jwt.claims.user_id'::text, true)::integer

You have already expressed your access policy in the user_access_pairs view, so arguably the RLS rule doesn't really add anything.

(security_barrier is to prevent potential information leaking, but comes with a performance cost, so look into whether it's necessary in your case.)

Onionskin answered 6/10, 2020 at 13:41 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.