PostgreSQL ignores pg_trgm GIN indexes when Row Level Security is enabled and NOT LEAKPROOF operator is used
Asked Answered
D

1

8

First of all, this SO question describes a similar problem: PostgreSQL query not using INDEX when RLS (Row Level Security) is enabled , but I was not able to successfully utilize it's suggestions and also would like to see if there is a way to improve things in context of Postgraphile.

Reproduction steps:

As a superuser, create a simple table and populate it with some random data:

CREATE TABLE public.videos AS SELECT id, md5(random()::text) AS title from generate_Series(1,1000000) id;

Perform an ILIKE query (further in this post it is referred to as "ILIKE query" and is used to test performance multiple times):

EXPLAIN ANALYSE SELECT COUNT(*) FROM public.videos WHERE title ILIKE '%test%';

As expected, it performs a Seq Scan, with execution time around 194.823 ms

Install gp_trgm extension and add a gin index:

CREATE EXTENSION IF NOT EXISTS pg_trgm WITH SCHEMA public;
CREATE INDEX trgm_idx_videos_title ON public.videos USING gin (title gin_trgm_ops);

Since table is already populated with data, it will take some time (~10 seconds) to create an index. Now, running the same ILIKE query will use a "Bitmap Index Scan on trgm_idx_videos_title" with execution time 0.036 ms.

All seems fine so far, but only if you are fine with always using superuser to get your data and not implementing additional security for said table.

Let's set up an additional user and grant it access to connect to our database (named gin_rls_test)

CREATE ROLE db_login WITH LOGIN PASSWORD 'db_login_pwd' NOINHERIT;
GRANT CONNECT ON DATABASE gin_rls_test TO db_login;

We also need to grant SELECT rights for previously created table for said user.

GRANT SELECT ON public.videos TO db_login;

To make sure that our user can query data in the same way we have done so far, connect to database server with db_login. (In pgAdmin, you just create a new server, specify some different name, same host name, but use db_login/db_login_pwd as username and password)

If you navigate to our table using newly added db_login connection, open a query tool and execute the same ILIKE query - results should be the same, index will be applied.

What breaks this is RLS (Row Level Security). Let's we switch back to Query editor for superuser and set it up for our table:

ALTER TABLE public.videos ENABLE ROW LEVEL SECURITY;

CREATE OR REPLACE FUNCTION public.user_has_permission() returns boolean LANGUAGE plpgsql as $$
BEGIN
   return true;
END;
$$;

CREATE POLICY videos_authorization ON public.videos FOR SELECT USING (public.user_has_permission());

For simplicity, public.user_has_permission() function just returns true. (in my case it's a plpgsql function that checks permissions against settings stored in pg_catalog.current_setting and does not perform any additional explicit requests.)

Now, if you run ILIKE query from superuser query editor, it will still be super fast as before, because superuser avoids rls. If you run it from db_login query editor, index will no longer be hit, Seq Scan will be used and Execution time will be around 1013.485 ms.

After reading through this thread (https://www.postgresql.org/message-id/CAGrP7a3PwDYJhPe53yE6pBPPNxk2Ve4n%2BdPQMS1HcBU6swXYfA%40mail.gmail.com), it seems like the problem happens because underlying function for ILIKE is not leakproof. EXPLAIN ANALYSE tells us that we use operator ~~* for text comparison, running this query will let you know names of underlying functions for this operator:

SELECT * FROM pg_operator WHERE oprname = '~~*';

There are actually 3 results, in our case underlying function name is "texticlike". What you can do is switch to superuser query editor and make that function leakproof:

ALTER FUNCTION texticlike LEAKPROOF;

Now if you run ILIKE query from db_login query editor again, index will be hit and execution time will be back to 0.040 ms.

Problem:

Perhaps making certain operators explicitly leakproof is acceptable, but the real problem (apart from accidentally leaking something, e.g. if exception is thrown) is that only superuser can make a function leakproof. If you have an Azure or AWE hosted database, you will not have superuser access and will get following error trying to make a function leakproof:

ERROR:  only superuser can define a leakproof function
SQL state: 42501

So for me, it's still an unanswered question on how to make GIN index work for ILIKE queries on tables that have RLS enabled. Or what would be an alternative to achieve same performance results while still having that title property as a text type?

I'm working with Postgraphile and I'm interested in improving performance of "includesInsensitive" and "startsWithInsensitive" filters, and ILIKE operator leakproof affects those filters as well.

Additional information:

If you want to switch function back no not being LEAKPROOF:

ALTER FUNCTION texticlike NOT LEAKPROOF;

I tried to use this query to find all available leakproof operators, but have not found any viable alternatives among them (closest thing is "starts_with" function for operator ^@, which is case sensitive):

select pg_proc.proname, pg_operator.oprname, pg_operator.oprcode, pg_proc.proleakproof from pg_proc
join pg_operator ON pg_proc.proname::text = pg_operator.oprcode::text
where pg_proc.proleakproof;

Original idea of using gp_trgm gin index came from this article: https://niallburkley.com/blog/index-columns-for-like-in-postgres/

PostgreSQL version (using SELECT version();) - "PostgreSQL 12.2 on x86_64-pc-linux-musl, compiled by gcc (Alpine 9.2.0) 9.2.0, 64-bit"

Demiurge answered 21/7, 2020 at 6:44 Comment(1)
Nice question, but I don't see any solutions. Running in a hosted environment where you don't have a real superuser can be a very frustrating experience.Wiltonwiltsey
G
3

You can try to use views without a security barrier and replicate a RLS predicate directly into a view:

create view view_video as
select * from videos
 where user_has_permission(); -- a predicate from RLS

And query data against the view without compromising on speed. Since the view query is transformed into a query on a regular table, all indexes will be used.

Gaylor answered 26/7, 2020 at 8:46 Comment(3)
or if you're crazy like me (or know your threat model), you can update pg_proc set proleakproof = true the appropriate functions :)Pickax
@FlorianKlein there is no permission to change pg_proc in a hosted database such as Azure or AWS, that is the problemGaylor
indeed, on managed stuff with limited permissions you could wrap all the non-leakproof functions in leakproof ones, but it adds to the call stack for nothingPickax

© 2022 - 2024 — McMap. All rights reserved.