PL/pgSQL for all-in-one dynamic query
Asked Answered
S

2

0

I am using PostigreSQL10. I have an all-in-one query where a user can pick a non-standard combination of category style, event, area to search for a constructions. Keep in mind that category style, event, area are in different tables.

I want to avoid the multiple IFs and JOINs. I also want to avoid the query planner caching the wrong plan for the wrong parameter combination and the sequel recompilation of the query every single time. So I have to use dynamic SQL.

To get dynamic SQL in PostgreSQL, I have to use PL/pgSQL. But, according to its documentation

to generate dynamic commands inside your PL/pgSQL functions, that is, commands that will involve different tables or different data types each time they are executed. PL/pgSQL's normal attempts to cache plans for commands will not work in such scenarios. To handle this sort of problem, the EXECUTE statement is provided. Also, there is no plan caching for commands executed via EXECUTE. Instead, the command is always planned each time the statement is run. If multiple rows are returned, only the first will be assigned to the INTO variable - here

and

SQL commands that appear directly in a PL/pgSQL function must refer to the same tables and columns on every execution; that is, you cannot use a parameter as the name of a table or column in an SQL command. To get around this restriction, you can construct dynamic commands using the PL/pgSQL EXECUTE statement — at the price of performing new parse analysis and constructing a new execution plan on every execution - here

So, I guess that PL/pgSQL is not the right tool for my case, since I have multiple tables.

My question is: is PL/pgSQL actually not suitable for my case, or am I missing something ? Sub-question: if it not suitable, how can I syntax dynamic sql for postgreSQL, I cannot find any tutorial.

Thanks

Sprayberry answered 11/1, 2018 at 22:38 Comment(9)
I have done exactly what you are asking about for a gigantic (1500 line) plpgsql function a few years ago, which we used to create custom reports. We called it the "god function" because it has about 30 table it could join to, and scores of columns (all of which were dynamically selected by the user. In addition, it had different checks for tables that may or may not be accessible by certain classes of users. It was a bear to write, and a pain to add new reports to, but overall it worked remarkably well. I estimated it saved me 2-3 hours a day by not needing to run custom reports all the time.Gallaway
The queries that huge function generated were entirely dynamic, and there was some debugging in the query to emit the generated query to a table with user, timestamp, uuid, etc associated with it.Gallaway
One more thing. Because it used EXECUTE and FORMAT commands when building the dynamica SQL, the chances of SQL injection vulnerabilities was reduced. Note that the function was executed by a user with only read-only privileges on all the tables and views available to the reports. The Postgres db we were using at the time was about 2TB, so wasn't really a toy db.Gallaway
@Gallaway Thanks. So, what I am asking is possible with PL/pgSQL? What confuses me is the PL/pgSQL documentation that declares all those restrictions. By the way can you suggest a good tutorial for PL/pgSQL and dynamic SQL? I cannot find any. I just found some, about PL/pgSQL, but they offer nothing new, they just repeat the documentation, in a simpler way.Sprayberry
Absolutely it's possible. I don't have any tutorials to recommend (I learned it from the docs and trial-and-error over the years) but I have seem some amazing examples here in SO and DBA-SO by Erwin Brandstetter and Craig Ringer ( eg. this ). The answers below have some examples to work from as well.Gallaway
@Gallaway What is weird is that the code in pl/pgsql and the code of just a query, both have the same IFs and JOINs. Also the restrictions I found in the pl/pgsql documentation (bad query plan caching, difficult to handle a series of IFs) apply to my case, since I have multiple tables. So why not use a simple query after all ? Look at this really simple dynamic SQL, how neatly handles IFs. And the query planner caches a plan for each case. This is not postgre-related but I am looking for something like this. ThanksSprayberry
Perhaps I'm misunderstanding, but nothing indicates plpgsql won't do the job. Using EXECUTE will cause a replan for every execution of the function (which is what you want, if you are truly creating dynamic SQL), you can create your function to have dynamic input (IN arrays, HSTORE, VARIADIC, etc), the multi-row output is returned via RETURN QUERY or RETURNS SETOF, etc. Of course, if you can create a bunch of plain SQL and execute that, it will likely perform better as the planner can make better decisions.Gallaway
@Gallaway Hi again. Thanks to you and others in this thread, I managed to create some dynamic queries in pl/pgsql. Query planning still confuses me - check here if you want - but anyway, thanks after all. I guess I will use EXECUTE after all due to constant query planningSprayberry
@Gallaway Oh, also, you will see there my dynamic query and how easily I can add IFs..Sprayberry
E
1

Can you please post some table definitions and a sample query of what you're trying to do? I'm not 100% sure what you're after, but there are a couple forms of "dynamic" SQL using stored procedures/functions:

  1. Create a function that takes input parameters (i.e. categoryType, styleId, eventName, areaId) and plug those values into a "static" SQL request. Here's a sample query snippet for your case:
SELECT *
FROM category cat
INNER JOIN style st ON cat.styleid = style.id
WHERE (cat.categoryType = pCategoryType OR pCategoryType IS NULL)
AND (st.id = pStyleId OR pStyleId IS NULL)

Here's a real example:

CREATE OR REPLACE FUNCTION SP_IGLGetItem(
    pItemId INTEGER
) 
RETURNS TABLE(
    ItemId INTEGER,
    ItemName VARCHAR(100),
    ItemCategory CHAR(2) 
AS
$$
BEGIN
    RETURN QUERY
    SELECT i.ItemId, i.ItemName, i.ItemCategory
    FROM Item i
    WHERE (i.ItemId = pItemId OR pItemId IS NULL) -- Return single item (if specified, otherwise return all)
    ;
END;
$$
LANGUAGE 'plpgsql';
  1. Build a string containing the SQL you want to execute dynamically based on different conditions, parameter values, etc. This is as dynamic as you can get.

  2. Conditionally run different "static" SQL statements based on the values of your input parameters.

Do any of these match your situation?

PL/PGSQL is just the language used to write stored procedures/functions in Postgres. If you really do need dynamic SQL generation, then you're best bet would be to write a function using PL/PGSQL.

Another option, is to dynamically generate the SQL you want in your client application and then just submit that SQL directly to be executed.

Embonpoint answered 11/1, 2018 at 23:9 Comment(4)
Hi. Thanks for the code. 1 and 2 match my case. Here is the problem I am trying to solve. Basically a lot of JOINs and IFs, based on what users choose.Sprayberry
I asked others and I will ask your opinion. Code in pl/pgsql and code of just a query, both have the same IFs and JOINs. The restrictions I found in the pl/pgsql documentation (bad query plan caching, difficult to handle a series of IFs) apply to my case, since I have multiple tables. So why not use a simple query after all ? Look at this really simple dynamic SQL, how neatly handles IFs. And the query planner caches a plan for each case. This is not postgre-related but I am looking for something like this. ThanksSprayberry
It boils down to a couple things: maintenance and performance. I'd try all the different possible ways and see how performance shakes out for you. If the difference is negligible and you don't expect the data characteristics (# rows, # unique values, etc.) to change too much, then go with the one that's easier for you. But typically performance trumps everything, so if that means writing "ugly" and maintenance-nightmare code, sometimes that's what you have to do.Embonpoint
Hi again. Thanks to you and others in this thread, I managed to create some dynamic queries in pl/pgsql. Query planning still confuses me - check here if you want - but anyway, thanks after all. I guess I will use EXECUTE after all due to constant query planning. Oh, also, you will see there my dynamic query and how easily I can add IFs..Sprayberry
D
1

You can run most queries inside pl/pgsql execute.

For example, this 2 table join select will work just fine:

drop table if exists dyn_tab1;
create table dyn_tab1 (id int primary key, value text);
insert into dyn_tab1 values (1, 'test1'), (2, 'test2');

drop table if exists dyn_tab2;
create table dyn_tab2
  (id serial primary key, fk_id int references dyn_tab1(id), value text);
insert into dyn_tab2 (fk_id, value)
values (1, 'blahblah'), (1, 'blahblah3'), (1, 'foobar'), (2, 'asdf');

select *
from dyn_tab1 as t1
join dyn_tab2 as t2 on t2.fk_id = t1.id;

--I'm mixing here both format and USING (prepared statement).
--You can use format for everything tho. Or just concat strings.
do $$
declare
  l_row record;
begin
for l_row in 
execute format($query$ select *
                       from %I as t1
                       join %I as t2 on t2.fk_id = t1.id
                       where t1.id = $1;
               $query$, 'dyn_tab1', 'dyn_tab2')
using 2
loop
raise notice 'record: %', l_row;
end loop;
end;
$$;

create function dyn_test()
returns setof record
as $$
begin
return query execute format($query$ select *
                       from %I as t1
                       join %I as t2 on t2.fk_id = t1.id
                       where t1.id = $1;
               $query$, 'dyn_tab1', 'dyn_tab2')
using 2;
end;
$$ language plpgsql;

select * from dyn_test() as (id int, value text, id2 int, fk int, value2 text);
Displayed answered 12/1, 2018 at 8:48 Comment(2)
Really useful code, thanks. Here is the big question though. The restrictions I mention are true because I found them in the documentation. So, they apply to my case since I use multiple tables. So, why really use pl/pgsql and not just a query if I will use a couple of IFs and JOINs and the restrictions are the same?Sprayberry
Rephrase : The restrictions I found in the documentation also apply to my case since I use multiple tables. So, why really use pl/pgsql and not just a query for a couple of IFs and JOINs if the restrictions are the same (bad query planning, same numer of IFs)? Look at this really simple dynamic SQL, how neatly handles IFs. And the query planner caches a plan for each case. This is not postgre-related but I am looking for something like this. ThanksSprayberry

© 2022 - 2024 — McMap. All rights reserved.