Execute a dynamic crosstab query
Asked Answered
H

2

7

I implemented this function in my Postgres database: http://www.cureffi.org/2013/03/19/automatically-creating-pivot-table-column-names-in-postgresql/

Here's the function:

create or replace function xtab (tablename varchar, rowc varchar, colc varchar, cellc varchar, celldatatype varchar) returns varchar language plpgsql as $$
declare
    dynsql1 varchar;
    dynsql2 varchar;
    columnlist varchar;
begin
    -- 1. retrieve list of column names.
    dynsql1 = 'select string_agg(distinct '||colc||'||'' '||celldatatype||''','','' order by '||colc||'||'' '||celldatatype||''') from '||tablename||';';
    execute dynsql1 into columnlist;
    -- 2. set up the crosstab query
    dynsql2 = 'select * from crosstab (
 ''select '||rowc||','||colc||','||cellc||' from '||tablename||' group by 1,2 order by 1,2'',
 ''select distinct '||colc||' from '||tablename||' order by 1''
 )
 as ct (
 '||rowc||' varchar,'||columnlist||'
 );';
    return dynsql2;
end
$$;

So now I can call the function:

select xtab('globalpayments','month','currency','(sum(total_fees)/sum(txn_amount)*100)::decimal(48,2)','text');

Which returns (because the return type of the function is varchar):

select * from crosstab (
   'select month,currency,(sum(total_fees)/sum(txn_amount)*100)::decimal(48,2)
    from globalpayments
    group by 1,2
    order by 1,2'
 , 'select distinct currency
    from globalpayments
    order by 1'
   )  as ct ( month varchar,CAD text,EUR text,GBP text,USD text );

How can I get this function to not only generate the code for the dynamic crosstab, but also execute the result? I.e., the result when I manually copy/paste/execute is this. But I want it to execute without that extra step: the function shall assemble the dynamic query and execute it:

query result

Edit 1

This function comes close, but I need it to return more than just the first column of the first record

Taken from: Are there any way to execute a query inside the string value (like eval) in PostgreSQL?

create or replace function eval( sql  text ) returns text as $$
declare
  as_txt  text;
begin
  if  sql is null  then  return null ;  end if ;
  execute  sql  into  as_txt ;
  return  as_txt ;
end;
$$ language plpgsql

usage: select * from eval($$select * from analytics limit 1$$)

However it just returns the first column of the first record :

eval
----
2015

when the actual result looks like this:

Year, Month, Date, TPV_USD
---- ----- ------ --------
2016, 3, 2016-03-31, 100000
Histaminase answered 22/4, 2016 at 22:57 Comment(8)
Are you looking for something like EXECUTE IMMEDIATE?Noctilucent
Probably, but how would I slot that in to the existing function?Histaminase
also, from the website I got the idea from, he says "You can’t execute the generated crosstab query automatically because PL/pgSQL functions that return tables (setof record functions) cannot have the number and type of columns determined on the fly. " - could you use EXECUTE IMMEDIATE to remove this impediment? for example execute immediate 'select 1' and not have it error... :(Histaminase
and number of rows to be returned varying from call to call, isn't it? Is there a maximum?Precocious
You forgot to provide your version of Postgres. Your desired result contradicts your current query, where currency columns are in ascending order. Also, preparing a query does not solve the problem. You still need two round trips to the DB server: 1. prepare, 2. execute / retrieve returned rows. EXECUTE IMMEDIATE in ecpg that has be mentioned does not return rows and is not applicable to your problem because you obviously want to retrieve result rows.Armalla
Postgres 9.2. Is there a way to create another function that will take this function's output and execute that? (ie function that prepares the dynamic sql - and function that then executes that string?)Histaminase
@Precocious yes the number of rows and columns are dynamic. If the rows were fixed could I implement dynamic crosstab (with columns being only known at run-time)?Histaminase
Your "Edit 1" doesn't get you any closer to returning completely dynamic row types. Strictly not possible.Armalla
A
8

What you ask for is impossible. SQL is a strictly typed language. PostgreSQL functions need to declare a return type (RETURNS ..) at the time of creation.

A limited way around this is with polymorphic functions. If you can provide the return type at the time of the function call. But that's not evident from your question.

You can return a completely dynamic result with anonymous records. But then you are required to provide a column definition list with every call. And how do you know about the returned columns? Catch 22.

There are various workarounds, depending on what you need or can work with. Since all your data columns seem to share the same data type, I suggest to return an array: text[]. Or you could return a document type like hstore or json. Related:

But it might be simpler to just use two calls: 1: Let Postgres build the query. 2: Execute and retrieve returned rows.


I would not use the function from Eric Minikel as presented in your question at all. It is not safe against SQL injection by way of maliciously malformed identifiers. Use format() to build query strings unless you are running an outdated version older than Postgres 9.1.

A shorter and cleaner implementation could look like this:

CREATE OR REPLACE FUNCTION xtab(_tbl regclass, _row text, _cat text
                              , _expr text  -- still vulnerable to SQL injection!
                              , _type regtype)
  RETURNS text
  LANGUAGE plpgsql AS
$func$
DECLARE
   _cat_list text;
   _col_list text;
BEGIN

-- generate categories for xtab param and col definition list    
EXECUTE format(
 $$SELECT string_agg(quote_literal(x.cat), '), (')
        , string_agg(quote_ident  (x.cat), %L)
   FROM  (SELECT DISTINCT %I AS cat FROM %s ORDER BY 1) x$$
 , ' ' || _type || ', ', _cat, _tbl)
INTO  _cat_list, _col_list;

-- generate query string
RETURN format(
'SELECT * FROM crosstab(
   $q$SELECT %I, %I, %s
      FROM   %I
      GROUP  BY 1, 2  -- only works if the 3rd column is an aggregate expression
      ORDER  BY 1, 2$q$
 , $c$VALUES (%5$s)$c$
   ) ct(%1$I text, %6$s %7$s)'
, _row, _cat, _expr  -- expr must be an aggregate expression!
, _tbl, _cat_list, _col_list, _type);

END
$func$;

Same function call as your original version. The function crosstab() is provided by the additional module tablefunc which has to be installed. Basics:

This handles column and table names safely. Note the use of object identifier types regclass and regtype. Also works for schema-qualified names.

However, it is not completely safe while you pass a string to be executed as expression (_expr - cellc in your original query). This kind of input is inherently unsafe against SQL injection and should never be exposed to the general public.

Scans the table only once for both lists of categories and should be a bit faster.

Still can't return completely dynamic row types since that's strictly not possible.

Armalla answered 25/4, 2016 at 2:20 Comment(9)
Thanks for your answer - still trying to digest. No need to worry about sql injection - this is for data analysis purposes on my own database - no one else but me on it. Are you entirely sure the there is no way to mimic any sort of dynamic crosstab query functionality similar to the MSSQL implementation, which is string interpolation, and an exec() call on the string?Histaminase
Also, would you have an example of how to refactor Eric's presented function using format() ?Histaminase
@JohnSkilbeck: I added an implementation with format().Armalla
Erwin - you are extremely knowledgeable with postgres. Are you contributor to the postgres source code? Is there any possibility you'd want to scope with me to see what it would take to work together to implement this feature?Histaminase
@JohnSkilbeck: I was on the dev team of pgAdmin for some time. All I contributed to the Postgres source is a couple of bug reports and some patches for documentation. You might raise your idea on pgsql-general, but don't get your hopes up, principals of SQL and the Postgres query planner stand against your idea.Armalla
The code in the answer creates a crosstab query. How could I execute it?Glume
@Wondie: Run it like any other query. The tablefunc module has to be installed (which was clear to the OP). I added pointers above.Armalla
@ErwinBrandstetter thank you for the further elaboration. Meaning I copy and past it and run the query?Glume
@wondie: Yes, for instance. A second round trips to the server.Armalla
T
2

Not quite impossible, you can still execute it (from a query execute the string and return SETOF RECORD.

Then you have to specify the return record format. The reason in this case is that the planner needs to know the return format before it can make certain decisions (materialization comes to mind).

So in this case you would EXECUTE the query, return the rows and return SETOF RECORD.

For example, we could do something like this with a wrapper function but the same logic could be folded into your function:

CREATE OR REPLACE FUNCTION crosstab_wrapper
(tablename varchar, rowc varchar, colc varchar, 
 cellc varchar, celldatatype varchar) 
returns setof record language plpgsql as $$
    DECLARE outrow record;
    BEGIN
       FOR outrow IN EXECUTE xtab($1, $2, $3, $4, $5)
       LOOP
           RETURN NEXT outrow
       END LOOP;
    END;
 $$;

Then you supply the record structure on calling the function just like you do with crosstab. Then when you all the query you would have to supply a record structure (as (col1 type, col2 type, etc) like you do with connectby.

Timeworn answered 27/4, 2016 at 18:58 Comment(4)
Thanks for the response. Can you provide a sample, either with my given example or your own?Histaminase
I called this "catch 22" in my answer. How would you "supply a record structure" (column definition list) for a completely dynamic row type? You would have to know it before the function call, which would not be completely dynamic to begin with - and there are more elegant solutions if you know the return type anyway.Armalla
@Chris I received this error from that code: ERROR: set-valued function called in context that cannot accept a set CONTEXT: PL/pgSQL function crosstab_wrapper(character varying,character varying,character varying,character varying,character varying) line 6 at RETURN NEXT Query failed PostgreSQL said: set-valued function called in context that cannot accept a setHistaminase
select * from my_function(...) as r (...)Timeworn

© 2022 - 2024 — McMap. All rights reserved.