Dynamically generate columns for crosstab in PostgreSQL
Asked Answered
F

4

26

I am trying to create crosstab queries in PostgreSQL such that it automatically generates the crosstab columns instead of hardcoding it. I have written a function that dynamically generates the column list that I need for my crosstab query. The idea is to substitute the result of this function in the crosstab query using dynamic sql.

How do I do it in PostgreSQL?

-- Table which has be pivoted
CREATE TABLE test_db
(
    kernel_id int,
    key int,
    value int
);

INSERT INTO test_db VALUES
(1,1,99),
(1,2,78),
(2,1,66),
(3,1,44),
(3,2,55),
(3,3,89);


-- This function dynamically returns the list of columns for crosstab
CREATE FUNCTION test() RETURNS TEXT AS '
DECLARE
    key_id int;
    text_op TEXT = '' kernel_id int, '';
BEGIN
    FOR key_id IN SELECT DISTINCT key FROM test_db ORDER BY key LOOP
    text_op := text_op || key_id || '' int , '' ;
    END LOOP;
    text_op := text_op || '' DUMMY text'';
    RETURN text_op;
END;
' LANGUAGE 'plpgsql';

-- This query works. I just need to convert the static list
-- of crosstab columns to be generated dynamically.
SELECT * FROM
crosstab
(
    'SELECT kernel_id, key, value FROM test_db ORDER BY 1,2',
    'SELECT DISTINCT key FROM test_db ORDER BY 1'
)
AS x (kernel_id int, key1 int, key2 int, key3 int); -- How can I replace ..
-- .. this static list with a dynamically generated list of columns ?
Freeliving answered 14/10, 2012 at 5:56 Comment(0)
A
16

You can use the provided C function crosstab_hash for this.

The manual is not very clear in this respect. It's mentioned at the end of the chapter on crosstab() with two parameters:

You can create predefined functions to avoid having to write out the result column names and types in each query. See the examples in the previous section. The underlying C function for this form of crosstab is named crosstab_hash.

For your example:

CREATE OR REPLACE FUNCTION f_cross_test_db(text, text)
  RETURNS TABLE (kernel_id int, key1 int, key2 int, key3 int)
  AS '$libdir/tablefunc','crosstab_hash' LANGUAGE C STABLE STRICT;

Call:

SELECT * FROM f_cross_test_db(
      'SELECT kernel_id, key, value FROM test_db ORDER BY 1,2'
     ,'SELECT DISTINCT key FROM test_db ORDER BY 1');

Note that you need to create a distinct crosstab_hash function for every crosstab function with a different return type.

Related:


Your function to generate the column list is rather convoluted, the result is incorrect (int missing after kernel_id), it can be replaced with this SQL query:

SELECT 'kernel_id int, '
       || string_agg(DISTINCT key::text, ' int, '  ORDER BY key::text)
       || ' int, DUMMY text'
FROM   test_db;

And it cannot be used dynamically anyway.

Ancilla answered 14/10, 2012 at 23:46 Comment(5)
Thanks for the detailed reply @Erwin. I was able to make the basic crosstab work with a static set of crosstab fields. I am kind of surprised that we don't have any way of dynamically generating the list of columns and thus a dynamic crosstab in postgresql. Is there any other round-about or hacky way of doing this in postgresql that you are aware of ?Freeliving
@invinc4u: The problem is that the return type of a function cannot be altered dynamically. You could recreate the function itself dynamically and then call it immediately. But that's tricky business ...Ancilla
Ya that makes sense. But I was hoping there that there was some hacky way using dyanmic SQL which would have solved the problem. After preparing the dynamic SQL, we should just be able to execute the query and it should generate the dynamic crosstab for us. See this for example : muhammedsalimp.wordpress.com/2010/07/16/…. I am already missing my SQL Server :( Anyways thanks a tonne for your help and guidance !!Freeliving
I am a little confused. How is this answering the question? As per the solution, we still need to type out the column list and type here: RETURNS TABLE (kernel_id int, key1 int, key2 int, key3 int). Isn't it exactly what the OP wants to avoid?Duron
Hi @ErwinBrandstetter, could you help and take a look at my new mythtery ... dba.stackexchange.com/questions/275575/…Muldrow
P
5

I realise this is an older post but struggled for a little while on the same issue.

My Problem Statement: I had a table with muliple values in a field and wanted to create a crosstab query with 40+ column headings per row.

My Solution was to create a function which looped through the table column to grab values that I wanted to use as column headings within the crosstab query.

Within this function I could then Create the crosstab query. In my use case I added this crosstab result into a separate table.

E.g.

CREATE OR REPLACE FUNCTION field_values_ct ()
 RETURNS VOID AS $$
DECLARE rec RECORD;
DECLARE str text;
BEGIN
str := '"Issue ID" text,';
   -- looping to get column heading string
   FOR rec IN SELECT DISTINCT field_name
        FROM issue_fields
        ORDER BY field_name
    LOOP
    str :=  str || '"' || rec.field_name || '" text' ||',';
    END LOOP;
    str:= substring(str, 0, length(str));

    EXECUTE 'CREATE EXTENSION IF NOT EXISTS tablefunc;
    DROP TABLE IF EXISTS temp_issue_fields;
    CREATE TABLE temp_issue_fields AS
    SELECT *
    FROM crosstab(''select issue_id, field_name, field_value from issue_fields order by 1'',
                 ''SELECT DISTINCT field_name FROM issue_fields ORDER BY 1'')
         AS final_result ('|| str ||')';
END;
$$ LANGUAGE plpgsql;
Prothesis answered 29/11, 2018 at 11:47 Comment(0)
M
4

The approach described here worked well for me. Instead of retrieving the pivot table directly. The easier approach is to let the function generate a SQL query string. Dynamically execute the resulting SQL query string on demand.

The dynamic query pattern in a nutshell:

DO $$ BEGIN
DROP TABLE IF EXISTS temp_pivot;
EXEUTE(SELECT
  'CREATE TEMPORARY TABLE temp_pivot AS SELECT * FROM crosstab(
   ''$value_query'', ''$cat_query'') AS ct (rn text, "'
   || string_agg(cat_col, '" text,"') || '" text);'
   FROM $cat_query 
); 
END $$;
SELECT * FROM temp_pivot;
Muldrow answered 23/5, 2017 at 8:51 Comment(2)
Finally, something working! Thank you Ben (and Erik). Were you able to update that script to run the resulting query right away? Like taking the pivotcode result cell value and execute it.Operative
@Operative You can use it with CREATE TABLE AS SELECT * FROM to set output to a temp table.Muldrow
F
3

@erwin-brandstetter: The return type of the function isn't an issue if you're always returning a JSON type with the converted results.

Here is the function I came up with:

CREATE OR REPLACE FUNCTION report.test(
    i_start_date TIMESTAMPTZ,
    i_end_date TIMESTAMPTZ,
    i_interval INT
    ) RETURNS TABLE (
    tab JSON
    ) AS $ab$
DECLARE
    _key_id TEXT;
    _text_op TEXT = '';
    _ret JSON;
BEGIN
    -- SELECT DISTINCT for query results
    FOR _key_id IN
    SELECT DISTINCT at_name
      FROM report.company_data_date cd 
      JOIN report.company_data_amount cda ON cd.id = cda.company_data_date_id 
      JOIN report.amount_types at ON cda.amount_type_id  = at.id 
     WHERE date_start BETWEEN i_start_date AND i_end_date
       AND interval_type_id = i_interval
    LOOP
    -- build function_call with datatype of column
        IF char_length(_text_op) > 1 THEN
            _text_op := _text_op || ', ' || _key_id || ' NUMERIC(20,2)';
        ELSE
            _text_op := _text_op || _key_id || ' NUMERIC(20,2)';
        END IF;
    END LOOP;
    -- build query with parameter filters
    RETURN QUERY
    EXECUTE '
        SELECT array_to_json(array_agg(row_to_json(t)))
          FROM (
        SELECT * FROM crosstab(''SELECT date_start, at.at_name,  cda.amount ct 
          FROM report.company_data_date cd 
          JOIN report.company_data_amount cda ON cd.id = cda.company_data_date_id 
          JOIN report.amount_types at ON cda.amount_type_id  = at.id 
         WHERE date_start between $$' || i_start_date::TEXT || '$$ AND $$' || i_end_date::TEXT || '$$ 
           AND interval_type_id = ' || i_interval::TEXT || ' ORDER BY date_start'') 
            AS ct (date_start timestamptz, ' || _text_op || ')
             ) t;';
END;
$ab$ LANGUAGE 'plpgsql';

So, when you run it, you get the dynamic results in JSON, and you don't need to know how many values were pivoted:

select * from report.test(now()- '1 week'::interval, now(), 1);
                                                                                                                     tab                                                                                                                      
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 [{"date_start":"2015-07-27T08:40:01.277556-04:00","burn_rate":0.00,"monthly_revenue":5800.00,"cash_balance":0.00},{"date_start":"2015-07-27T08:50:02.458868-04:00","burn_rate":34000.00,"monthly_revenue":15800.00,"cash_balance":24000.00}]
(1 row)

Edit: If you have mixed datatypes in your crosstab, you can add logic to look it up for each column with something like this:

  SELECT a.attname as column_name, format_type(a.atttypid, a.atttypmod) AS data_type 
    FROM pg_attribute a 
    JOIN pg_class b ON (a.attrelid = b.relfilenode) 
    JOIN pg_catalog.pg_namespace n ON n.oid = b.relnamespace 
   WHERE n.nspname = $$schema_name$$ AND b.relname = $$table_name$$ and a.attstattarget = -1;"
Ferreira answered 2/8, 2015 at 23:2 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.