Select a dynamic set of columns from a table and get the sum for each

Is it possible to do the following in Postgres:

SELECT column_name FROM information_schema WHERE table_name = 'somereport' AND data_type = 'integer';

SELECT SUM(coulmn_name[0]),SUM(coulmn_name[1]) ,SUM(coulmn_name[3]) FROM somereport;

In other words I need to select a group of columns from a table depending on certain criteria, and then sum each of those columns in the table.

I know I can do this in a loop, so I can count each column independently, but obviously that requires a query for each column returned from the information schema query. Eg:

FOR r IN select column_name from information_schema where report_view_name = 'somereport' and data_type = 'integer';
LOOP
    SELECT SUM(r.column_name) FROM somereport;
END

This query creates the complete DML statement you are after:

WITH x AS (
   SELECT 'public'::text     AS _schema  -- provide schema name ..
         ,'somereport'::text AS _tbl     -- .. and table name once
   )
SELECT 'SELECT ' || string_agg('sum(' || quote_ident(column_name)
                 || ') AS sum_' || quote_ident(column_name), ', ')
       || E'\nFROM   ' || quote_ident(x._schema) || '.' || quote_ident(x._tbl)
FROM   x, information_schema.columns
WHERE  table_schema = _schema
AND    table_name = _tbl
AND    data_type = 'integer'
GROUP  BY x._schema, x._tbl;

You can execute it separately or wrap this query in a plpgsql function and run the query automatically with EXECUTE:

Full automation

Tested with PostgreSQL 9.1.4

CREATE OR REPLACE FUNCTION f_get_sums(_schema text, _tbl text)
  RETURNS TABLE(names text[], sums bigint[]) AS
$BODY$
BEGIN

RETURN QUERY EXECUTE (
    SELECT 'SELECT ''{'
           || string_agg(quote_ident(c.column_name), ', ' ORDER BY c.column_name)
           || '}''::text[],
           ARRAY['
           || string_agg('sum(' || quote_ident(c.column_name) || ')'
                                                   , ', ' ORDER BY c.column_name)
           || ']
    FROM   '
           || quote_ident(_schema) || '.' || quote_ident(_tbl)
    FROM   information_schema.columns c
    WHERE  table_schema = _schema
    AND    table_name = _tbl
    AND    data_type = 'integer'
    );

END;
$BODY$
  LANGUAGE plpgsql;

Call:

SELECT unnest(names) AS name, unnest (sums) AS col_sum
FROM   f_get_sums('public', 'somereport');

Returns:

   name        | col_sum
---------------+---------
 int_col1      |    6614
 other_int_col |    8364
 third_int_col | 2720642

Explain

The difficulty is to define the RETURN type for the function, while number and names of columns returned will vary. One detail that helps a little: you only want integer columns.

I solved this by forming an array of bigint (sum(int_col) returns bigint). In addition I return an array of column names. Both sorted alphabetically by column name.

In the function call I split up these arrays with unnest() arriving at the handsome format displayed.

The dynamically created and executed query is advanced stuff. Don't get confused by multiple layers of quotes. Basically you have EXECUTE that takes a text argument containing the SQL query to execute. This text, in turn, is provided by secondary SQL query that builds the query string of the primary query.

If this is too much at once or plpgsql is rather new for you, start with this related answer where I explain the basics dealing with a much simpler function and provide links to the manual for the major features.

If performance is essential query the Postgres catalog directly (pg_catalog.pg_attributes) instead of using the standardized (but slow) information_schema.columns. Here is a simple example with pg_attributes.

Full automation

Explain

Recommended topics

Hot tags