How to unpivot a table in PostgreSQL
Asked Answered
O

5

3

I am having difficulties writing a Postgres function, as I am not familiar with it. I have multiple tables to import into Postgres with this format:

id | 1960 | 1961 | 1962 | 1963 | ...
____________________________________
 1    23     45     87     99
 2    12     31    ...

which I need to convert into this format:

id | year | value
_________________
 1   1960    23
 1   1961    45
 1   1962    87
 ...
 2   1960    12
 2   1961    31
 ...

I would imagine the function too to read like this:

SELECT all-years FROM imported_table;
CREATE a new_table;
FROM min-year TO max-year LOOP
     EXECUTE "INSERT INTO new_table (id, year, value) VALUES (id, year, value)";
END LOOP;

However, I'm having real trouble writing the nitty-gritty details for this. Would be easier for me to do that in PHP, but I am convinced that it's cleaner to do it directly in a Postgres-function.

The years (start and end) vary from table to table. And sometimes, I can even have years only for every fifth year or so ...

Ovenbird answered 2/9, 2014 at 14:3 Comment(3)
Version? select version();Discreet
All columns NOT NULL? Also, column names starting with a digit are impossible without double-quoting.Betthezel
Thanks so much for the suggestions! I am running version 9.2.6. Important question: Isn't there any way to read the year columns in a more automated way? For once, there are fifty years which I would need to type then by hand, and secondly, the start and end year can vary from table to table.Ovenbird
B
1

A completely dynamic version requires dynamic SQL. Use a plpgsql function with EXECUTE:

For Postgres 9.2 or older (before LATERAL was implemented):

CREATE OR REPLACE FUNCTION f_unpivot_years92(_tbl regclass, VARIADIC _years int[])
  RETURNS TABLE(id int, year int, value int) AS
$func$
BEGIN
   RETURN QUERY EXECUTE '
   SELECT id
        , unnest($1) AS year
        , unnest(ARRAY["'|| array_to_string(_years, '","') || '"]) AS val
   FROM   ' || _tbl || '
   ORDER  BY 1, 2'
   USING _years;
END
$func$  LANGUAGE plpgsql;

For Postgres 9.3 or later (with LATERAL):

CREATE OR REPLACE FUNCTION f_unpivot_years(_tbl regclass, VARIADIC _years int[])
  RETURNS TABLE(id int, year int, value int) AS
$func$
BEGIN
   RETURN QUERY EXECUTE (SELECT
     'SELECT t.id, u.year, u.val
      FROM  ' || _tbl || ' t
      LEFT   JOIN LATERAL (
         VALUES ' || string_agg(format('(%s, t.%I)', y, y), ', ')
     || ') u(year, val) ON true
      ORDER  BY 1, 2'
      FROM   unnest(_years) y
      );
END
$func$  LANGUAGE plpgsql;

About VARIADIC:

Call for arbitrary years:

SELECT * FROM f_unpivot_years('tbl', 1961, 1964, 1963);

Same, passing an actual array:

SELECT * FROM f_unpivot_years('tbl', VARIADIC '{1960,1961,1962,1963}'::int[]);

For a long list of sequential years:

SELECT * 
FROM f_unpivot_years('t', VARIADIC ARRAY(SELECT generate_series(1950,2014)));

For a long list with regular intervals (example for every 5 years):

SELECT *
FROM f_unpivot_years('t', VARIADIC ARRAY(SELECT generate_series(1950,2010,5)));

Output as requested.

The function takes:
1. A valid table name - double-quoted if it's otherwise illegal (like '"CaMeL"'). Using the object identifier type regclass to assert correctness and defend against SQL injection. You may want to schema-qualify the tale name to be unambiguous (like 'public."CaMeL"'). More:

2. Any list of numbers corresponding to (double-quoted) column names.
Or an actual array, prefixed with the keyword VARIADIC.

The array of columns does not have to be sorted in any way, but table and columns must exist or an exception is raised.

Output is sorted by id and year (as integer). If you want years to be sorted according to the sort order of the input array, make it just ORDER BY 1. Sort order according to array is not strictly guaranteed, but works in the current implementation. More about that:

Also works for NULL values.

SQL Fiddle for both with examples.

References:

Betthezel answered 2/9, 2014 at 15:56 Comment(3)
Thanks a lot for this. That looks great. However, although it runs correctly in the SQLFiddle, it creates an error warning in my database: "syntax error at or near "CREATE"". Not sure if I entered the fields correctly: NAME unpivot; RETURNS numeric; MODE inout; TYPE numeric... or what this is. Do you have any idea?Ovenbird
@luftikus143: RETURNS numeric? Why? What is this? Just copy / paste my code and it should work.Betthezel
Ah, got it now. Sorry! Works perfectly! Thanks so much!Ovenbird
E
4

PostgreSQL 9.3 offers as neat JSON functions which can be used for such tasks without defining new functions or knowing a number of columns.

SELECT id, (k).key as year, (k).value as value FROM
  (SELECT j->>'id' as id, json_each_text(j) as k
    FROM (
       SELECT row_to_json(tbl) as j FROM tbl) 
    as q)
    as r
WHERE (k).key <> 'id';

http://sqlfiddle.com/#!15/1714b/13

Excaudate answered 11/9, 2014 at 20:23 Comment(1)
This is briljant: I have "periods" which need to be unpivoted. For some customers quarters (i.e. 4 periods), sometimes weeks (i.e. 52 or 53 weeks). And they need to go into 1 DWH. This will do the trick.Misinterpret
D
2

Parallel unnesting might be easier

select
    id,
    unnest(array[1960, 1961, 1962]) as year,
    unnest(array["1960", "1961", "1962"]) as value
from (values
    (1,23,45,87), (2,12,31,53)
) s(id, "1960", "1961", "1962")
;
 id | year | value 
----+------+-------
  1 | 1960 |    23
  1 | 1961 |    45
  1 | 1962 |    87
  2 | 1960 |    12
  2 | 1961 |    31
  2 | 1962 |    53
Discreet answered 2/9, 2014 at 14:31 Comment(0)
B
1

A completely dynamic version requires dynamic SQL. Use a plpgsql function with EXECUTE:

For Postgres 9.2 or older (before LATERAL was implemented):

CREATE OR REPLACE FUNCTION f_unpivot_years92(_tbl regclass, VARIADIC _years int[])
  RETURNS TABLE(id int, year int, value int) AS
$func$
BEGIN
   RETURN QUERY EXECUTE '
   SELECT id
        , unnest($1) AS year
        , unnest(ARRAY["'|| array_to_string(_years, '","') || '"]) AS val
   FROM   ' || _tbl || '
   ORDER  BY 1, 2'
   USING _years;
END
$func$  LANGUAGE plpgsql;

For Postgres 9.3 or later (with LATERAL):

CREATE OR REPLACE FUNCTION f_unpivot_years(_tbl regclass, VARIADIC _years int[])
  RETURNS TABLE(id int, year int, value int) AS
$func$
BEGIN
   RETURN QUERY EXECUTE (SELECT
     'SELECT t.id, u.year, u.val
      FROM  ' || _tbl || ' t
      LEFT   JOIN LATERAL (
         VALUES ' || string_agg(format('(%s, t.%I)', y, y), ', ')
     || ') u(year, val) ON true
      ORDER  BY 1, 2'
      FROM   unnest(_years) y
      );
END
$func$  LANGUAGE plpgsql;

About VARIADIC:

Call for arbitrary years:

SELECT * FROM f_unpivot_years('tbl', 1961, 1964, 1963);

Same, passing an actual array:

SELECT * FROM f_unpivot_years('tbl', VARIADIC '{1960,1961,1962,1963}'::int[]);

For a long list of sequential years:

SELECT * 
FROM f_unpivot_years('t', VARIADIC ARRAY(SELECT generate_series(1950,2014)));

For a long list with regular intervals (example for every 5 years):

SELECT *
FROM f_unpivot_years('t', VARIADIC ARRAY(SELECT generate_series(1950,2010,5)));

Output as requested.

The function takes:
1. A valid table name - double-quoted if it's otherwise illegal (like '"CaMeL"'). Using the object identifier type regclass to assert correctness and defend against SQL injection. You may want to schema-qualify the tale name to be unambiguous (like 'public."CaMeL"'). More:

2. Any list of numbers corresponding to (double-quoted) column names.
Or an actual array, prefixed with the keyword VARIADIC.

The array of columns does not have to be sorted in any way, but table and columns must exist or an exception is raised.

Output is sorted by id and year (as integer). If you want years to be sorted according to the sort order of the input array, make it just ORDER BY 1. Sort order according to array is not strictly guaranteed, but works in the current implementation. More about that:

Also works for NULL values.

SQL Fiddle for both with examples.

References:

Betthezel answered 2/9, 2014 at 15:56 Comment(3)
Thanks a lot for this. That looks great. However, although it runs correctly in the SQLFiddle, it creates an error warning in my database: "syntax error at or near "CREATE"". Not sure if I entered the fields correctly: NAME unpivot; RETURNS numeric; MODE inout; TYPE numeric... or what this is. Do you have any idea?Ovenbird
@luftikus143: RETURNS numeric? Why? What is this? Just copy / paste my code and it should work.Betthezel
Ah, got it now. Sorry! Works perfectly! Thanks so much!Ovenbird
F
0

The simplest way is a union all:

select id, 1960 as year, "1960" as value
from table t
union all
select id, '1960', "1961"
from table t
. . .;

A somewhat more sophisticated way would be:

select t.id, s.yr,
       (case when s.yr = 1960 then "1960"
             when s.yr = 1961 then "1961"
             . . .
        end) as value
from table t cross join
     generate_series(1960, 1980) s(yr);

You can put this into another table using insert or create table as.

Floorman answered 2/9, 2014 at 14:7 Comment(0)
A
0

Here's a way that's based on the method at https://blog.sql-workbench.eu/post/dynamic-unpivot/ and another answer to this question, but uses the hstore extension

SELECT
  id,
  r.key AS year,
  r.value AS value
FROM
  imported_table t
CROSS JOIN
  each(hstore(t.*)) AS r(key, value)
WHERE
  -- This chooses columns that look like years
  -- In other cases you might need a different condition
  r.key ~ '^[0-9]{4}$'

It has a few benefits over other solutions:

  • By using hstore and not jsonb, it hopefully minimises issues with type conversions (although hstore does convert everything to text)
  • The columns don't need to be hard coded or known in advance. Here, columns are chosen by a regex on the name, but you could use any SQL logic based on the name (or even the value)
  • It doesn't require PL/pgSQL - it's all SQL
Archimage answered 6/12, 2022 at 20:38 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.