PostgreSQL unnest() with element number
Asked Answered
I

6

141

When I have a column with separated values, I can use the unnest() function:

myTable
id | elements
---+------------
1  |ab,cd,efg,hi
2  |jk,lm,no,pq
3  |rstuv,wxyz

select id, unnest(string_to_array(elements, ',')) AS elem
from myTable

id | elem
---+-----
1  | ab
1  | cd
1  | efg
1  | hi
2  | jk
...

How can I include element numbers? I.e.:

id | elem | nr
---+------+---
1  | ab   | 1
1  | cd   | 2
1  | efg  | 3
1  | hi   | 4
2  | jk   | 1
...

I want the original position of each element in the source string. I've tried with window functions (row_number(), rank() etc.) but I always get 1. Maybe because they are in the same row of the source table?

I know it's a bad table design. It's not mine, I'm just trying to fix it.

Ignacia answered 6/1, 2012 at 15:45 Comment(0)
P
271

Postgres 14 or later

Use string_to_table() instead of unnest(string_to_array()) for a comma-separated string:

SELECT t.id, a.elem, a.nr
FROM   tbl t
LEFT   JOIN LATERAL string_to_table(t.elements, ',')
                    WITH ORDINALITY AS a(elem, nr) ON true;

fiddle

Related:

Unnesting an actual array didn't change since Postgres 9.4.

Postgres 9.4 or later

Use WITH ORDINALITY for set-returning functions:

When a function in the FROM clause is suffixed by WITH ORDINALITY, a bigint column is appended to the output which starts from 1 and increments by 1 for each row of the function's output. This is most useful in the case of set returning functions such as unnest().

In combination with the LATERAL feature in pg 9.3+, and according to this thread on pgsql-hackers, the above query can now be written as:

SELECT t.id, a.elem, a.nr
FROM   tbl AS t
LEFT   JOIN LATERAL unnest(string_to_array(t.elements, ','))
                    WITH ORDINALITY AS a(elem, nr) ON true;

LEFT JOIN ... ON true preserves all rows from the left table, even if the table expression to the right returns no rows. See:

If that's of no concern you can use the otherwise equivalent, less verbose form with an implicit CROSS JOIN LATERAL:

SELECT t.id, a.elem, a.nr
FROM   tbl t, unnest(string_to_array(t.elements, ',')) WITH ORDINALITY a(elem, nr);

Or simpler, based off an actual array (arr being an array column):

SELECT t.id, a.elem, a.nr
FROM   tbl t, unnest(t.arr) WITH ORDINALITY a(elem, nr);

Or just go with default column names:

SELECT id, a, ordinality
FROM   tbl, unnest(arr) WITH ORDINALITY a;

Or shorter, yet:

SELECT id, a.* FROM tbl, unnest(arr) WITH ORDINALITY a;

Or minimal syntax:

SELECT * FROM tbl, unnest(arr) WITH ORDINALITY a;

The last one returns all columns of tbl, of course.

a is automatically table and column alias (for the first column). The default name of the added ordinality column is ordinality. But it's clearer to add explicit column aliases and table-qualify columns.

The original order of array elements is preserved this way. The manual for unnest():

Expands an array into a set of rows. The array's elements are read out in storage order.

Postgres 8.4 - 9.3

With row_number() OVER (PARTITION BY id ORDER BY elem) you get numbers according to the sort order, not the ordinal number of the original ordinal position in the string.

You can simply omit ORDER BY:

SELECT *, row_number() OVER (PARTITION by id) AS nr
FROM  (SELECT id, regexp_split_to_table(elements, ',') AS elem FROM tbl) t;

While this normally works and I have never seen it fail in simple queries, PostgreSQL asserts nothing concerning the order of rows without ORDER BY. It happens to work due to an implementation detail.

To guarantee ordinal numbers of elements in the blank-separated string:

SELECT id, arr[nr] AS elem, nr
FROM  (
   SELECT *, generate_subscripts(arr, 1) AS nr
   FROM  (SELECT id, string_to_array(elements, ' ') AS arr FROM tbl) t
   ) sub;

Or simpler if based off an actual array:

SELECT id, arr[nr] AS elem, nr
FROM  (SELECT *, generate_subscripts(arr, 1) AS nr FROM tbl) t;

Related answer on dba.SE:

Postgres 8.1 - 8.4

None of these features are available, yet: RETURNS TABLE, generate_subscripts(), unnest(), array_length(). But this works:

CREATE FUNCTION f_unnest_ord(anyarray, OUT val anyelement, OUT ordinality integer)
  RETURNS SETOF record
  LANGUAGE sql IMMUTABLE AS
'SELECT $1[i], i - array_lower($1,1) + 1
 FROM   generate_series(array_lower($1,1), array_upper($1,1)) i';

Note in particular, that the array index can differ from ordinal positions of elements. Consider this demo with an extended function:

CREATE FUNCTION f_unnest_ord_idx(anyarray, OUT val anyelement, OUT ordinality int, OUT idx int)
  RETURNS SETOF record
  LANGUAGE sql IMMUTABLE AS
'SELECT $1[i], i - array_lower($1,1) + 1, i
 FROM   generate_series(array_lower($1,1), array_upper($1,1)) i';

SELECT id, arr, (rec).*
FROM  (
   SELECT *, f_unnest_ord_idx(arr) AS rec
   FROM  (
      VALUES
        (1, '{a,b,c}'::text[])  --  short for: '[1:3]={a,b,c}'
      , (2, '[5:7]={a,b,c}')
      , (3, '[-9:-7]={a,b,c}')
      ) t(id, arr)
   ) sub;

 id |       arr       | val | ordinality | idx
----+-----------------+-----+------------+-----
  1 | {a,b,c}         | a   |          1 |   1
  1 | {a,b,c}         | b   |          2 |   2
  1 | {a,b,c}         | c   |          3 |   3
  2 | [5:7]={a,b,c}   | a   |          1 |   5
  2 | [5:7]={a,b,c}   | b   |          2 |   6
  2 | [5:7]={a,b,c}   | c   |          3 |   7
  3 | [-9:-7]={a,b,c} | a   |          1 |  -9
  3 | [-9:-7]={a,b,c} | b   |          2 |  -8
  3 | [-9:-7]={a,b,c} | c   |          3 |  -7

Compare:

Plash answered 7/1, 2012 at 4:49 Comment(12)
This answer is one of the most comprehensive answers in SO, regarding PostgreSQL. Thanks Erwin.Blackett
Can we adapt unnest2 function below to a real table return (not fake rows), in the new pg versions?Lunde
@erwin-brandstetter, would you please elaborate on why/if WITH ORDINALITY is preferred over generate_subscripts()? It looks to me like generate_subscripts() is better as it shows the actual element location in the array. This is useful, for example, when updating the array... should I be using WITH ORDINALITY instead?Decidua
@losthorse: I would outline it like this: WITH ORDINALITY is the general solution to get row numbers for any set returning function in an SQL query. It's the fastest, reliable way and it also happens to work perfectly for 1-dimenstional, 1-based arrays (the default for Postgres arrays, consider this). If you work with any other kind of arrays (most people don't), and you actually need to preserve / work with the original subscripts, then generate_subscripts() is the way to go. But unnest() flattens everytihng to begin with ...Plash
What does LATERAL do here? I have a similar query (LEFT JOIN unnest(foo) WITH ORDINALITY ON ...) that gives the same results whether or not LATERAL is given.Leif
Oh wait, mine is different because the JOIN unnest(foo) is inside the subquery, referring to an existing array column.Leif
@z0r_ The manual: Table functions appearing in FROM can also be preceded by the key word LATERAL, but for functions the key word is optional; the function's arguments can contain references to columns provided by preceding FROM items in any case.Plash
@Erwin Brandstetter: So for 1-dimensional, 1-based arrays, the WITH ORDINALITY approach happens to work fine since it's the same as how Postgres internal works for same type of array? Is my understanding correct? I'm asking because I read somewhere in the documents there is no guarantee for the order of the sequence of the rows generated. generate_subscripts is the only way I can understand.Sawicki
@LiweiZ: To get the original position unnest(...) WITH ORDINALITY works fine for any array. You may be thinking of the original array subscript, which is not what this question asks for. If anything is still unclear, please post a question. You can always link to this answer for context and drop a comment linking to your related question.Plash
@ErwinBrandstetter Where does it say WITH ORDINALITY will give you the original position of the array elements? unnest is free to output array elements in any order right? The ordinality is the ordering of the output result set, not the array.Vigilance
@richb: unnest() returns elements in storage order. I added quote and link to the manual.Plash
@ErwinBrandstetter Thanks. The meaning of "storage order" wasn't immediately obvious to me but I suppose we can interpret that as ordered by index for a one dimensional array.Vigilance
F
9

Try:

select v.*, row_number() over (partition by id order by elem) rn from
(select
    id,
    unnest(string_to_array(elements, ',')) AS elem
 from myTable) v
Forcible answered 6/1, 2012 at 15:50 Comment(1)
Doesn't this require that your elements are lexicographically ordered? I know that they are in the question, but it is a pretty specific solution to something which the community probably needs a general solution for most of the time.Mossy
C
7

Use Subscript Generating Functions.
http://www.postgresql.org/docs/current/static/functions-srf.html#FUNCTIONS-SRF-SUBSCRIPTS

For example:

SELECT 
  id
  , elements[i] AS elem
  , i AS nr
FROM
  ( SELECT 
      id
      , elements
      , generate_subscripts(elements, 1) AS i
    FROM
      ( SELECT
          id
          , string_to_array(elements, ',') AS elements
        FROM
          myTable
      ) AS foo
  ) bar
;

More simply:

SELECT
  id
  , unnest(elements) AS elem
  , generate_subscripts(elements, 1) AS nr
FROM
  ( SELECT
      id
      , string_to_array(elements, ',') AS elements
    FROM
      myTable
  ) AS foo
;
Cylinder answered 27/12, 2012 at 11:57 Comment(0)
V
3

If the order of element is not important, you can

select 
  id, elem, row_number() over (partition by id) as nr
from (
  select
      id,
      unnest(string_to_array(elements, ',')) AS elem
  from myTable
) a
Valentinevalentino answered 6/1, 2012 at 15:51 Comment(0)
H
2

I think this is related, using a correlated subquery to assign arbitrary ranked / ordinal values to the final set. It's more of a practical applied use using PG array handling to De-Pivot a dataset (works w/ PG 9.4).

WITH _students AS ( /** CTE **/
                  SELECT * FROM
                    (   SELECT 'jane'::TEXT ,'doe'::TEXT , 1::INT 
                         UNION
                        SELECT 'john'::TEXT ,'doe'::TEXT , 2::INT 
                         UNION
                        SELECT 'jerry'::TEXT ,'roe'::TEXT , 3::INT 
                         UNION
                        SELECT 'jodi'::TEXT ,'roe'::TEXT , 4::INT 
                    ) s ( fn, ln, id )
) /** end WITH **/   
SELECT s.id
 , ax.fanm
 , ax.anm
 , ax.val
 , ax.num
FROM _students s
,UNNEST /** MULTI-UNNEST() BLOCK **/
    (
        ( SELECT ARRAY[ fn, ln ]::text[] AS anm 
                  /** CORRELATED SUBQUERY **/
                 FROM _students s2 WHERE s2.id = s.id 
         )   
   
        ,( SELECT ARRAY[ 'first name', 'last name' ]::text[] AS fanm )  
     
        ,( SELECT ARRAY[ '9','8','7'] AS val) 
   
        ,( SELECT ARRAY[ 1,2,3,4,5   ] AS num) 
        
   ) ax (  anm, fanm, val, num )
;   

DE-PIVOTED RESULT SET:

+--+----------+-----+----+---+
|id|fanm      |anm  |val |num|
+--+----------+-----+----+---+
|2 |first name|john |9   |1  |
|2 |last name |doe  |8   |2  |
|2 |NULL      |NULL |7   |3  |
|2 |NULL      |NULL |NULL|4  |
|2 |NULL      |NULL |NULL|5  |
|1 |first name|jane |9   |1  |
|1 |last name |doe  |8   |2  |
|1 |NULL      |NULL |7   |3  |
|1 |NULL      |NULL |NULL|4  |
|1 |NULL      |NULL |NULL|5  |
|4 |first name|jodi |9   |1  |
|4 |last name |roe  |8   |2  |
|4 |NULL      |NULL |7   |3  |
|4 |NULL      |NULL |NULL|4  |
|4 |NULL      |NULL |NULL|5  |
|3 |first name|jerry|9   |1  |
|3 |last name |roe  |8   |2  |
|3 |NULL      |NULL |7   |3  |
|3 |NULL      |NULL |NULL|4  |
|3 |NULL      |NULL |NULL|5  |
+--+----------+-----+----+---+
Hower answered 13/5, 2021 at 16:39 Comment(0)
L
0

unnest2() as exercise

Older versions before pg v8.4 need a user-defined unnest(). We can adapt this old function to return elements with an index:

CREATE FUNCTION unnest2(anyarray)
  RETURNS setof record  AS
$BODY$
  SELECT $1[i], i
  FROM   generate_series(array_lower($1,1),
                         array_upper($1,1)) i;
$BODY$ LANGUAGE sql IMMUTABLE;
Lunde answered 6/1, 2012 at 15:45 Comment(3)
This would not work before pg v8.4, because there is not RETURNS TABLE, yet. I added a chapter to my answer discussing a solution.Plash
@ErwinBrandstetter , your answers are very didactic, and you are polishing a text of 4 years ago (!)... Do you is writing a PostgreSQL book using your SO texts? :-)Lunde
Hi all, it is a Wiki, you can edit (!)... But ok, I corrected to setof record.Lunde

© 2022 - 2024 — McMap. All rights reserved.