Parallel unnest() and sort order in PostgreSQL
Asked Answered
A

2

9

I understand that using

SELECT unnest(ARRAY[5,3,9]) as id

without an ORDER BY clause, the order of the result set is not guaranteed. I could for example get:

id
--
3
5
9

But what about the following request:

SELECT
  unnest(ARRAY[5,3,9]) as id,
  unnest(ARRAY(select generate_series(1, array_length(ARRAY[5,3,9], 1)))) as idx
ORDER BY idx ASC

Is it guaranteed that the 2 unnest() calls (which have the same length) will unroll in parallel and that the index idx will indeed match the position of the item in the array?

I am using PostgreSQL 9.3.3.

Allene answered 23/5, 2014 at 13:48 Comment(0)
B
14

Yes, that is a feature of Postgres and parallel unnesting is guaranteed to be in sync (as long as all arrays have the same number of elements).
Postgres 9.4 adds a clean solution for parallel unnest:

The order of resulting rows is not guaranteed, though. Actually, with a statement as simple as:

SELECT unnest(ARRAY[5,3,9]) AS id;

the resulting order of rows is "guaranteed", but Postgres does not assert anything. The query optimizer is free to order rows as it sees fit as long as the order is not explicitly defined. This may have side effects in more complex queries.

If the second query in your question is what you actually want (add an index number to unnested array elements), there is a better way with generate_subscripts():

SELECT unnest(ARRAY[5,3,9]) AS id
     , generate_subscripts(ARRAY[5,3,9], 1) AS idx
ORDER  BY idx;

Details in this related answer:

You will be interested in WITH ORDINALITY in Postgres 9.4:

Then you can use:

SELECT * FROM unnest(ARRAY[5,3,9]) WITH ORDINALITY tbl(id, idx);
Banff answered 23/5, 2014 at 20:50 Comment(0)
T
2

Short answer: No, idx will not match the array positions, when accepting the premise that unnest() output may be randomly ordered.

Demo: since the current implementation of unnest actually output the rows in the order of elements, I suggest to add a layer on top of it to simulate a random order:

CREATE FUNCTION unnest_random(anyarray)  RETURNS setof anyelement
language sql as
$$ select unnest($1) order by random() $$;

Then check out a few executions of your query with unnest replaced by unnest_random:

SELECT
  unnest_random(ARRAY[5,3,9]) as id,
  unnest_random(ARRAY(select generate_series(1, array_length(ARRAY[5,3,9], 1)))) as idx
ORDER BY idx ASC

Example of output:

 id | idx 
----+-----
  3 |   1
  9 |   2
  5 |   3

id=3 is associated with idx=1 but 3 was in 2nd position in the array. It's all wrong.

What's wrong in the query: it assumes that the first unnest will shuffle the elements using the same permutation as the second unnest (permutation in the mathematic sense: the relationship between order in the array and order of the rows). But this assumption contradicts the premise that the order output of unnest is unpredictable to start with.

About this question:

Is it guaranteed that the 2 unnest() calls (which have the same length) will unroll in parallel

In select unnest(...) X1, unnest(...) X2, with X1 and X2 being of type SETOF something and having the same number of rows, X1 and X2 will be paired in the final output so that the X1 value at row N will face the X2 value at the same row N. (it's a kind of UNION for columns, as opposed to a cartesian product).

But I wouldn't describe this pairing as unroll in parallel, so I'm not sure this is what you meant.

Anyway this pairing doesn't help with the problem since it happens after the unnest calls have lost the array positions.

An alternative: In this thread from the pgsql-sql mailing list, this function is suggested:

CREATE OR REPLACE FUNCTION unnest_with_ordinality(anyarray, OUT value
anyelement, OUT ordinality integer)
  RETURNS SETOF record AS
$$
SELECT $1[i], i FROM
    generate_series(array_lower($1,1),
                    array_upper($1,1)) i;
$$
LANGUAGE sql IMMUTABLE; 

Based on this, we can order by the second output column:

select * from unnest_with_ordinality(array[5,3,9]) order by 2;
 value | ordinality 
-------+------------
     5 |          1
     3 |          2
     9 |          3

With postgres 9.4 and above: The WITH ORDINALITY clause that can follow SET RETURNING function calls will provide this functionality in a generic way.

Teaching answered 24/5, 2014 at 19:24 Comment(3)
Your example is a red herring. It doesn't proof what you claim it would. Your function unnest_random() orders rows by random() before pairing with the second SRF. If you unnest two arrays in parallel, the sort order is pre-determined and pairing is guaranteed. You are accepting the premise that unnest() output may be randomly ordered., but that's just not the case in the current implementation. The correct answer to the question is "Yes".Banff
That premise comes from the question. To me it's asking if the proposed query with its pairing technique insulates from a potentially non-ordered output from unnest. So no, it doesn't. That said, the current implementation of unnest follows the array order. The trouble is that it's undocumented. Personally I don't have a problem with that, but if the OP does, I can't blame him for that.Intraatomic
As I understand it 1/ that the current impl of unnest does the unnesting in the array order (not random); 2/ that for parallel unnest, the pairing is guaranteed to be done in sequence on the base of the unnested arrays; that I may get unordered tuples because of the optimizer. So you are both right that I should switch to 9.4 ;-) but Erwin's answer is the confirmation that I was looking for regarding the pairing (my non-ordered point was coming from things I had read and mis-understood at the time about the optimizer stage)Allene

© 2022 - 2024 — McMap. All rights reserved.