How to match elements in an array of composite type?
Asked Answered
B

1

2

Let's say we have two tables:

CREATE TABLE element (
    pk1 BIGINT NOT NULL,
    pk2 BIGINT NOT NULL,
    pk3 BIGINT NOT NULL,
    -- other columns ...
    PRIMARY KEY (pk1, pk2, pk3)
);

CREATE TYPE element_pk_t AS (
    pk1 BIGINT,
    pk2 BIGINT,
    pk3 BIGINT
);

CREATE TABLE collection (
    id BIGINT,
    elements element_pk_t[] NOT NULL,
);

The element has a composite PK. The custom type element_pk_t registers a matching composite type. The collection table contains array of element_pk_t.

I want to query all rows from table element where the PK matches an element in a chosen collection.elements, in a single query.

What I've tried:

SELECT * 
FROM element 
WHERE (pk1, pk2, pk3) IN (SELECT unnest(elements) 
                          FROM collection 
                          WHERE id = 1);

I get an error in the IN clause:

ERROR: subquery has too few columns

However, this works:

SELECT * 
FROM element 
WHERE (pk1, pk2, pk3) IN ((1, 2, 3), (4, 5, 6));

So it seems that the problem is how to expand the customized type element_pk_t to 3 columns that can match (pk1, pk2, pk3).

Benzoin answered 15/10, 2021 at 20:33 Comment(3)
Cast your tuple of three BIGINT to a single element_pk_t (the opposite of your suggestion of expanding the array elements).Conjoin
This looks like a really convoluted design to me. You will probably have much less problems in the long run if you create a properly normalized modelHolter
@Conjoin should it be select * from element where (pk1, pk2, pk3)::element_pk_t in (select unnest(elements) from collection where id=1); ? but that gives me a cannot cast type record to element_pk_t[] errorBenzoin
R
5

This works:

SELECT *
FROM   element 
WHERE  (pk1, pk2, pk3) IN (SELECT (unnest(elements)).*
                           FROM   collection
                           WHERE  id = 1);

Or more verbose, but preferable:

SELECT *
FROM   element 
WHERE  (pk1, pk2, pk3) IN (SELECT (e).*
                           FROM   collection c, unnest(c.elements) e
                           WHERE  c.id = 1);

More robust and avoids evaluating unnest() multiple times. See:

This works, too:

SELECT *
FROM   element 
WHERE  ROW((pk1, pk2, pk3)) IN (SELECT unnest(elements)
                                FROM   collection
                                WHERE  id = 1);

The core of the problem is that IN taking a subquery knows two separate forms. Quoting the manual:

expression IN (subquery)

row_constructor IN (subquery)

Your failing query resolves to the second form, while you (understandably) expect the first. But the second form does this:

The left-hand side of this form of IN is a row constructor, as described in Section 4.2.13. The right-hand side is a parenthesized subquery, which must return exactly as many columns as there are expressions in the left-hand row. The left-hand expressions are evaluated and compared row-wise to each row of the subquery result. [...]

My first and second query make it work by decomposing the row type to the right of the operator. So Postgres has three bigint values left and right and is satisfied.

My third query makes it work by nesting the row type to the left in another row constructor. Postgres only decomposes the first level and ends up with a single composite type - matching the single composite type to the right.

Note that the keyword ROW is required for the single field we are wrapping. The manual:

The key word ROW is optional when there is more than one expression in the list.


Your working query is subtly different as it provides a list of values to the right instead of a subquery (set). That's a different implementation taking a different code path. It even gets a separate chapter in the manual. This variant has no special treatment for a ROW constructor to the left. So it just works as expected (by you).

More equivalent (working) syntax variants with = ANY:

SELECT * FROM element 
WHERE (pk1, pk2, pk3) = ANY ('{"(1,2,3)","(2,3,4)"}'::element_pk_t[]);

SELECT * FROM element 
WHERE (pk1, pk2, pk3) = ANY (ARRAY[(1,2,3)::element_pk_t,(2,3,4)::element_pk_t]);

SELECT * FROM element 
WHERE (pk1, pk2, pk3) = ANY (ARRAY[(1,2,3),(2,3,4)]::element[]);

Also valid with (pk1, pk2, pk3)::element_pk_t or ROW(pk1, pk2, pk3)::element_pk_t

See:

Since your source is an array, Daniel's second query with (e.pk1, e.pk2, e.pk3) = ANY(c.elements) lends itself naturally.
But for a bet on the fastest query, my money is on my second variant, because I expect it to use the PK index optimally.

Just as proof of concept. Like a_horse commented: a normalized DB design will probably scale best.

Remorseless answered 15/10, 2021 at 21:48 Comment(2)
@Ziqi: If you have big tables with long arrays, I would be interested which variant is fastest for you.Remorseless
my arrays are indeed very long, mostly 10k~30k, all three approaches from yours seems good in my case and all of them are able to leverage the PK index correctly. And I also tried Daniel's second approach with inner join, but that one was super slow (40ms from your approach vs 12000ms from that inner join one)Benzoin

© 2022 - 2024 — McMap. All rights reserved.