PostgreSQL - max number of parameters in "IN" clause?

8

218

In Postgres, you can specify an IN clause, like this:

SELECT * FROM user WHERE id IN (1000, 1001, 1002)

Does anyone know what's the maximum number of parameters you can pass into IN?

Frederic answered 17/6, 2009 at 21:48 Comment(0)
115

This is not really an answer to the present question, but it might help others too.

At least I can say there is a technical limit of 32767 values (= Short.MAX_VALUE) that can be passed to the PostgreSQL backend when using PostgreSQL's JDBC driver 9.1.

This is the result of testing "delete from x where id in (... 100k values...)" with the PostgreSQL JDBC driver:

Caused by: java.io.IOException: Tried to send an out-of-range integer as a 2-byte value: 100000
    at org.postgresql.core.PGStream.SendInteger2(PGStream.java:201)
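
One way to stay under this driver-side limit (just a sketch, not part of the original test; the bigint type is assumed) is to send the whole list as a single array parameter, so the statement carries one parameter instead of tens of thousands. From JDBC this would be a java.sql.Array bound to a single ? placeholder in ... WHERE id = ANY (?):

-- Sketch: one array parameter replaces 100k scalar parameters,
-- so the protocol's 2-byte parameter count is never exceeded.
PREPARE delete_by_ids (bigint[]) AS
  DELETE FROM x WHERE id = ANY ($1);

EXECUTE delete_by_ids ('{1, 2, 3}'::bigint[]);  -- the array itself can hold far more than 32767 ids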
Film answered 15/2, 2017 at 13:54 Comment(1)
The OP asked about a DB engine limitation, but I came here searching for the JDBC limitation, and that's what I was looking for. So there is a limitation, though quite a high one.Phonon
109

According to the source code located here, starting at line 850, PostgreSQL doesn't explicitly limit the number of arguments.

The following is a code comment from line 870:

/*
 * We try to generate a ScalarArrayOpExpr from IN/NOT IN, but this is only
 * possible if the inputs are all scalars (no RowExprs) and there is a
 * suitable array type available.  If not, we fall back to a boolean
 * condition tree with multiple copies of the lefthand expression.
 * Also, any IN-list items that contain Vars are handled as separate
 * boolean conditions, because that gives the planner more scope for
 * optimization on such clauses.
 *
 * First step: transform all the inputs, and detect whether any are
 * RowExprs or contain Vars.
 */
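
To illustrate the two code paths described in that comment (an example of my own, not from the original answer; it assumes a table test(id bigint, a integer, b integer)):

-- Scalar IN list: all items are plain scalars, so the planner builds a
-- ScalarArrayOpExpr and the filter shows up as id = ANY ('{1,2,3}'::bigint[]).
EXPLAIN SELECT * FROM test WHERE id IN (1, 2, 3);

-- IN list over row expressions: there is no suitable array type, so the
-- planner falls back to a tree of OR'ed boolean conditions instead.
EXPLAIN SELECT * FROM test WHERE (a, b) IN ((1, 2), (3, 4));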
Wanwand answered 18/6, 2009 at 0:19 Comment(2)
I don't even know which version was the current release in 2009, but as of version 15, this is where the code is located: git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/…Wanwand
What do you think about the query parameter limit of 65,535 (source) -- isn't that the actual limit?Looseleaf
44
explain select * from test where id in (1, 2);

QUERY PLAN

 Seq Scan on test  (cost=0.00..1.38 rows=2 width=208)
   Filter: (id = ANY ('{1,2}'::bigint[]))

But if we try the second query:

explain select * from test where id = any (values (1), (2));

QUERY PLAN

Hash Semi Join  (cost=0.05..1.45 rows=2 width=208)
       Hash Cond: (test.id = "*VALUES*".column1)
       ->  Seq Scan on test  (cost=0.00..1.30 rows=30 width=208)
       ->  Hash  (cost=0.03..0.03 rows=2 width=4)
             ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4)

We can see that Postgres builds a rowset from the VALUES list and joins with it (the Values Scan feeding the Hash Semi Join).

Shavian answered 11/3, 2015 at 19:21 Comment(1)
But from what I have heard, on Postgres 9.3+ both seem to perform the same. datadoghq.com/blog/…Horatia
28

As someone more experienced with Oracle DB, I was concerned about this limit too. I carried out a performance test for a query with ~10'000 parameters in an IN-list, fetching prime numbers up to 100'000 from a table with the first 100'000 integers by actually listing all the prime numbers as query parameters.

My results indicate that you need not worry about overloading the query plan optimizer or getting plans without index usage, since the query is transformed to use = ANY('{...}'::integer[]), where it can leverage indexes as expected:

-- prepare statement, runs instantaneous:
PREPARE hugeplan (integer, integer, integer, ...) AS
SELECT *
FROM primes
WHERE n IN ($1, $2, $3, ..., $9592);

-- fetch the prime numbers:
EXECUTE hugeplan(2, 3, 5, ..., 99991);

-- EXPLAIN ANALYZE output for the EXECUTE:
"Index Scan using n_idx on primes  (cost=0.42..9750.77 rows=9592 width=5) (actual time=0.024..15.268 rows=9592 loops=1)"
"  Index Cond: (n = ANY ('{2,3,5,7, (...)"
"Execution time: 16.063 ms"

-- setup, should you care:
CREATE TABLE public.primes
(
  n integer NOT NULL,
  prime boolean,
  CONSTRAINT n_idx PRIMARY KEY (n)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE public.primes
  OWNER TO postgres;

INSERT INTO public.primes
SELECT generate_series(1,100000);

However, this (rather old) thread on the pgsql-hackers mailing list indicates that there is still a non-negligible cost in planning such queries, so take my results with a grain of salt.

Diffract answered 9/12, 2017 at 14:50 Comment(0)
22

There is no limit on the number of elements that you can pass to an IN clause. If there are many elements, PostgreSQL treats the list as an array, and for every row scanned it checks whether the value is contained in that array. This approach does not scale well. Instead of using an IN clause, try an INNER JOIN with a temp table (see the sketch below). Refer to http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/ for more info. An INNER JOIN scales well because the query optimizer can make use of a hash join and other optimizations, whereas with a huge IN list the optimizer has far less room to optimize the query. I have noticed a speedup of at least 2x with this change.
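
A minimal sketch of that approach (table and column names are assumed for illustration; "user" is quoted because user is a reserved word in PostgreSQL):

-- Stage the ids in a temp table, then join instead of using a huge IN list.
CREATE TEMP TABLE wanted_ids (id bigint PRIMARY KEY);

INSERT INTO wanted_ids (id)
VALUES (1000), (1001), (1002);   -- ...and thousands more

ANALYZE wanted_ids;              -- give the planner row estimates for the temp table

SELECT u.*
FROM "user" u
INNER JOIN wanted_ids w ON w.id = u.id;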

Heterochromosome answered 26/10, 2012 at 6:55 Comment(1)
The link you're referring to doesn't say what DBMS it is talking about. While I can confirm that on Oracle DB, using temporary tables gives a massive performance boost over using queries combining OR and IN clauses due to the large overhead in parsing and planning such queries, I could not confirm the problem with Postgres 9.5, see this answer.Diffract
15

Just tried it. The answer is: Tried to send an out-of-range integer as a 2-byte value: 32768

Germann answered 3/2, 2020 at 22:38 Comment(0)
0

You might want to consider refactoring that query instead of adding an arbitrarily long list of ids... You could use a range if the ids indeed follow the pattern in your example:

SELECT * FROM user WHERE id >= minValue AND id <= maxValue;

Another option is to add an inner select:

SELECT * 
FROM user 
WHERE id IN (
    SELECT userId
    FROM ForumThreads ft
    WHERE ft.id = X
);
Urge answered 17/6, 2009 at 21:57 Comment(0)
0

If you have a query like:

SELECT * FROM user WHERE id IN (1, 2, 3, 4 -- and thousands of another keys)

you may increase performance by rewriting the query like this:

SELECT * FROM user WHERE id = ANY(VALUES (1), (2), (3), (4) -- and thousands of another keys)
Shavian answered 27/10, 2014 at 13:29 Comment(2)
PostgreSQL's EXPLAIN says it's internally rewriting my IN (...) as ANY ('{...}'::integer[]).Charlottcharlotta
Anyway, @KiranJonnalagadda, it increases performance (perhaps negligibly), since no internal rewriting is needed.Fridlund
