In Postgres, you can specify an IN clause, like this:
SELECT * FROM user WHERE id IN (1000, 1001, 1002)
Does anyone know what's the maximum number of parameters you can pass into IN?
In Postgres, you can specify an IN clause, like this:
SELECT * FROM user WHERE id IN (1000, 1001, 1002)
Does anyone know what's the maximum number of parameters you can pass into IN?
This is not really an answer to the present question, however it might help others too.
At least I can tell there is a technical limit of 32767 values (=Short.MAX_VALUE) passable to the PostgreSQL backend, using Posgresql's JDBC driver 9.1.
This is a test of "delete from x where id in (... 100k values...)" with the postgresql jdbc driver:
Caused by: java.io.IOException: Tried to send an out-of-range integer as a 2-byte value: 100000
at org.postgresql.core.PGStream.SendInteger2(PGStream.java:201)
According to the source code located here, starting at line 850, PostgreSQL doesn't explicitly limit the number of arguments.
The following is a code comment from line 870:
/*
* We try to generate a ScalarArrayOpExpr from IN/NOT IN, but this is only
* possible if the inputs are all scalars (no RowExprs) and there is a
* suitable array type available. If not, we fall back to a boolean
* condition tree with multiple copies of the lefthand expression.
* Also, any IN-list items that contain Vars are handled as separate
* boolean conditions, because that gives the planner more scope for
* optimization on such clauses.
*
* First step: transform all the inputs, and detect whether any are
* RowExprs or contain Vars.
*/
65 535
(source) -- isn't it the actual limit? –
Looseleaf explain select * from test where id in (values (1), (2));
Seq Scan on test (cost=0.00..1.38 rows=2 width=208)
Filter: (id = ANY ('{1,2}'::bigint[]))
But if try 2nd query:
explain select * from test where id = any (values (1), (2));
Hash Semi Join (cost=0.05..1.45 rows=2 width=208)
Hash Cond: (test.id = "*VALUES*".column1)
-> Seq Scan on test (cost=0.00..1.30 rows=30 width=208)
-> Hash (cost=0.03..0.03 rows=2 width=4)
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=4)
We can see that postgres build temp table and join with it
As someone more experienced with Oracle DB, I was concerned about this limit too. I carried out a performance test for a query with ~10'000 parameters in an IN
-list, fetching prime numbers up to 100'000 from a table with the first 100'000 integers by actually listing all the prime numbers as query parameters.
My results indicate that you need not worry about overloading the query plan optimizer or getting plans without index usage, since it will transform the query to use = ANY({...}::integer[])
where it can leverage indices as expected:
-- prepare statement, runs instantaneous:
PREPARE hugeplan (integer, integer, integer, ...) AS
SELECT *
FROM primes
WHERE n IN ($1, $2, $3, ..., $9592);
-- fetch the prime numbers:
EXECUTE hugeplan(2, 3, 5, ..., 99991);
-- EXPLAIN ANALYZE output for the EXECUTE:
"Index Scan using n_idx on primes (cost=0.42..9750.77 rows=9592 width=5) (actual time=0.024..15.268 rows=9592 loops=1)"
" Index Cond: (n = ANY ('{2,3,5,7, (...)"
"Execution time: 16.063 ms"
-- setup, should you care:
CREATE TABLE public.primes
(
n integer NOT NULL,
prime boolean,
CONSTRAINT n_idx PRIMARY KEY (n)
)
WITH (
OIDS=FALSE
);
ALTER TABLE public.primes
OWNER TO postgres;
INSERT INTO public.primes
SELECT generate_series(1,100000);
However, this (rather old) thread on the pgsql-hackers mailing list indicates that there is still a non-negligible cost in planning such queries, so take my word with a grain of salt.
There is no limit to the number of elements that you are passing to IN clause. If there are more elements it will consider it as array and then for each scan in the database it will check if it is contained in the array or not. This approach is not so scalable. Instead of using IN clause try using INNER JOIN with temp table. Refer http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/ for more info. Using INNER JOIN scales well as query optimizer can make use of hash join and other optimization. Whereas with IN clause there is no way for the optimizer to optimize the query. I have noticed speedup of at least 2x with this change.
OR
and IN
clauses due to the large overhead in parsing and planning such queries, I could not confirm the problem with Postgres 9.5, see this answer. –
Diffract Just tried it. the answer is -> out-of-range integer as a 2-byte value: 32768
You might want to consider refactoring that query instead of adding an arbitrarily long list of ids... You could use a range if the ids indeed follow the pattern in your example:
SELECT * FROM user WHERE id >= minValue AND id <= maxValue;
Another option is to add an inner select:
SELECT *
FROM user
WHERE id IN (
SELECT userId
FROM ForumThreads ft
WHERE ft.id = X
);
If you have query like:
SELECT * FROM user WHERE id IN (1, 2, 3, 4 -- and thousands of another keys)
you may increase performace if rewrite your query like:
SELECT * FROM user WHERE id = ANY(VALUES (1), (2), (3), (4) -- and thousands of another keys)
EXPLAIN
says it's internally rewriting my IN (...)
as ANY ('{...}'::integer[])
. –
Charlottcharlotta © 2022 - 2024 — McMap. All rights reserved.