I'm working with a relatively large Postgres database containing several terabytes of data, going back about 5 years. As a result, all the indexes keep growing, which reduces query performance and consumes a lot of storage. The current index definition (original_index) is as follows:
CREATE INDEX original_index ON table1
USING btree (key10 DESC)
I'm trying to restrict the index to cover only the last year. The definition of that index (constrained_index) is as follows (I removed original_index before creating constrained_index):
CREATE INDEX constrained_index ON table1
USING btree (key10 DESC)
WHERE (key10 >= '2023-06-01 00:00:00+00'::timestamp with time zone)
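For reference, a quick way to see how much smaller the partial index is (a minimal sketch using the standard pg_relation_size/pg_size_pretty functions; the object names are the ones defined above):
-- Compare on-disk sizes of the table and the partial index.
SELECT pg_size_pretty(pg_relation_size('table1'))            AS table_size,
       pg_size_pretty(pg_relation_size('constrained_index')) AS partial_index_size;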
I ran the same query with only original_index in place and then with only constrained_index in place, to compare the execution times before and after. When only original_index exists, the query planner uses it. But when only constrained_index exists, the planner does not use it; instead it does a sequential scan on the table.
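One diagnostic sketch (assuming a throwaway session, and using only the key10 range filter from the full query below in place of the whole statement): temporarily discourage sequential scans and re-run EXPLAIN. If constrained_index then appears in the plan, the planner considers the index usable and is merely estimating the sequential scan as cheaper.
BEGIN;
SET LOCAL enable_seqscan = off;  -- transaction-local only, no permanent change
EXPLAIN
SELECT count(*)
FROM table1
WHERE key10 BETWEEN '2023-10-31 23:00:00+00' AND '2023-11-02 23:00:00+00';
ROLLBACK;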
After creating constrained_index I ran vacuum verbose analyze table1; to update the database statistics.
It would be highly appreciated if someone could explain why the planner is not selecting constrained_index.
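To verify that the statistics really were refreshed, the standard pg_stat_user_tables view can be checked (a minimal sketch; only table1 is assumed here):
-- Last manual and automatic vacuum/analyze timestamps for table1.
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'table1';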
Postgres Version: 12.5
Table definitions:
CREATE TABLE table1 (
key1 integer,
key2 character varying(128),
key3 character varying(64),
key4 character varying(128),
key5 character varying(10),
key6 timestamp with time zone,
key7 timestamp with time zone,
key8 integer NOT NULL GENERATED BY DEFAULT AS IDENTITY,
key9 boolean NOT NULL,
key10 timestamp with time zone
);
CREATE TABLE table2 (
key2 integer NOT NULL GENERATED BY DEFAULT AS IDENTITY,
key3 timestamp with time zone NOT NULL,
key1 integer
);
CREATE TABLE table3 (
key1 character varying(128) NOT NULL,
key2 text
);
CREATE TABLE table4 (
key1 character varying(64) NOT NULL,
key2 text
);
Query:
SELECT table1.key1 AS a,
table1.key2 AS b,
table1.key3 AS c,
table1.key4 AS d,
table2.key1 AS e,
COUNT(DISTINCT table1.key8) FILTER (WHERE table1.key5 = 'A') AS f,
COUNT(DISTINCT table1.key8) FILTER (WHERE table1.key5 = 'B') AS g,
COUNT(DISTINCT table1.key8) FILTER (WHERE table1.key5 = 'C') AS h,
COUNT(DISTINCT table1.key8) FILTER (WHERE table1.key5 = 'D') AS i,
COUNT(DISTINCT table1.key8) FILTER (WHERE table1.key5 IS NULL) AS j,
SUM((table1.key6 - table1.key7)) AS k,
COUNT(DISTINCT table1.key8) FILTER (WHERE table1.key9) AS l,
COUNT(DISTINCT table1.key8) AS m
FROM table1
LEFT OUTER JOIN table3 ON (table1.key4 = table3.key1)
LEFT OUTER JOIN table2 ON (table1.key1 = table2.key2)
LEFT OUTER JOIN table4 ON (table1.key3 = table4.key1)
WHERE table1.key10 BETWEEN '2023-10-31 23:00:00+00:00' AND '2023-11-02 23:00:00+00:00'
GROUP BY 1,
2,
3,
4,
5,
table2.key3,
table4.key2,
table3.key2
ORDER BY table2.key3 DESC,
table1.key2 ASC,
table4.key2 DESC,
table3.key2 DESC,
table2.key1 ASC
LIMIT 20
Output of EXPLAIN ANALYZE:
Limit (cost=18341692.59..18341693.89 rows=20 width=159) (actual time=32149.681..32192.245 rows=20 loops=1)
-> GroupAggregate (cost=18341692.59..18353658.57 rows=184092 width=159) (actual time=31749.816..31792.379 rows=20 loops=1)
Group Key: table2.key3, table1.key2, table4.key2, table3.key2, table2.key1, table1.key1, table1.key3, table1.key4
-> Sort (cost=18341692.59..18342152.82 rows=184092 width=115) (actual time=31749.763..31792.253 rows=80 loops=1)
Sort Key: table2.key3 DESC, table1.key2, table4.key2 DESC, table3.key2 DESC, table2.key1, table1.key1, table1.key3, table1.key4
Sort Method: external merge Disk: 23424kB
-> Gather (cost=83595.20..18314267.69 rows=184092 width=115) (actual time=31417.800..31681.660 rows=186540 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Hash Left Join (cost=82595.20..18294858.49 rows=76705 width=115) (actual time=31392.569..31545.627 rows=62180 loops=3)
Hash Cond: ((table1.key4)::text = (table3.key1)::text)
-> Hash Left Join (cost=81113.79..18293175.70 rows=76705 width=90) (actual time=31379.926..31524.528 rows=62180 loops=3)
Hash Cond: ((table1.key3)::text = (table4.key1)::text)
-> Parallel Hash Left Join (cost=78481.53..18288053.08 rows=76705 width=72) (actual time=31361.889..31481.346 rows=62180 loops=3)
Hash Cond: (table1.key1 = table2.key2)
-> Parallel Seq Scan on table1 (cost=0.00..18200395.20 rows=76705 width=60) (actual time=2.888..30824.027 rows=62180 loops=3)
Filter: ((key10 >= '2023-10-31 23:00:00+00'::timestamp with time zone) AND (key10 <= '2023-11-02 23:00:00+00'::timestamp with time zone))
Rows Removed by Filter: 43882417
-> Parallel Hash (cost=52397.57..52397.57 rows=1500557 width=16) (actual time=497.607..497.608 rows=1200445 loops=3)
Buckets: 131072 Batches: 64 Memory Usage: 3712kB
-> Parallel Seq Scan on table2 (cost=0.00..52397.57 rows=1500557 width=16) (actual time=225.666..349.851 rows=1200445 loops=3)
-> Hash (cost=1244.45..1244.45 rows=71745 width=25) (actual time=17.720..17.721 rows=72031 loops=3)
Buckets: 65536 Batches: 2 Memory Usage: 2568kB
-> Seq Scan on table4 (cost=0.00..1244.45 rows=71745 width=25) (actual time=0.019..6.313 rows=72031 loops=3)
-> Hash (cost=893.96..893.96 rows=46996 width=40) (actual time=12.367..12.368 rows=47030 loops=3)
Buckets: 65536 Batches: 1 Memory Usage: 3865kB
-> Seq Scan on table3 (cost=0.00..893.96 rows=46996 width=40) (actual time=0.017..4.731 rows=47030 loops=3)
Planning Time: 2.249 ms
JIT:
Functions: 90
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 8.216 ms, Inlining 84.266 ms, Optimization 596.840 ms, Emission 394.586 ms, Total 1083.909 ms
Execution Time: 32199.853 ms
Comments:
We'd need to see EXPLAIN (ANALYZE, BUFFERS, SETTINGS) output for the queries. Also, you shouldn't make a secret of your PostgreSQL version. – Triumph
<key1>::text = (<key2>)::text is also interesting; it looks like you do some casting to text. That could destroy the usage of indexes, since it looks like at least <key1> is a timestamp, not text. Next time share some information, like the DDL for the tables and indexes, the query itself and a complete query plan that matches the DDL. We can now only guess what is going on. – Cartelize
[…] creator_id and document_type_id anywhere, which would explain why PostgreSQL doesn't choose either of your indexes. – Triumph
[…] explain(analyze, verbose, buffers, settings) to get some more details about the query plan. But that only makes sense if you don't confuse everybody by changing these details... – Cartelize
[…] CREATE INDEX ON <table1> USING btree (<key1>,<key2>,<key3>,<key4>) WHERE (<key10> >= '2023-06-01'::timestamptz AND <key10> < '2023-07-01'::timestamptz); and that for each month, to partition the index that covers the GROUP BY for this table. Maybe you have to add a NULL condition, since this column could be NULL. – Cartelize
[…] original_index and shrink it based on the timestamp to create constrained_index. I was wondering why the same query selects original_index but not constrained_index... – Olio
[…] WHERE (<key11> >= ...), which obviously does not match the condition on key10. Typo or not? And is the main table named "document" or "table1"? Please get your question straight. And cut the noise. "<key1>" has no benefit over just key1, etc. – Intercourse
[…] EXPLAIN (ANALYZE, BUFFERS, SETTINGS) as instructed here: stackoverflow.com/tags/postgresql-performance/info That would tell us (among other useful things) about unusual performance settings right away. Smells like bad server config. Not enough work_mem. Maybe discouraging settings for index usage, like random_page_cost = 4, which is typically too high, ... – Intercourse
"[…] vacuum verbose analyze table1;" Did you run it successfully? (VERBOSE is irrelevant.) – Intercourse
[…] EXPLAIN (ANALYZE, BUFFERS, SETTINGS), but BUFFERS and SETTINGS seem not to be available in this particular version. I'm sure the vacuum was executed successfully. There was nothing unusual that I noticed. – Olio
buffers should be there since 9-something and settings should be there since 12.0 – Alphonse
[…] pgadmin against the same db, where it was erroring for some reason for those buffers and settings. Next time I will try to get all that info from the CLI itself. Thanks for correcting me. – Olio
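For completeness, a sketch of what the comments ask for, runnable from psql against the same database on 12.5 (BUFFERS has been available since 9.0 and SETTINGS since 12.0; the key10 range filter stands in here for the full query above):
EXPLAIN (ANALYZE, BUFFERS, SETTINGS)
SELECT count(*)
FROM table1
WHERE key10 BETWEEN '2023-10-31 23:00:00+00' AND '2023-11-02 23:00:00+00';

-- The planner settings hinted at in the comments:
SHOW work_mem;
SHOW random_page_cost;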