Slow query optimisation in Postgres
Asked Answered
L

1

6

We have a performance issue with a specific SQL query and we're trying to figure out how could we improve here. It's execution time on is about 20 - 100 seconds!

Here is the query and it's explain:

SELECT  "jobs".* FROM "jobs"
  WHERE "jobs"."status" IN (1, 2, 3, 4)
  ORDER BY "jobs"."due_date" ASC
  LIMIT 5;


Limit  (cost=0.42..1844.98 rows=5 width=2642) (actual time=16927.150..18151.643 rows=1 loops=1)
   ->  Index Scan using index_jobs_on_due_date on jobs  (cost=0.42..1278647.41 rows=3466 width=2642) (actual time=16927.148..18151.641 rows=1 loops=1)
         Filter: (status = ANY ('{1,2,3,4}'::integer[]))
         Rows Removed by Filter: 595627
 Planning time: 0.205 ms
 Execution time: 18151.684 ms

We are using PostgreSQL 9.6.11 on AWS RDS.

In a table we have ~500K rows. Fields realted to the query are:

  • due_date(timestamp without time zone, can be null)
  • status(integer, not null)

We have the following indexes:

CREATE INDEX index_jobs_on_due_date ON public.jobs USING btree (due_date)
CREATE INDEX index_jobs_on_due_date_and_status ON public.jobs USING btree (due_date, status)
CREATE INDEX index_jobs_on_status ON public.jobs USING btree (status)
CREATE UNIQUE INDEX jobs_pkey ON public.jobs USING btree (id)

Thank you in advance, - Jack

Loyal answered 4/6, 2019 at 10:19 Comment(0)
C
1

For this query:

SELECT  j.*
FROM "jobs" j
WHERE j."status" IN (1, 2, 3, 4)
ORDER BY "jobs"."due_date" ASC
LIMIT 5;

The "obvious" index is on (status). But that may not help. The goal is to get rid of the sorting. So, you can rewrite the query and use an index jobs(status, due_date):

select j.*
from ((select j.*
       from jobs j
       where j.status = 1
       order by j.due_date asc
       limit 5
      ) union all
      (select j.*
       from jobs j
       where j.status = 2
       order by j.due_date asc
       limit 5
      ) union all
      (select j.*
       from jobs j
       where j.status = 3
       order by j.due_date asc
       limit 5
      ) union all
      (select j.*
       from jobs j
       where j.status = 4
       order by j.due_date asc
       limit 5
      )
     ) j
order by due_date
limit 5;

The subqueries should each use the composite index. The final sort would then be on (at most) 20 rows, which should be fast).

EDIT:

Here is a related idea, with the same index:

SELECT j.*
FROM (SELECT  j.*,
              ROW_NUMBER() OVER (PARTITION BY j.status ORDER BY j.due_date ASC) as seqnum
      FROM "jobs" j
     ) j
WHERE j.status in (1, 2, 3, 4) AND seqnum <= 5
ORDER BY j.due_date ASC
LIMIT 5;

This can use the index for the ROW_NUMBER() calculation. That might require a full table scan of the table. But, the final sort will be limited to 20 rows, so the final sort is eliminated.

Carin answered 4/6, 2019 at 10:33 Comment(2)
The solution you provide works, but in our case, we have a lot of queries of such kind (which choose from multiple statuses). Also, those queries usually have a lot of additional filters and conditions. So it will be quite tricky to build that kind of unions. Is there a way to force using the statuses index there - looks like it's slow because it is not used. Or other possible solutions? Thanks!Loyal
@JackJuiceson . . . It is not about forcing the use of the status index; it is about the sort afterwards.Carin

© 2022 - 2024 — McMap. All rights reserved.