In Postgres, some queries are a whole lot slower when adding a `LIMIT`:
The queries:

```sql
SELECT * FROM review WHERE clicker_id=28 ORDER BY done DESC LIMIT 4;     -- 51 s
SELECT * FROM review WHERE clicker_id=28 ORDER BY id, done DESC LIMIT 4; -- 0.020 s
SELECT * FROM review WHERE clicker_id=28 LIMIT 4;                        -- 0.007 s
SELECT * FROM review WHERE clicker_id=28 ORDER BY id;                    -- 0.007 s
```
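(For anyone wanting to reproduce timings like these, psql's client-side timing is the simplest tool; a minimal sketch:)

```sql
-- psql: toggle client-side timing, then run any of the queries above
\timing on
SELECT * FROM review WHERE clicker_id=28 ORDER BY done DESC LIMIT 4;
```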
As you can see, I need to add a dummy `id` to the `ORDER BY` in order for things to go fast. I'm trying to understand why.
Running `EXPLAIN` on them:

```sql
EXPLAIN SELECT * FROM review WHERE clicker_id=28 ORDER BY done DESC LIMIT 4;
EXPLAIN SELECT * FROM review WHERE clicker_id=28 ORDER BY id, done DESC LIMIT 4;
EXPLAIN SELECT * FROM review WHERE clicker_id=28 LIMIT 4;
EXPLAIN SELECT * FROM review WHERE clicker_id=28 ORDER BY id;
```
gives this:

```
EXPLAIN SELECT * FROM review WHERE clicker_id=28 ORDER BY done DESC LIMIT 4

Limit  (cost=0.44..249.76 rows=4 width=56)
  ->  Index Scan using review_done on review  (cost=0.44..913081.13 rows=14649 width=56)
        Filter: (clicker_id = 28)

EXPLAIN SELECT * FROM review WHERE clicker_id=28 ORDER BY id, done DESC LIMIT 4

Limit  (cost=11970.75..11970.76 rows=4 width=56)
  ->  Sort  (cost=11970.75..12007.37 rows=14649 width=56)
        Sort Key: id, done DESC
        ->  Index Scan using review_clicker_id on review  (cost=0.44..11751.01 rows=14649 width=56)
              Index Cond: (clicker_id = 28)

EXPLAIN SELECT * FROM review WHERE clicker_id=28 LIMIT 4

Limit  (cost=0.44..3.65 rows=4 width=56)
  ->  Index Scan using review_clicker_id on review  (cost=0.44..11751.01 rows=14649 width=56)
        Index Cond: (clicker_id = 28)

EXPLAIN SELECT * FROM review WHERE clicker_id=28 ORDER BY id

Sort  (cost=12764.61..12801.24 rows=14649 width=56)
  Sort Key: id
  ->  Index Scan using review_clicker_id on review  (cost=0.44..11751.01 rows=14649 width=56)
        Index Cond: (clicker_id = 28)
```
I'm no SQL expert, but I take it Postgres expected the query to be faster than it actually is, and so picked a way to fetch the data that turned out to be inappropriate, correct? Presumably it walks the `review_done` index from the top and filters on `clicker_id` as it goes, expecting to hit 4 matching rows quickly, when in reality it has to scan a huge part of the index before finding them.
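(To compare the planner's expectations against reality, the slow variant can be run with `EXPLAIN (ANALYZE, BUFFERS)`, which prints actual row counts and timings next to the estimates. Note that `ANALYZE` executes the query for real, so expect the full 51 s:)

```sql
-- ANALYZE runs the query and reports actual rows/time per plan node;
-- BUFFERS adds how many pages were read, which exposes the long index walk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM review WHERE clicker_id=28 ORDER BY done DESC LIMIT 4;
```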
The database:
- The `review` table:
  - Contains 22+ million rows.
  - A given user will have 7,066 rows at most.
  - The one in the test (id 28) has 288 rows at the time of writing.
  - Has this structure:
    - `id`: bigint, auto-increment [nextval('public.review_id_seq')]
    - `type`: review_type NULL
    - `iteration`: smallint NULL
    - `repetitions`: smallint NULL
    - `due`: timestamptz NULL
    - `done`: timestamptz NULL
    - `added`: timestamptz NULL
    - `clicker_id`: bigint NULL
    - `monologue_id`: bigint NULL
  - Has these indexes:
    - UNIQUE on `type, clicker_id, monologue_id, iteration`
    - INDEX on `clicker_id`
    - INDEX on `done, due, monologue_id`
    - INDEX on `id`
    - INDEX on `done DESC`
    - INDEX on `type`
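(Writing this list out, I notice that none of the indexes covers the filter and the sort together, so the planner has to choose between the `clicker_id` index followed by a sort, and the `done` index followed by filtering. A composite index might be the proper fix; a hedged sketch, with a made-up index name:)

```sql
-- Sketch: one index matching both WHERE clicker_id = ? and ORDER BY done DESC,
-- so the first 4 rows can be read straight off the index, already in order.
-- CONCURRENTLY avoids blocking writes on a 22M-row table while it builds.
CREATE INDEX CONCURRENTLY review_clicker_id_done_idx
    ON review (clicker_id, done DESC);
```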
Additional details:
Environment:
- The queries were run in development with Postgres 9.6.14.
- Running the queries in production (Heroku Postgres, version 9.6.16), the difference is less dramatic but still not great: the slow queries might take 600 ms.
Variable speed:
- Sometimes the same queries (be it the exact same query, or the same query for a different clicker_id) run a lot faster (under 1 s), but I don't understand why. I need them to be consistently fast.
- If I use `LIMIT 288` for a user that has 288 rows, it's much faster (< 1 s), but if I do the same for a user with, say, 7,066 rows, it's back to super slow. (Per-user row counts checked with the query below.)
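(Worth noting: every plan above estimates rows=14649 for clicker_id = 28, while the actual count for that user is 288, a roughly 50x over-estimate. The count itself is trivial to check:)

```sql
-- Actual rows for this user (288 at the time of writing), versus the
-- planner's estimate of 14649 shown in the plans above.
SELECT count(*) FROM review WHERE clicker_id = 28;
```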
Before I figured out the dummy `ORDER BY` trick, I tried these:
- Re-importing the database.
- Running `ANALYZE review;`.
- Setting the index on `done` to DESC (it used to be the default, ASC). [The challenge then was that there's no proper way to check if/when the index is done rebuilding.]
None helped.
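(One more thing I have read about but not tried: giving the planner finer statistics on `clicker_id`. A hedged sketch; I don't know whether it would help here, given the ~50x row misestimate:)

```sql
-- Sketch: raise the per-column statistics target (default is 100) so
-- ANALYZE samples clicker_id more finely, then refresh the statistics.
ALTER TABLE review ALTER COLUMN clicker_id SET STATISTICS 1000;
ANALYZE review;
```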
The question:
My issue in itself is solved, but I'm dissatisfied with the fix:
- Is there a name for this "pattern" that consists of adding a dummy `ORDER BY` to speed things up?
- How can I spot such issues in the future? (This took ages to figure out.) Unless I missed something, `EXPLAIN` is not that useful: for the slow query, the cost is misleadingly low, while for the fast variant it's misleadingly high.
- Alternative: is there another way to handle this? (Because this solution feels like a hack; one candidate I've seen mentioned is sketched below.)
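(The candidate in question, besides a composite index: fencing the filter into a subquery so the planner can't pull the `ORDER BY`/`LIMIT` inside it. A hedged sketch, relying on `OFFSET 0` acting as an optimization fence, which is implementation-dependent behavior:)

```sql
-- Sketch: OFFSET 0 stops the planner from flattening the subquery, so it
-- filters on clicker_id first (288 rows here), then sorts that small set.
SELECT *
FROM (
    SELECT * FROM review WHERE clicker_id = 28 OFFSET 0
) AS filtered
ORDER BY done DESC
LIMIT 4;
```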
Thanks!
Similar questions:
- PostgreSQL query very slow with limit 1 is almost the same question, except those queries were slow with `LIMIT 1` and fine with `LIMIT 3` or higher. And then, of course, the data is not the same.