Postgres SLOWER when a LIMIT is set: how to fix besides adding a dummy `ORDER BY`?

In Postgres, some queries are a whole lot slower when adding a LIMIT:

The queries:

SELECT * FROM review WHERE clicker_id=28 ORDER BY done DESC LIMIT 4; -- 51 sec
SELECT * FROM review WHERE clicker_id=28 ORDER BY id, done DESC LIMIT 4; -- 0.020s
SELECT * FROM review WHERE clicker_id=28 LIMIT 4; -- 0.007s
SELECT * FROM review WHERE clicker_id=28 ORDER BY id; -- 0.007s

As you can see, I need to add a dummy id to the ORDER BY in order for things to go fast. I'm trying to understand why.

Running EXPLAIN on them:

EXPLAIN SELECT * FROM review WHERE clicker_id=28 ORDER BY done DESC LIMIT 4;
EXPLAIN SELECT * FROM review WHERE clicker_id=28 ORDER BY id, done DESC LIMIT 4;
EXPLAIN SELECT * FROM review WHERE clicker_id=28 LIMIT 4;
EXPLAIN SELECT * FROM review WHERE clicker_id=28 ORDER BY id;

gives this:

EXPLAIN SELECT * FROM review WHERE clicker_id=28 ORDER BY done DESC LIMIT 4
Limit  (cost=0.44..249.76 rows=4 width=56)
  ->  Index Scan using review_done on review  (cost=0.44..913081.13 rows=14649 width=56)
        Filter: (clicker_id = 28)


EXPLAIN SELECT * FROM review WHERE clicker_id=28 ORDER BY id, done DESC LIMIT 4
Limit  (cost=11970.75..11970.76 rows=4 width=56)
  ->  Sort  (cost=11970.75..12007.37 rows=14649 width=56)
        Sort Key: id, done DESC
        ->  Index Scan using review_clicker_id on review  (cost=0.44..11751.01 rows=14649 width=56)
              Index Cond: (clicker_id = 28)


EXPLAIN SELECT * FROM review WHERE clicker_id=28 LIMIT 4
Limit  (cost=0.44..3.65 rows=4 width=56)
  ->  Index Scan using review_clicker_id on review  (cost=0.44..11751.01 rows=14649 width=56)
        Index Cond: (clicker_id = 28)


EXPLAIN SELECT * FROM review WHERE clicker_id=28 ORDER BY id
Sort  (cost=12764.61..12801.24 rows=14649 width=56)
  Sort Key: id
  ->  Index Scan using review_clicker_id on review  (cost=0.44..11751.01 rows=14649 width=56)
        Index Cond: (clicker_id = 28)

I'm no SQL expert, but I take it Postgres expected the query to be faster than it actually is, and so used a way to fetch the data that's actually inappropriate, correct?

The database:

The review table:
- Contains 22+ million rows.
  - A given user will get 7 066 rows tops.
  - The one in the test (id 28) has 288 at the time.
- Has this structure:
  - id: bigint Auto Increment [nextval('public.review_id_seq')]
  - type: review_type NULL
  - iteration: smallint NULL
  - repetitions: smallint NULL
  - due: timestamptz NULL
  - done: timestamptz NULL
  - added: timestamptz NULL
  - clicker_id: bigint NULL
  - monologue_id: bigint NULL
- Has these indexes:
  - UNIQUE type, clicker_id, monologue_id, iteration
  - INDEX clicker_id
  - INDEX done, due, monologue_id
  - INDEX id
  - INDEX done DESC
  - INDEX type

Additional details:

Environment:

The queries were ran in development with Postgres 9.6.14.
Running the queries into production (Heroku Postgres, version 9.6.16) the difference is less dramatic, but still not great: the slow queries might take 600 ms.

Variable speed:

Sometimes, the same queries (be it the exact same, or for a different clicker_id) run a lot faster (under 1 sec), but I don't understand why. I need them to be consistently fast.
If I use LIMIT 288 for a user that has 288 rows, then it's so much faster (< 1sec), but if I do the same for a user with say 7066 rows then it's back to super slow.

Before I figured the use of a dummy `ORDER BY`, I tried these:

Re-importing the database.
analyze review;
Setting the index for done to DESC (used to be set to default/ASC.) [The challenge then was that there's no proper way to check if/when the index is done rebuilding.]

None helped.

The question:

My issue in itself is solved, but I'm dissatisfied with it:

Is there a name for this "pattern" that consists of adding a dummy ORDER BY to speed things up?
How can I spot such issues in the future? (This took ages to figure.) Unless I missed something, the EXPLAIN is not that useful:
- For the slow query, the cost is misleadingly slow, while for the fast variant it's misleadingly high.
Alternative: is there another way to handle this? (Because this solution feels like a hack.)

Thanks!

Similar questions:

PostgreSQL query very slow with limit 1 is almost the same question, except his queries were slow with LIMIT 1 and fine with LIMIT 3 or higher. And then of course the data is not the same.

The underlying issue here is what's called an abort-early query plan. Here's a thread from pgsql-hackers describing something similar:

https://www.postgresql.org/message-id/541A2335.3060100%40agliodbs.com

Quoting from there, this is why the planner is using the often-extremely-slow index scan when the ORDER BY done DESC is present:

As usual, PostgreSQL is dramatically undercounting n_distinct: it shows chapters.user_id at 146,000 and the ratio of to_user_id:from_user_id as being 1:105 (as opposed to 1:6, which is about the real ratio). This means that PostgreSQL thinks it can find the 20 rows within the first 2% of the index ... whereas it actually needs to scan 50% of the index to find them.

In your case, the planner thinks that if it just starts going through rows ordered by done desc (IOW, using the review_done index), it will find 4 rows with clicker_id=28 quickly. Since the rows need to be returned in "done" descending order, it thinks this will save a sort step and be faster than retrieving all rows for clicker 28 and then sorting. Given the real-world distribution of rows, this can often turn out not to be the case, requiring it to skip a huge number of rows before finding 4 with clicker=28.

A more-general way of handling it is to use a CTE (which, in 9.6, is still an optimization fence - this changes in PG 12, FYI) to fetch the rows without an order by, then add the ORDER BY in the outer query. Given that fetching all rows for a user, sorting them, and returning however many you need is completely reasonable for your dataset (even the 7k-rows clicker shouldn't be an issue), you can prevent the planner from believing an abort-early plan will be fastest by not having an ORDER BY or a LIMIT in the CTE, giving you a query of something like:

WITH clicker_rows as (SELECT * FROM review WHERE clicker_id=28)
select * From clicker_rows ORDER BY done DESC LIMIT 4;

This should be reliably fast while still respecting the ORDER BY and the LIMIT you want. I'm not sure if there's a name for this pattern, though.

The queries:

The database:

Additional details:

Environment:

Variable speed:

Before I figured the use of a dummy `ORDER BY`, I tried these:

The question:

Recommended topics

Hot tags

The queries:

The database:

Additional details:

Environment:

Variable speed:

Before I figured the use of a dummy ORDER BY, I tried these:

The question:

Recommended topics

Hot tags

Before I figured the use of a dummy `ORDER BY`, I tried these: