Index skip scan emulation to retrieve distinct product IDs and min/max for additional columns

CREATE TABLE tickers (
   product_id TEXT NOT NULL,
   trade_id   INT NOT NULL,
   sequence   BIGINT NOT NULL,
   time       TIMESTAMPTZ NOT NULL,
   price      NUMERIC NOT NULL,
   side       TEXT NOT NULL,
   last_size  NUMERIC NOT NULL,
   best_bid   NUMERIC NOT NULL,
   best_ask   NUMERIC NOT NULL,
   PRIMARY KEY (product_id, trade_id)
);

CREATE INDEX idx_tickers_product_id_time ON tickers (product_id, time);

WITH product_ids AS (
   WITH RECURSIVE cte AS (
      (   -- parentheses required
      SELECT product_id
      FROM   tickers
      ORDER  BY 1
      LIMIT  1
      )
      UNION ALL
      SELECT l.*
      FROM   cte c
      CROSS JOIN LATERAL (
         SELECT product_id
         FROM   tickers t
         WHERE  t.product_id > c.product_id  -- lateral reference
         ORDER  BY 1
         LIMIT  1
         ) l
      )
   TABLE cte
   )
SELECT product_id,
       (SELECT (MAX(trade_id) - MIN(trade_id) + 1) FROM tickers WHERE product_id = product_ids.product_id) AS ticker_count,
       (SELECT MIN(time) FROM tickers WHERE product_id = product_ids.product_id) AS min_time,
       (SELECT MAX(time) FROM tickers WHERE product_id = product_ids.product_id) AS max_time
FROM   product_ids
ORDER  BY ticker_count DESC

Query

Using the existing index on (product_id, time) we can get two for the price of one, i.e. fetch product_id and minimum time in one index scan:

WITH RECURSIVE product_ids AS (
   (   -- parentheses required
   SELECT product_id, time AS min_time
   FROM   tickers
   ORDER  BY 1, 2
   LIMIT  1
   )
   UNION ALL
   SELECT l.*
   FROM   product_ids p
   CROSS JOIN LATERAL (
      SELECT t.product_id, t.time
      FROM   tickers t
      WHERE  t.product_id > p.product_id
      ORDER  BY 1, 2
      LIMIT  1
      ) l
   )
SELECT product_id, min_time
    , (SELECT MAX(time) FROM tickers WHERE product_id = p.product_id) AS max_time
    , (SELECT MAX(trade_id) - MIN(trade_id) + 1 FROM tickers WHERE product_id = p.product_id) AS ticker_count
FROM   product_ids p
ORDER  BY ticker_count DESC;

Also, no need for a second CTE wrapper.
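
To check that each step really is served by the index on (product_id, time), one option is to run the building blocks on their own under EXPLAIN. This is only a sketch: the literal 'BTC-USD' is a hypothetical product_id, and the plan you actually get depends on table statistics and on how current the visibility map is.

```sql
-- One step of the recursion: the next product_id after a given one,
-- together with its earliest time. 'BTC-USD' is a made-up sample value.
EXPLAIN (ANALYZE, BUFFERS)
SELECT t.product_id, t.time
FROM   tickers t
WHERE  t.product_id > 'BTC-USD'
ORDER  BY 1, 2
LIMIT  1;

-- The max_time lookup for a single product_id.
EXPLAIN (ANALYZE, BUFFERS)
SELECT MAX(time)
FROM   tickers
WHERE  product_id = 'BTC-USD';
```

Ideally both plans show an index scan (or index-only scan) on idx_tickers_product_id_time that touches only a handful of buffers. Note that EXPLAIN with ANALYZE actually executes the statement, which is harmless for these SELECTs.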

Indexes

Currently you have two indexes: the PK index on (product_id, trade_id), and another one on (product_id, time). You might optimize this by reversing the column order in one of the two. Like:

PRIMARY KEY (trade_id, product_id)

The constraint is logically equivalent, but the pair of indexes then typically covers a wider range of possible queries, because each index leads with a different column. See (again):

Is a composite index also good for queries on the first field?

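The recursive part of this query only needs the existing index on (product_id, time), so reordering the PK has no direct effect on it. The ticker_count subquery, however, filters on product_id and aggregates trade_id, so check its plan before and after changing the PK.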
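
If you want to try the reversed PK, a minimal sketch could look like the following. It assumes the constraint still has PostgreSQL's default name tickers_pkey and that no foreign keys reference it; adjust the name if yours differs.

```sql
BEGIN;

-- Assumption: default constraint name for the primary key of "tickers".
ALTER TABLE tickers
   DROP CONSTRAINT tickers_pkey
 , ADD  PRIMARY KEY (trade_id, product_id);

COMMIT;
```

This rebuilds the underlying unique index and holds an exclusive lock on the table while it runs, so on a big table it is something to schedule rather than run ad hoc. Afterwards, re-check with EXPLAIN any query that filters on product_id and reads trade_id, since it can no longer use an index leading with product_id for that.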
