Is there any better option to apply pagination without applying OFFSET in SQL Server?

4

26

I want to apply pagination on a table with huge data. All I want to know is whether there is a better option than using OFFSET in SQL Server.

Here is my simple query:

SELECT *
FROM TableName
ORDER BY Id DESC 
OFFSET 30000000 ROWS
FETCH NEXT 20 ROWS ONLY
Knisley answered 29/12, 2021 at 12:37 Comment(7)
Yes. use-the-index-luke.com/sql/partial-results/fetch-next-pageSuprarenal
Why do you need a better option than OFFSET / FETCH ? What's wrong with OFFSET / FETCH ??Latria
Do you really need to jump to page 1,500,000? Perhaps you need a better criteria?Ejectment
@Latria Recently I have learnt that the way OFFSET works is by counting how many rows it should skip; only after that does it return your result. In other words, to get the results for rows 30000000 to 30000020 it needs to scan through the first 30000000 rows and then throw them away. That seems too wasteful, so I want to know whether there is a better way to apply server-side pagination.Knisley
See also use-the-index-luke.com/no-offset, it's called Keyset Pagination.Alumna
Does this answer your question? What is the best way to paginate results in SQL ServerKnackwurst
@Md.TarikulIslamSoikot, can you share your table schema and existing indexes, along with the data types?Hushaby
48

You can use Keyset Pagination for this. It's far more efficient than using Rowset Pagination (paging by row number).

In Rowset Pagination, all previous rows must be read before the next page can be read. In Keyset Pagination, by contrast, the server can jump immediately to the correct place in the index, so no rows are read that do not need to be.

For this to perform well, you need to have a unique index on that key, which includes any other columns you need to query.

In this type of pagination, you cannot jump to a specific page number. You jump to a specific key and read from there. So you need to save the key of the last row on the page you are on and skip to it on the next request. Alternatively, you could calculate or estimate a starting point for each page up-front.

One big benefit, apart from the obvious efficiency gain, is avoiding the "missing row" problem when paginating, caused by rows being removed from previously read pages. This does not happen when paginating by key, because the key does not change.
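The "missing row" problem is easy to demonstrate. Below is a small sketch using SQLite via Python's `sqlite3` (the table, ids, and page size are made up for illustration; SQLite's `LIMIT`/`OFFSET` behaves like SQL Server's `OFFSET`/`FETCH` here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t (id) VALUES (?)", [(i,) for i in range(1, 11)])

PAGE = 3
# Page 1 is the same for both strategies: ids 10, 9, 8.
page1 = [r[0] for r in conn.execute(
    "SELECT id FROM t ORDER BY id DESC LIMIT ?", (PAGE,))]
last_id = page1[-1]  # the keyset client remembers the last key seen (8)

# A row from the already-read page is deleted before page 2 is requested.
conn.execute("DELETE FROM t WHERE id = 9")

# Rowset pagination: skip 3 rows of the *current* data.
offset_page2 = [r[0] for r in conn.execute(
    "SELECT id FROM t ORDER BY id DESC LIMIT ? OFFSET ?", (PAGE, PAGE))]
# Keyset pagination: continue from the remembered key.
keyset_page2 = [r[0] for r in conn.execute(
    "SELECT id FROM t WHERE id < ? ORDER BY id DESC LIMIT ?", (last_id, PAGE))]

print(offset_page2)  # [6, 5, 4] -- id 7 was silently skipped
print(keyset_page2)  # [7, 6, 5] -- no row missed
```

Deleting id 9 shifts every later row up by one position, so the OFFSET-based page 2 jumps over id 7; the keyset version is unaffected because the key of row 7 did not change.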


Here is an example:

Let us assume you have a table called TableName with an index on Id, and you want to start at the latest Id value and work backwards.

You begin with:

SELECT TOP (@numRows)
  *
FROM TableName
ORDER BY Id DESC;

Note the use of ORDER BY to ensure the order is correct.

In some RDBMSs you need LIMIT instead of TOP.

The client will hold the last received Id value (the lowest in this case). On the next request, you jump to that key and carry on:

SELECT TOP (@numRows)
  *
FROM TableName
WHERE Id < @lastId
ORDER BY Id DESC;

Note the use of < not <=

In case you were wondering: in a typical B+tree index, the row with the indicated ID is not read; it's the row after it that's read.


The key chosen must be unique, so if you are paging by a non-unique column then you must add a second column to both ORDER BY and WHERE. You would need an index on OtherColumn, Id for example, to support this type of query. Don't forget INCLUDE columns on the index.

SQL Server does not support row/tuple comparators, so you cannot do (OtherColumn, Id) < (@lastOther, @lastId) (this is however supported in PostgreSQL, MySQL, MariaDB and SQLite).

Instead you need the following:

SELECT TOP (@numRows)
  *
FROM TableName
WHERE (
    (OtherColumn = @lastOther AND Id < @lastId)
    OR OtherColumn < @lastOther
)
ORDER BY
  OtherColumn DESC,
  Id DESC;

This is more efficient than it looks, as SQL Server can convert this into a proper < over both values.
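You can check that the expanded OR form returns exactly the same rows as a row-value comparison. A small sketch using SQLite (which, unlike SQL Server, supports row values since 3.15), with a hypothetical grp column standing in for OtherColumn:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, grp INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, i % 3) for i in range(1, 10)])

last_grp, last_id = 2, 8  # last row of the previous page (hypothetical)

# The expanded OR form, as required on SQL Server.
expanded = conn.execute(
    "SELECT grp, id FROM t "
    "WHERE (grp = ? AND id < ?) OR grp < ? "
    "ORDER BY grp DESC, id DESC LIMIT 3",
    (last_grp, last_id, last_grp)).fetchall()

# The row-value form available in PostgreSQL, MySQL, MariaDB, SQLite.
row_value = conn.execute(
    "SELECT grp, id FROM t "
    "WHERE (grp, id) < (?, ?) "
    "ORDER BY grp DESC, id DESC LIMIT 3",
    (last_grp, last_id)).fetchall()

print(expanded == row_value)  # True -- both forms select the same page
```

Both queries return the three rows that sort immediately after (grp 2, id 8) in descending (grp, id) order.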

The presence of NULLs complicates things further. You may want to query those rows separately.

Alumna answered 29/12, 2021 at 14:0 Comment(9)
So you cannot jump directly to 30,000,000th row unless you have read all previous rows to calculate the @lastId. How is this useful?Ethicize
That's true, you cannot, and I did mention this. But it's pretty rare that a user actually wants to do so from the start. As you said yourself in a comment "Does someone really need to look at 30 millionth row?" Normally they might say "I'm up to this point in the list, I want the next few rows" and paginating by key works better for this, because the missing rows problem doesn't affect it (if you remove rows from earlier pages and you page by row-number then you will miss rows). It works very well for infinite scrolling, and for batch operationsAlumna
I've been trying to implement keyset pagination, but all the examples assume a type that supports the 'less than' or 'greater than' operators. If you are sorting by a string (and then a unique ID, which in my case is a GUID), what are we supposed to do? We can't do string > string, and string.Compare() cannot be translated by EF.Gumboil
Are you sure it can't be translated? It works fine for me. Either string.Compare(a, b) > 0 or a.CompareTo(b) > 0. Do not use a StringComparison value, instead define your column with the right collationAlumna
@AndyFurniss GUIDs as PKs are non-performant, not a solution but try to avoid those in the first place unless you have a good reason to choose them.Viperous
Besides requiring reading rows sequentially, as @SalmanA mentioned earlier, there is another problem. Say I wanted to sort the table by an updated_at column. The updated_at field gets updated sporadically, so sorting by this column does not ensure that the primary/unique key will be continuous, even after we include the primary key in the sort order.Christiansen
Like I said, you need an index on whatever column you are sorting (plus possibly a unique column) so in that case you would have an index on (updated_at, PK) and the WHERE and ORDER BY would also use both of those columns. The results are then always continuous in the specified order, and if you go ascending you won't miss or duplicate any rows while paging.Alumna
What is the advantage of having updated_at as the first key? Why can't I do (id, updated_at desc)? It will sort by id and updated_at and store it by default in the order I want, and then I use a WHERE clause to keyset-paginate?Gustavogustavus
No, if id is the first column then you would need to paginate by id first. And given that id is unique, there is no need for any other columns. The point here is: you want to paginate by updated-at, but you need to uniquely identify a row so you add another column id which is unique. The pagination always goes in the same order as the key columns of the index.Alumna
8

On a very big merchant website we use a technique based on ids stored in a pseudo-temporary table, which we join back to the rows of the product table.

Let me explain with a clear example.

We have a table designed this way:

CREATE TABLE S_TEMP.T_PAGINATION_PGN
(PGN_ID              BIGINT IDENTITY(-9223372036854775808, 1) PRIMARY KEY,
 PGN_SESSION_GUID    UNIQUEIDENTIFIER NOT NULL,
 PGN_SESSION_DATE    DATETIME2(0) NOT NULL,
 PGN_PRODUCT_ID      INT NOT NULL,
 PGN_SESSION_ORDER   INT NOT NULL);
CREATE INDEX X_PGN_SESSION_GUID_ORDER 
   ON S_TEMP.T_PAGINATION_PGN (PGN_SESSION_GUID, PGN_SESSION_ORDER)
   INCLUDE (PGN_PRODUCT_ID);
CREATE INDEX X_PGN_SESSION_DATE 
   ON S_TEMP.T_PAGINATION_PGN (PGN_SESSION_DATE);

We have a very big product table called T_PRODUIT_PRD, and a customer has filtered it with many predicates. We INSERT rows from the filtered SELECT into this table like this:

DECLARE @SESSION_ID UNIQUEIDENTIFIER = NEWID();
INSERT INTO S_TEMP.T_PAGINATION_PGN
SELECT @SESSION_ID, SYSUTCDATETIME(), PRD_ID,
       ROW_NUMBER() OVER (ORDER BY ...) -- custom ORDER BY
FROM   dbo.T_PRODUIT_PRD
WHERE  ...                              -- custom filter

Then, every time we need a page of @N products, we add a join to this table:

...
JOIN S_TEMP.T_PAGINATION_PGN
   ON PGN_SESSION_GUID = @SESSION_ID
      AND 1 + ((PGN_SESSION_ORDER - 1) / @N) = @DESIRED_PAGE_NUMBER
      AND PGN_PRODUCT_ID = dbo.T_PRODUIT_PRD.PRD_ID

All the indexes will do the job!
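Because ROW_NUMBER() is 1-based, the page arithmetic needs an off-by-one adjustment: integer-dividing (PGN_SESSION_ORDER - 1) by @N maps orders 1..@N to page 1, @N+1..2*@N to page 2, and so on. A quick sanity check of that mapping (with a hypothetical @N = 20):

```python
N = 20  # stands in for @N, the page size

def page_of(order):
    # order stands in for PGN_SESSION_ORDER, a 1-based ROW_NUMBER() value
    return 1 + (order - 1) // N

# Boundaries of the first two pages land where expected.
print(page_of(1), page_of(20), page_of(21), page_of(40))  # 1 1 2 2
```

Dividing the raw 1-based order (without the -1) would put each page's last row onto the next page.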

Of course, we have to purge this table regularly, which is why we have a scheduled job that deletes the rows whose sessions were generated more than 4 hours ago:

DELETE FROM S_TEMP.T_PAGINATION_PGN
WHERE  PGN_SESSION_DATE < DATEADD(hour, -4, SYSUTCDATETIME());
Unlimited answered 5/1, 2022 at 13:31 Comment(2)
How does your scheduled housekeeping job perform when users are currently searching and querying the table? Do you get locks? Or does your job run at night?Zelda
The job actually runs every hour and deletes only data older than 8 hours.Unlimited
0

In the same spirit as SQLPro's solution, I propose:

WITH CTE AS
(
    SELECT 30000000 AS N
    UNION ALL
    SELECT N - 1 FROM CTE
    WHERE N > 30000000 + 1 - 20
)
SELECT T.*
FROM CTE
JOIN TableName T ON CTE.N = T.Id
ORDER BY CTE.N DESC;

Tried with 2 billion rows and it's instant! Easy to make it a stored procedure... Of course, this is only valid if the ids are contiguous.
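The same idea can be sketched against SQLite via Python's `sqlite3`, which also supports recursive CTEs (100 rows and a page of 5 stand in for the real sizes; contiguous ids are assumed, as the answer notes):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO TableName VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(1, 101)])

top, page = 50, 5  # start at id 50, fetch 5 rows, counting downwards
rows = conn.execute("""
    WITH RECURSIVE cte(n) AS (
        SELECT ?
        UNION ALL
        SELECT n - 1 FROM cte WHERE n > ? + 1 - ?
    )
    SELECT t.id FROM cte JOIN TableName t ON cte.n = t.id
    ORDER BY cte.n DESC
""", (top, top, page)).fetchall()

result = [r[0] for r in rows]
print(result)  # [50, 49, 48, 47, 46]
```

The recursive CTE just generates the 5 id values 50..46, so the join is a handful of primary-key lookups rather than a scan. Any gap in the ids would silently shrink the page.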

Intuit answered 5/1, 2022 at 19:26 Comment(1)
"Of course, valid if ids follow each other" - ohLongueur
0

I have experimented with various approaches, but the following method consistently retrieves data faster than any others.

CREATE PROC [dbo].[GetTransDetails]
(
    @PageNo   INT = 1,
    @PageSize INT
)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @idfrom INT = 1;
    DECLARE @idto   INT = @PageSize;

    IF @PageNo > 1
    BEGIN
        SET @idfrom = ((@PageNo - 1) * @PageSize) + 1;
        SET @idto   = @PageNo * @PageSize;
    END

    SELECT *
    FROM (
        SELECT ROW_NUMBER() OVER (ORDER BY id DESC) AS rownumber, *
        FROM   transdetails
    ) AS transList
    WHERE transList.rownumber BETWEEN @idfrom AND @idto;
END
Dicast answered 4/7, 2024 at 13:1 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.