How does pagination of results in databases work?
Asked Answered
M

2

20

This is a general question that applies to MySQL, Oracle DB or whatever else might be out there.

I know for MySQL there is LIMIT offset,size; and for Oracle there is 'ROW_NUMBER' or something like that.

But when such 'paginated' queries are called back to back, does the database engine actually do the entire 'select' all over again and then retrieve a different subset of results each time? Or does it do the overall fetching of results only once, keeps the results in memory or something, and then serves subsets of results from it for subsequent queries based on offset and size?

If it does the full fetch every time, then it seems quite inefficient.

If it does full fetch only once, it must be 'storing' the query somewhere somehow, so that the next time that query comes in, it knows that it has already fetched all the data and just needs to extract next page from it. In that case, how will the database engine handle multiple threads? Two threads executing the same query?

I am very confused :(

Muzzy answered 13/6, 2018 at 21:37 Comment(0)
N
12

Yes, the query is executed over again when you run it with a different OFFSET.

Yes, this is inefficient. Don't do that if you have a need to paginate through a large result set.

I'd suggest doing the query once, with a large LIMIT — enough for 10 or 12 pages. Then save the result in a cache. When the user wants to advance through several pages, then your application can fetch the 10-12 pages you saved in the cache and display the page the user wants to see. That is usually much faster than running the SQL query for each page.

This works well if, like most users, your user reads only a few pages and then changes their query.


Re your comment:

By cache I mean something like Memcached or Redis. A high-speed, in-memory key/value store.

MySQL views don't store anything, they're more like a macro that runs a predefined query for you.

Oracle supports materialized views, so that might work better, but querying the view would have the overhead of interpreting an SQL query.

A simpler in-memory cache should be much faster.

Negrillo answered 13/6, 2018 at 21:50 Comment(3)
This makes sense. By "cache" you mean a view or something like that?Muzzy
what about a scenario where my data is highly dynamic?Praline
What do you mean by dynamic in this case? Updated frequently?Negrillo
A
15

I desagree with @Bill Karwin. First of all, do not make assumptions in advance whether something will be quick or slow without taking measurements, and complicate the code in advance to download 12 pages at once and cache them because "it seems to me that it will be faster".

YAGNI principle - the programmer should not add functionality until deemed necessary.
Do it in the simplest way (ordinary pagination of one page), measure how it works on production, if it is slow, then try a different method, if the speed is satisfactory, leave it as it is.


From my own practice - an application that retrieves data from a table containing about 80,000 records, the main table is joined with 4-5 additional lookup tables, the whole query is paginated, about 25-30 records per page, about 2500-3000 pages in total. Database is Oracle 12c, there are indexes on a few columns, queries are generated by Hibernate. Measurements on production system at the server side show that an average time (median - 50% percentile) of retrieving one page is about 300 ms. 95% percentile is less than 800 ms - this means that 95% of requests for retrieving a single page is less that 800ms, when we add a transfer time from the server to the user and a rendering time of about 0.5-1 seconds, the total time is less than 2 seconds. That's enough, users are happy.


And some theory - see this answer to know what is purpose of Pagination pattern

Aegospotami answered 13/6, 2018 at 22:41 Comment(3)
I agree that one should measure first to make sure there's a real performance problem before implementing a workaround. I was assuming that the performance problem was a given.Negrillo
@Aegospotami My question was actually not about performance but whether the database simply fetches sections of results from 'one select query' or does it run the 'select' query each time - i.e. does it maintain a session across multiple paginated query calls. In your example of 80K records, 25-30 rows per page - does the database fetch all 80K rows every time and return a subset of the rows (based on page offset, size) or does it fetch 80K rows ONCE and then caches them? As per Bill, it doesn't (and that's something I can do myself)Muzzy
@Aegospotami What about NoSQL db like mongo or Elastic search ? Does pagination work same as RDBMS ?Whale
N
12

Yes, the query is executed over again when you run it with a different OFFSET.

Yes, this is inefficient. Don't do that if you have a need to paginate through a large result set.

I'd suggest doing the query once, with a large LIMIT — enough for 10 or 12 pages. Then save the result in a cache. When the user wants to advance through several pages, then your application can fetch the 10-12 pages you saved in the cache and display the page the user wants to see. That is usually much faster than running the SQL query for each page.

This works well if, like most users, your user reads only a few pages and then changes their query.


Re your comment:

By cache I mean something like Memcached or Redis. A high-speed, in-memory key/value store.

MySQL views don't store anything, they're more like a macro that runs a predefined query for you.

Oracle supports materialized views, so that might work better, but querying the view would have the overhead of interpreting an SQL query.

A simpler in-memory cache should be much faster.

Negrillo answered 13/6, 2018 at 21:50 Comment(3)
This makes sense. By "cache" you mean a view or something like that?Muzzy
what about a scenario where my data is highly dynamic?Praline
What do you mean by dynamic in this case? Updated frequently?Negrillo

© 2022 - 2024 — McMap. All rights reserved.