Why does a higher LIMIT offset slow the query down in MySQL?

Scenario in short: a table with more than 16 million records [2 GB in size]. The higher the LIMIT offset in a SELECT, the slower the query becomes when using ORDER BY *primary_key*.

So

SELECT * FROM large ORDER BY `id`  LIMIT 0, 30 

takes far less than

SELECT * FROM large ORDER BY `id` LIMIT 10000, 30 

Both queries return just 30 ordered records, the same either way, so the overhead is not coming from ORDER BY.
Right now, fetching the latest 30 rows takes around 180 seconds. How can I optimize this simple query?

Erinerina answered 19/12, 2010 at 3:1 Comment(3)
NOTE: I'm the author. MySQL doesn't refer to the index (PRIMARY) in the above cases. See the answer below by user "Quassnoi" for an explanation.Erinerina
possible duplicate of How can I speed up a MySQL query with a large offset in the LIMIT clause?Loper
A related link: We need tool support for keyset pagination. If you’d like to know what happens inside the database when using offset or keyset pagination, have a look at those slides.Lockwood

It's normal that higher offsets slow the query down, since the query needs to count off the first OFFSET + LIMIT records (and take only LIMIT of them). The higher this value is, the longer the query runs.

The query cannot go right to OFFSET because, first, the records can be of different lengths and, second, there can be gaps from deleted records. It needs to check and count each record on its way.

Assuming that id is the primary key of a MyISAM table, or a unique non-primary key field on an InnoDB table, you can speed it up by using this trick:

SELECT  t.* 
FROM    (
        SELECT  id
        FROM    mytable
        ORDER BY
                id
        LIMIT 10000, 30
        ) q
JOIN    mytable t
ON      t.id = q.id

See this article:

Pharmacopoeia answered 21/12, 2010 at 18:6 Comment(18)
MySQL "early row lookup" behavior was the answer why it's talking so long. By the trick you provided, only matched ids (by the index directly) are bound, saving unneeded row lookups of too many records. That did the trick, hooray!Erinerina
Awesome ... are there any limitations, where this trick will not work?Obaza
@harald: what exactly do you mean by "not work"? This is a pure performance improvement. If there is no index usable by ORDER BY or the index covers all fields you need, you don't need this workaround.Pharmacopoeia
@Quassnoi: I don't know. I played around with this on some of my own tables with millions of rows, where I had performance problems before, and the solution you provided works like a charm for me. I guess I have to delve a little deeper into what's going on here to fully understand this solution. Thanks!Obaza
From my tests this solution has its limits. For a table with 16M rows, this method gets slow once the offset is around half the table, i.e. LIMIT 7100000,100000Shum
@f055: the answer says "speed up", not "make instant". Have you read the very first sentence of the answer?Pharmacopoeia
Is this approach applicable to PostgreSQL? How does it compare to using a server-side cursor in terms of performance? I'm talking about a data volume of around 2M records per table, for some tables. Inspiration for this comment, if you're curious, is my question on SO: #24265317.Ihs
Is it possible to run something like this for InnoDB?Latrice
@NeverEndingQueue: in InnoDB, tables are clustered on the primary key, so it's useless if you order by the primary key. If you order by a secondary index key, then yes, it would make sense.Pharmacopoeia
@Obaza - The trick will slow down once you have to scan so much of the index on id that the 'trick' becomes slow. The real 'fix' is to "remember where you left off".Inger
So, does MySQL re-use the position it gained on a previous call to LIMIT, if the calls go sequentially?Taneshatang
@Dims: no it doesn't.Pharmacopoeia
How can I implement this with multiple WHERE conditions, like WHERE categories &> '{news}' AND title ILIKE ALL (ARRAY['%hello%'])? It seems this "hack" doesn't work inside a sub-query; the execution time is as long as without the sub-query on the primary key column. Of course, all of those columns are indexed. This is needed for category listing and full-text search.Caras
@Lanti: please post it as a separate question and don't forget to tag it with postgresql. This is a MySQL-specific answer.Pharmacopoeia
So this does what you would do manually if you only fetched the columns covered by a single key first (here: IDs only) and then used a second query to fetch both the IDs and some other data for these records.Malachy
Hi! I made a comparison (with a plot) here: https://mcmap.net/q/117595/-why-does-mysql-higher-limit-offset-slow-the-query-down :)Lightness
I'm using InnoDB's primary key id field as the ORDER BY condition, but I'm still seeing a significant performance increase (780ms -> 13.7ms). AFAIK, InnoDB uses a clustered index, so why did the performance still increase?Mylander
@ospider: post your setup and some sample data in a separate question and leave the link in a comment herePharmacopoeia

I had the exact same problem myself. Given that you want to collect a large amount of this data and not a specific set of 30, you'll probably be running a loop and incrementing the offset by 30.

So what you can do instead is:

  1. Hold the last id of the previous set of 30 rows (e.g. lastId = 530)
  2. Add the condition WHERE id > lastId LIMIT 0, 30

So you can always have a ZERO offset; see the sketch below. You will be amazed by the performance improvement.
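
A minimal sketch of that pattern, assuming the large table and id primary key from the question, and the lastId = 530 example above:

-- First batch: no lastId yet, so start from the beginning.
SELECT * FROM large ORDER BY id LIMIT 30;

-- Every following batch: seek past the last id you saw (530 in the
-- example above). The offset stays zero, so MySQL never has to
-- count off skipped rows.
SELECT * FROM large WHERE id > 530 ORDER BY id LIMIT 30;

Because id is indexed, the WHERE id > 530 condition lets MySQL seek straight to the right spot in the index instead of counting off the skipped rows one by one.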

Journal answered 5/6, 2013 at 8:44 Comment(11)
Does this work if there are gaps? What if you don't have a single unique key (a composite key for example)?Sobel
It may not be obvious to all that this only works if your result set is sorted by that key, in ascending order (for descending order the same idea works, but change > lastid to < lastid.) It doesn't matter if it's the primary key, or another field (or group of fields.)Uneasy
Well done that man! A very simple solution that has solved my problem :-)Termor
Just a note that limit/offset is often used in paginated results, and holding lastId is simply not possible because the user can jump to any page, not always the next page. In other words, offset often needs to be calculated dynamically based on page and limit, instead of following a continuous pattern.Interdisciplinary
If it is just pagination and sorting by id ONLY, with a simple where clause, you can use this: select id,name from large_table where id>=(SELECT id FROM large_table where name like '%the%' limit 1 offset 2000001) and name like '%the%' limit 50; It is fast.Homy
I talk at more length about "remembering where you left off" in mysql.rjweb.org/doc.php/paginationInger
Although completely unprofessional, yet I have to admit it: Love ya!Erbes
What if there's no lastId?Tarragon
How do you remember your last position when the user visits Page 563 from a bookmark? Seems like there's no alternative to limit/offset in this case, which is sad. I have 6 million rows of sample data and the pagination is a pain if you cannot implement unlimited scrolling.Caras
Man, you are a life saver. I have 5 million rows that needed around 90 minutes to process with offset and limit; when I tried your answer, it only needed 9 minutes. Thank you man. THANK YOU!!Cheboksary
@Caras Let's assume that Page 563 begins at offset 563 * 30 = 16890, since in the OP's example 30 is the page size and assume page numbering starts from 0. Further assume that column id is unique and is indexed. Then execute select id from large order by id limit 16889, 1 to read the id of the last row of Page 562. This should be reasonably efficient since only the index is involved. Now you have the "lastId" to proceed with selecting the next page.Spectator

MySQL cannot go directly to the 10000th record (or to the 80000th byte, as you're suggesting) because it cannot assume that the records are packed/ordered like that (or that the ids have continuous values from 1 to 10000). Although it might be that way in reality, MySQL cannot assume that there are no holes/gaps/deleted ids.

So, as bobs noted, MySQL will have to fetch 10000 rows (or traverse the first 10000 entries of the index on id) before finding the 30 to return.

EDIT: To illustrate my point

Note that although

SELECT * FROM large ORDER BY id LIMIT 10000, 30 

would be slow(er),

SELECT * FROM large WHERE id >  10000 ORDER BY id LIMIT 30 

would be fast(er), and would return the same results provided that there are no missing ids (i.e. gaps).

Countertype answered 21/12, 2010 at 18:0 Comment(5)
This is correct. But since it's limited by "id", why does it take so long when that id is within an index (the primary key)? The optimizer should refer to that index directly and then fetch the rows with matching ids (which came from that index).Erinerina
If you used a WHERE clause on id, it could go right to that mark. However, if you put a limit on it, ordered by id, it's just a relative counter to the beginning, so it has to traverse the whole way.Countertype
Very good article eversql.com/…Recompense
Worked for me @Countertype Thanks.Antivenin
This answer is wrong if there are other conditions in the WHERE clause.Weatherman

I found an interesting example of how to optimize SELECT queries with ORDER BY id LIMIT X,Y. I have 35 million rows, so it took about 2 minutes to find a range of rows.

Here is the trick:

SELECT id, name, address, phone
FROM customers
WHERE id > 990
ORDER BY id LIMIT 1000;

Just adding a WHERE clause with the last id you got increases the performance a lot. For me it went from 2 minutes to 1 second :)

Other interesting tricks here : http://www.iheavy.com/2013/06/19/3-ways-to-optimize-for-paging-in-mysql/

It works with strings too.
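
For the string case, a hedged sketch along the same lines, reusing the customers table above and assuming an index on name (the value 'Miller' is made up for illustration):

-- Seek pagination on a string column: 'Miller' is the last name value
-- seen on the previous page. An index on name is assumed.
SELECT id, name, address, phone
FROM customers
WHERE name > 'Miller'
ORDER BY name
LIMIT 1000;

If name is not unique, you would also need a tie-breaker (such as id) so that rows sharing the same name are neither skipped nor repeated across pages.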

Areaway answered 1/10, 2015 at 17:4 Comment(2)
This works only for tables where no data is deleted.Sidedress
@Sidedress That's only true if you are working under the assumption that your query can do lookups at random pages, which I don't believe this poster is assuming. While I don't like this method for most real world cases, this will work with gaps as long as you are always basing it off the last id obtained.Blanketyblank

The time-consuming part of the two queries is retrieving the rows from the table. Logically speaking, in the LIMIT 0, 30 version, only 30 rows need to be retrieved. In the LIMIT 10000, 30 version, 10000 rows are evaluated and 30 rows are returned. Some optimization can be done in the data-reading process, but consider the following:

What if you had a WHERE clause in the queries? The engine must return all rows that qualify, and then sort the data, and finally get the 30 rows.

Also consider the case where rows are not processed in the ORDER BY sequence. All qualifying rows must be sorted to determine which rows to return.
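
To illustrate that point, a sketch only; the status column is made up and not part of the question's table:

-- With a filter on a column the id index does not cover, MySQL may have to
-- collect the qualifying rows and order them by id before it can skip the
-- first 10000 matches and return the next 30.
SELECT *
FROM   large
WHERE  status = 'active'   -- hypothetical, non-indexed column
ORDER BY id
LIMIT  10000, 30;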

Disremember answered 19/12, 2010 at 3:28 Comment(4)
Just wondering why it takes time to fetch those 10000 rows. The index on that field (id, which is the primary key) should make retrieving those rows as fast as seeking the PK index to record no. 10000, which in turn should be as fast as seeking the file to that offset multiplied by the index record length (i.e. seeking to 10000*8 = byte no. 80000, given that 8 is the index record length).Erinerina
@Erinerina - The only way to count past the 10000 rows is to step over them one by one. This may involve only an index, but index rows still take time to step through. There is no MyISAM or InnoDB structure that can correctly (in all cases) "seek" to record 10000. The 10000*8 suggestion assumes (1) MyISAM, (2) a FIXED length record, and (3) never any deletes from the table. Anyway, MyISAM indexes are BTrees, so it would not work.Inger
As this answer stated, I believe, the really slow part is the row lookup, not traversing the indexes (which of course will add up as well, but nowhere near as much as the row lookups on disk). Based on the workaround queries provided for this issue, I believe the row lookups tend to happen if you are selecting columns outside of the index -- even if they are not part of the order by or where clause. I haven't found a reason why this is necessary, but it appears to be why some of the workarounds help.Blanketyblank
I believe the delay is caused by counting the entries in the index tree, as opposed to finding the starting index (for which the SQL index tree is optimized and it gets pointed close to the target row without going through particular rows). The next part, reading the number of rows, is equally "slow" when using WHERE ID > x. But the latter is useless in most real-world applications anyway.Virgilio

For those who are interested in a comparison and figures :)

Experiment 1: The dataset contains about 100 million rows. Each row contains several BIGINT and TINYINT columns, as well as two TEXT fields (deliberately) containing about 1k chars each.

  • Blue := SELECT * FROM post ORDER BY id LIMIT {offset}, 5
  • Orange := @Quassnoi's method. SELECT t.* FROM (SELECT id FROM post ORDER BY id LIMIT {offset}, 5) AS q JOIN post t ON t.id = q.id
  • Of course, the third method, ... WHERE id>xxx LIMIT 0,5, does not appear here since it should be constant time.

Experiment 2: The same thing, except that each row has only 3 BIGINTs.

  • green := the blue before
  • red := the orange before

[Plot: query time vs. offset for the query variants above, in both experiments]
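
For anyone who wants to reproduce something similar, a hypothetical table definition along the lines of Experiment 1 (the non-id column names are invented; the original schema was not posted):

-- Hypothetical schema for Experiment 1: a few BIGINT/TINYINT columns plus
-- two TEXT columns holding roughly 1k characters each.
CREATE TABLE post (
    id        BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    author_id BIGINT NOT NULL,
    flags     TINYINT NOT NULL,
    title     TEXT,
    body      TEXT
);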

Lightness answered 1/3, 2020 at 7:23 Comment(2)
Is your id a primary key or a non-primary key field?Mylander
@Mylander primary imhoLightness
