How much is performance improved when using LIMIT in a SQL statement?
Let's suppose I have a table in my database with 1,000,000 records.

If I execute:

SELECT * FROM [Table] LIMIT 1000

Will this query take the same time as if I have that table with 1000 records and just do:

SELECT * FROM [Table]

?

I'm not asking whether it will take exactly the same time. I just want to know if the first one will take much more time to execute than the second one.

I said 1,000,000 records, but it could be 20,000,000. That was just an example.

Edit:
Of course, when comparing queries with and without LIMIT on the same table, the one using LIMIT should execute faster, but I'm not asking that...

To make it generic:

Table1: X records
Table2: Y records

(X << Y)

What I want to compare is:

SELECT * FROM Table1

and

SELECT * FROM Table2 LIMIT X

Edit 2:
Here is why I'm asking this:

I have a database with 5 tables and relationships between some of them. One of those tables will (I'm 100% sure) contain about 5,000,000 records. I'm using SQL Server CE 3.5, Entity Framework as the ORM, and LINQ to SQL to build the queries.

I need to perform basically three kinds of non-trivial queries, and I was thinking about showing the user a limited number of records (just like lots of websites do). If the user wants to see more records, the option he/she has is to narrow the search further.

So, the question came up because I was deciding between doing this (limiting to X records per query) and storing only the X most recent results in the database, which would require doing some deletions, but I was just thinking...

So, that table could contain 5,000,000 records or more, and what I don't want is to show the user 1000 or so rows and still have the query be as slow as if it were returning all 5,000,000.
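
To illustrate, this is the kind of per-query limit I have in mind (the table and column names here are made up; on SQL Server the LIMIT would be a TOP instead):

SELECT *
FROM Measurements         -- hypothetical table with 5,000,000 rows
WHERE sensor_id = 42      -- the user's search restriction
ORDER BY taken_at DESC    -- most recent records first
LIMIT 1000;               -- only show the user the first 1000 matches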

Chloroprene asked 19/4, 2011 at 4:39 Comment(10)
What does the TAKE keyword do? I've never come across it. Is it a synonym of LIMIT?Raker
@Raker Sorry. I was thinking about LINQ :/ I'll edit it :)Chloroprene
LIMIT is MySQL only. Probably should tag it so.Kwakiutl
@Stephen I believe LIMIT is also in PostgreSQL :)Raker
@Oscar TOP is only SQL Server AFAIK, I think also MS SQL? EDIT: here is a good reference on it: w3schools.com/Sql/sql_top.asp. Seems ROWNUM is OracleRaker
@wired00: You're correct, LIMIT is supported by PostgreSQL and TOP is TSQL/SQL Server only. FETCH FIRST x ROWS ONLY is now ANSI, but DB2 is the only one to implement it, IIRCMolt
@Oscar FETCH FIRST n ROWS is the standard. In Oracle you use WHERE ROWNUM < n.Jacobo
@Ponies. I shudder whenever I think of DB2 ;) I have bad memories of that thing...Raker
@OMG Ponies Postgres supports FETCH FIRST n ROWS: postgresql.org/docs/9.0/static/sql-select.htmlJacobo
Great comments. My bad.Kwakiutl

TAKE 1000 from a table of 1,000,000 records will be roughly 1,000,000/1,000 (= 1,000) times faster, because it only needs to look at (and return) 1,000 of the 1,000,000 records. Since it does less work, it is naturally faster.

The result will be pretty much (pseudo-)random, since you haven't specified any order in which to TAKE. However, if you do introduce an order, then one of the two cases below becomes true:

  1. The ORDER BY clause follows an index - the above statement is still true.
  2. The ORDER BY clause cannot use any index - it will be only marginally faster than without the TAKE, because
    • it has to inspect ALL records and sort them by the ORDER BY,
    • then deliver only a subset (the TAKE count),
    • so it is not faster in the first step, but the second step involves less IO/network than returning ALL records.

If you TAKE 1000 records from a table of 1000 records, it will be equivalent (with no significant difference) to a TAKE of 1000 records from 1 billion, as long as you follow case (1), no ORDER BY, or case (2), ORDER BY against an index.
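
To make the two cases concrete, a sketch in plain SQL (table, column, and index names are made up; TAKE here corresponds to LIMIT):

-- Hypothetical table with an index on created_at only.
CREATE INDEX idx_events_created ON Events (created_at);

-- Case 1: ORDER BY follows the index; the engine walks the index and
-- stops after 1000 rows, so the size of the table barely matters.
SELECT * FROM Events ORDER BY created_at DESC LIMIT 1000;

-- Case 2: ORDER BY on an unindexed column; every row must be read and
-- sorted before the first 1000 can be returned, and only the final
-- IO/network transfer is smaller.
SELECT * FROM Events ORDER BY payload LIMIT 1000;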

Iceman answered 19/4, 2011 at 4:41 Comment(3)
So it means that if column A isn't indexed and I'm doing an ORDER BY on that column, querying the same table with LIMIT X and without it will be pretty similar in terms of performance, because it will have to load all the data, sort it, and then return the first X records?Chloroprene
Faster to do what? If it's to convey the result somewhere, then it's IO-limited and the database pretty much doesn't matter.Pagoda
@Oscar / If you use ORDER BY someunindexedcolumn LIMIT X (or TAKE or TOP or FETCH FIRST etc.), it will need to drag up all the records and sort them before returning X records. I lie - if it is 1000 from 1 billion, the network traffic is significantly different. Anyway, you get the picture of the steps involved and how the X works.Iceman

This assumes both tables are equivalent in terms of indexes, row size, and other structure, and that you are running that simple SELECT statement. If you have an ORDER BY clause in your SQL statement, then obviously the larger table will be slower. I suppose you're not asking that.

If X = Y, then obviously they should run at similar speed, since the query engine will be going through the records in exactly the same order -- basically a table scan -- for this simple SELECT statement. There will be no difference in query plan.

If Y > X by only a little, the speed will also be similar.

However, if Y >> X (meaning Y has many, many more rows than X), then the LIMIT version MAY be slower. Not because of the query plan -- again, it should be the same -- but simply because the internal layout of the data may have several more levels. For example, if data is stored as leaves of a tree, there may be more tree levels, so it may take slightly more time to access the same number of pages.

In other words, 1000 rows may be stored in 1 tree level across 10 pages, say. 1,000,000 rows may be stored in 3-4 tree levels across 10,000 pages. Even when taking only 10 pages from those 10,000, the storage engine still has to go through the 3-4 tree levels, which may take slightly longer.

Now, if the storage engine stores data pages sequentially or as a linked list, say, then there will be no difference in execution speed.
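
If you want to verify the "no difference in query plan" part yourself, a quick sketch (MySQL-style EXPLAIN, with Table1 and Table2 as in the question):

EXPLAIN SELECT * FROM Table1;
EXPLAIN SELECT * FROM Table2 LIMIT 1000;

Both should report a plain full-table scan; the LIMIT version simply stops reading early.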

Kwakiutl answered 19/4, 2011 at 4:59 Comment(0)

It would be approximately linear, as long as you select all fields, specify no ordering, and return all the records. But that doesn't buy you much. It falls apart as soon as your query wants to do something useful.

This would be quite a bit more interesting if you intended to draw some useful conclusion and tell us about the way it would be used to make a design choice in some context.

Thanks for the clarification.

In my experience, real applications with real users seldom have interesting or useful queries that return entire million-row tables. Users want to know about their own activity, or a specific forum thread, etc. So unless yours is an unusual case, by the time you've really got their selection criteria in hand, you'll be talking about reasonable result sizes.

In any case, users wouldn't be able to do anything useful with more than a few hundred rows; transporting them would take a long time, and they couldn't scroll through them in any reasonable way.

MySQL has the LIMIT and OFFSET (starting record #) modifiers primarily for the exact purpose of creating chunks of a list for paging, as you describe.
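
For instance, a sketch of that paging pattern (hypothetical table name; MySQL syntax):

-- Page 3 of a list at 20 rows per page:
-- skip the first 40 matching rows, return the next 20.
SELECT *
FROM ForumPosts
ORDER BY posted_at DESC
LIMIT 20 OFFSET 40;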

It's way counterproductive to start thinking about schema design and record purging until you've exhausted this and a bunch of other strategies. In this case, don't solve problems you don't have yet. Several-million-row tables are not big, practically speaking, as long as they are correctly indexed.
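
As a sketch of what "correctly indexed" means for the kind of restricted search described in the question (hypothetical table and column names):

-- A composite index matching the search's WHERE and ORDER BY lets the
-- engine find and return the first N rows without scanning the table.
CREATE INDEX idx_measurements_sensor_time
ON Measurements (sensor_id, taken_at);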

Pagoda answered 19/4, 2011 at 5:05 Comment(0)
