MIN/MAX vs ORDER BY and LIMIT
F

7

132

Out of the following queries, which method would you consider the better one? What are your reasons (code efficiency, better maintainability, less WTFery)...

SELECT MIN(`field`)
FROM `tbl`;

SELECT `field`
FROM `tbl`
ORDER BY `field`
LIMIT 1;
Freitag answered 9/1, 2009 at 1:25 Comment(0)
C
160

In the worst case, where you're looking at an unindexed field, using MIN() requires a single full pass of the table. Using ORDER BY and LIMIT requires a filesort. If run against a large table, there would likely be a significant difference in perceived performance. As an anecdotal data point, MIN() took 0.36s while ORDER BY and LIMIT took 0.84s against a 106,000-row table on my dev server.

If, however, you're looking at an indexed column, the difference is harder to notice (meaningless data point: 0.00s in both cases). Looking at the output of EXPLAIN, however, it looks like MIN() is able to simply pluck the smallest value from the index ('Select tables optimized away' and 'NULL' rows), whereas ORDER BY and LIMIT still needs to do an ordered traversal of the index (106,000 rows). The actual performance impact is probably negligible.

It looks like MIN() is the way to go - it's faster in the worst case, indistinguishable in the best case, is standard SQL and most clearly expresses the value you're trying to get. The only case where using ORDER BY and LIMIT seems desirable would be, as mson mentioned, where you're writing a general operation that finds the top or bottom N values from arbitrary columns and it's not worth writing out the special-case operation.
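
If you want to repeat this kind of comparison yourself, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in rather than MySQL, so the timings and query plans will not match the numbers above. The table `tbl`, the column `field`, the index name `idx_field` and the helper functions are all made up for illustration.

```python
import random
import sqlite3
import time

# The two queries from the question, without MySQL backticks.
QUERIES = {
    "min": "SELECT MIN(field) FROM tbl",
    "order_by_limit": "SELECT field FROM tbl ORDER BY field LIMIT 1",
}


def build_table(n_rows=100_000, seed=0):
    """Create an in-memory table `tbl` with one unindexed integer column."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (field INTEGER)")
    conn.executemany(
        "INSERT INTO tbl (field) VALUES (?)",
        ((rng.randint(1, 10_000_000),) for _ in range(n_rows)),
    )
    conn.commit()
    return conn


def run_timed(conn, sql):
    """Return (value, elapsed_seconds) for a single-value query."""
    start = time.perf_counter()
    value = conn.execute(sql).fetchone()[0]
    return value, time.perf_counter() - start


def compare(conn):
    """Run both queries, print their timings, and return {name: value}."""
    results = {}
    for name, sql in QUERIES.items():
        value, elapsed = run_timed(conn, sql)
        print(f"{name}: value={value} time={elapsed:.4f}s")
        results[name] = value
    return results


if __name__ == "__main__":
    conn = build_table()
    compare(conn)  # unindexed column
    conn.execute("CREATE INDEX idx_field ON tbl (field)")
    compare(conn)  # same column, now indexed
```

Both queries should return the same value in each run; only the timings are expected to differ. On MySQL you would run the same two statements and check EXPLAIN instead.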

Coumas answered 9/1, 2009 at 1:51 Comment(5)
O(n) for one single pass vs O(n log n) for sorting - Stamp
@AbhishekIyer you are totally right, but I would add "in the worst case for unindexed field". - Snell
That part about worst unindexed case is wrong. You always need a full scan, how else would you know it's a min or max? It's not like you're scanning and the value screams: "Hey, you finally found me! I'm Jack, the max!". - Kevyn
In a test with an indexed table with 470 million rows, both queries take 0.00 s. However, if we add a filter "WHERE field2=x" to the queries, the query with LIMIT still takes 0.00 s and the query with MIN takes 0.21 s. - Messene
This depends on your table and database setup. We just reduced a DB with >10M rows from multi-second to sub-second by pivoting from ORDER BY with LIMIT to GROUP BY with MAX. The logic should be easy to comprehend. Iteration aside, you either need to retrieve n rows, or 1 row. - Canadianism
F
16
SELECT MIN(`field`)
FROM `tbl`;

Simply because it is ANSI compatible. LIMIT 1 is a non-standard extension (MySQL and a few other databases), just as TOP is to SQL Server.

Forgery answered 9/1, 2009 at 1:28 Comment(2)
Most DBMSes have LIMIT/OFFSET or an equivalent, and it is used in the majority of apps I have worked on (not as an alternative to MIN, but for other purposes such as pagination). - Characharabanc
@Characharabanc - I agree, but the questioner's example was comparing LIMIT with MIN explicitly. - Communitarian
D
13

As mson and Sean McSomething have pointed out, MIN is preferable.

One case where ORDER BY + LIMIT is useful is when you want to get the value of a different column than the MIN column.

Example:

SELECT some_other_field, field
FROM tbl
ORDER BY field
LIMIT 1
Derisive answered 16/10, 2013 at 18:40 Comment(1)
Is this preferable in this case? Or using nested queries with MAX/MIN? (SELECT some_other_field, field FROM tbl WHERE field = (SELECT MIN(field) FROM tbl)) - Offertory
A
6

I think the answer depends on what you are doing.

If you have a one-off query and the intent is as simple as you specified, SELECT MIN(field) is preferable.

However, it is common to have these types of requirements change into - grab top n results, grab nth - mth results, etc.
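
To make those follow-on requirements concrete, here is a rough sketch of "top n" and "n-th to m-th" using ORDER BY with LIMIT and OFFSET. It runs against an in-memory SQLite database through Python's sqlite3 module purely for convenience. The helper names (`make_conn`, `smallest_n`, `nth_to_mth`) and the sample values are invented for the example.

```python
import sqlite3


def make_conn(values):
    """Create an in-memory table `tbl` holding the given integers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (field INTEGER)")
    conn.executemany("INSERT INTO tbl (field) VALUES (?)", [(v,) for v in values])
    return conn


def smallest_n(conn, n):
    """Return the n smallest values of `field`, in ascending order."""
    rows = conn.execute("SELECT field FROM tbl ORDER BY field LIMIT ?", (n,))
    return [row[0] for row in rows]


def nth_to_mth(conn, n, m):
    """Return the n-th through m-th smallest values (1-based, inclusive)."""
    rows = conn.execute(
        "SELECT field FROM tbl ORDER BY field LIMIT ? OFFSET ?",
        (m - n + 1, n - 1),  # row count, then rows to skip
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = make_conn([50, 10, 40, 20, 30])
    print(smallest_n(conn, 1))     # same value MIN(field) would give
    print(smallest_n(conn, 3))
    print(nth_to_mth(conn, 2, 4))
```

With MIN() you would have to rewrite the query to get any of these; with ORDER BY and LIMIT only the numbers change.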

I don't think it's too terrible an idea to commit to your chosen database. Changing databases should not be done lightly, and having to revise your queries is the price you pay when you make that move.

Why limit yourself now, for pain you may or may not feel later on?

I do think it's good to stay ANSI as much as possible, but that's just a guideline...

Angers answered 9/1, 2009 at 1:43 Comment(0)
F
4

Given acceptable performance, I would use the first one because it is semantically closer to the intent.
If performance were an issue, then of course I would use the faster one. (Most modern optimizers will probably optimize both to the same query plan, although you have to test to verify that.)

Fraga answered 9/1, 2009 at 1:29 Comment(0)
D
0

user650654 said that ORDER BY with LIMIT 1 is useful when one needs "to get the value of a different column than the MIN column". I think that in this case we still get better performance from two single passes using MIN instead of sorting (hoping this is optimized :()

SELECT some_other_field, field
FROM tbl
WHERE field=(SELECT MIN(field) FROM tbl)
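
One thing to check before relying on the subquery form is whether `field` can contain duplicate minimum values. The sketch below shows how the two forms differ in that case, using SQLite through Python's sqlite3 module as a stand-in. The sample rows are made up, and the extra `some_other_field` sort key is my own assumption, added only so the single-row result is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (some_other_field TEXT, field INTEGER)")
conn.executemany(
    "INSERT INTO tbl (some_other_field, field) VALUES (?, ?)",
    [("a", 3), ("b", 1), ("c", 1), ("d", 2)],  # two rows share the minimum
)

# Subquery form: returns every row whose field equals the minimum.
subquery_rows = conn.execute(
    "SELECT some_other_field, field FROM tbl "
    "WHERE field = (SELECT MIN(field) FROM tbl) "
    "ORDER BY some_other_field"
).fetchall()

# ORDER BY + LIMIT form: returns exactly one row; the second sort key
# decides which of the tied rows wins.
limit_rows = conn.execute(
    "SELECT some_other_field, field FROM tbl "
    "ORDER BY field, some_other_field LIMIT 1"
).fetchall()

print(subquery_rows)
print(limit_rows)
```

If you need exactly one row from the subquery form, you still have to pick a tie-breaker and limit the result.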
Dentalium answered 29/3, 2021 at 13:53 Comment(2)
The question itself assumes that [field] is unique... Otherwise there might be multiple records with the same value of field, with different values for some_other_field. The query would return multiple rows. In that case, you would want to limit the output to one of those rows using some other criterion. - Fraga
Oh, OK. One may add "limit 1" in addition. 😉 - Dentalium
L
0

I found MAX to be very slow with AWS RDS MySQL 8.0.35

select id from MY_TABLE where MY_COLUMN is null order by id desc limit 1;

=> 0.15 sec

select max(id) from MY_TABLE where MY_COLUMN is null;

=> 3 min 52.99 sec

where MY_TABLE contains about 960,000 rows and the type of MY_COLUMN is longtext

  • This is slow only the first time; it drops to 0.14 sec if I run MAX again
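
If you swap one form for the other for speed, keep in mind that the two queries are not quite interchangeable when the filter matches nothing: the aggregate still returns one row (containing NULL), while ORDER BY with LIMIT returns no rows at all. Here is a small sketch of that, run on SQLite via Python's sqlite3 module rather than on RDS MySQL. The sample rows and the helpers `max_id` and `top_id` are invented, and it says nothing about the timings above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MY_TABLE (id INTEGER PRIMARY KEY, MY_COLUMN TEXT)")
conn.executemany(
    "INSERT INTO MY_TABLE (id, MY_COLUMN) VALUES (?, ?)",
    [(1, "x"), (2, None), (3, "y"), (4, None)],
)


def max_id(conn, where):
    """Aggregate form. `where` must be a trusted literal (illustration only)."""
    return conn.execute(f"SELECT MAX(id) FROM MY_TABLE WHERE {where}").fetchall()


def top_id(conn, where):
    """ORDER BY + LIMIT form. `where` must be a trusted literal."""
    return conn.execute(
        f"SELECT id FROM MY_TABLE WHERE {where} ORDER BY id DESC LIMIT 1"
    ).fetchall()


if __name__ == "__main__":
    # Filter matches some rows: both forms return the same id.
    print(max_id(conn, "MY_COLUMN IS NULL"), top_id(conn, "MY_COLUMN IS NULL"))
    # Filter matches nothing: one row holding None vs. an empty result.
    print(max_id(conn, "MY_COLUMN = 'zzz'"), top_id(conn, "MY_COLUMN = 'zzz'"))
```

So code that reads the result has to handle "one row with NULL" in one case and "no row" in the other.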
Letishaletitia answered 25/11, 2023 at 13:52 Comment(0)
