ROW_NUMBER() in MySQL

R

26

335

Is there a nice way in MySQL to replicate the SQL Server function ROW_NUMBER()?

For example:

SELECT 
    col1, col2, 
    ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col3 DESC) AS intRow
FROM Table1

Then I could, for example, add a condition to limit intRow to 1 to get a single row with the highest col3 for each (col1, col2) pair.

Rubio answered 12/12, 2009 at 23:58 Comment(3)

for a simple mysql row number function, check out datamakessense.com/mysql-rownum-row-number-function – Alga 20/10, 2014 at 16:12

MySQL 8 now has ROW_NUMBER() and RANK(), see answer far below – Lancewood 14/7, 2019 at 13:50

@JimDavis Yes, that would be https://mcmap.net/q/98260/-row_number-in-mysql. Doc: dev.mysql.com/doc/refman/8.0/en/… – Cardona 15/5, 2020 at 9:50

Z

123

I want the row with the single highest col3 for each (col1, col2) pair.

That's a groupwise maximum, one of the most commonly-asked SQL questions (since it seems like it should be easy, but actually it kind of isn't).

I often plump for a null-self-join:

SELECT t0.col3
FROM Table1 AS t0
LEFT JOIN Table1 AS t1 ON t0.col1=t1.col1 AND t0.col2=t1.col2 AND t1.col3>t0.col3
WHERE t1.col1 IS NULL;

“Get the rows in the table for which no other row with matching col1,col2 has a higher col3.” (You will notice this and most other groupwise-maximum solutions will return multiple rows if more than one row has the same col1,col2,col3. If that's a problem you may need some post-processing.)
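To make the tie behaviour concrete, here is a minimal sketch with made-up sample data. The table name Table1 comes from the question; the pk column and the rows themselves are assumptions for illustration only. The second query adds a tie-break on a unique column using a row comparison, which is one way to get exactly one row per group.

```sql
-- Hypothetical sample data (pk is an assumed unique key, not in the question).
CREATE TABLE Table1 (
    pk   INT PRIMARY KEY,
    col1 INT,
    col2 INT,
    col3 INT
);

INSERT INTO Table1 (pk, col1, col2, col3) VALUES
    (1, 1, 1, 10),
    (2, 1, 1, 20),
    (3, 1, 1, 20),   -- ties with pk = 2 for the highest col3 in group (1, 1)
    (4, 1, 2, 5);

-- Null-self-join: keep rows for which no row in the same group has a higher col3.
SELECT t0.pk, t0.col1, t0.col2, t0.col3
FROM Table1 AS t0
LEFT JOIN Table1 AS t1
       ON t0.col1 = t1.col1
      AND t0.col2 = t1.col2
      AND t1.col3 > t0.col3
WHERE t1.col1 IS NULL;
-- Keeps pk 2, 3 and 4: both tied rows of group (1, 1) survive.

-- Same idea with a tie-break: compare (col3, pk) pairs so ties on col3
-- are resolved by the higher pk.
SELECT t0.pk, t0.col1, t0.col2, t0.col3
FROM Table1 AS t0
LEFT JOIN Table1 AS t1
       ON t0.col1 = t1.col1
      AND t0.col2 = t1.col2
      AND (t1.col3, t1.pk) > (t0.col3, t0.pk)
WHERE t1.col1 IS NULL;
-- Keeps pk 3 and 4: one row per (col1, col2) group.
```

Which tied row wins is decided purely by the tie-break column, so pick one whose ordering you are happy to treat as arbitrary.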

Zindman answered 13/12, 2009 at 0:14 Comment(14)

But what if there are two maximal values of col3 for a (col1, col2) pair? You'd end up with two rows. – Rubio 13/12, 2009 at 0:16

@Paul: yes! Just added a note about that in the answer a tic ago. You can usually easily drop unwanted extra rows in the application layer afterwards on some random basis, but if you have a lot of rows all with the same col3 it can be problematic. – Zindman 13/12, 2009 at 0:18

In t-sql I tend to need this as a sub-query as part of a much larger query, so post-processing isn't really an option. Also...what if you wanted the rows with the top n highest rows values of col3? With my t-sql example, you can add the constraint of intRow <= n, but this would be very hard with a self-join. – Rubio 13/12, 2009 at 0:21

If you took “with the single highest col3” literally you could make it return no rows instead of 2 in this case by using >= instead of >. But that's unlikely to be what you want! Another option in MySQL is to finish with GROUP BY col1, col2 without using an aggregate expression for col3; MySQL will pick a row at random. However this is invalid in ANSI SQL and generally considered really bad practice. – Zindman 13/12, 2009 at 0:22

For top N rows you have to add more joins or subqueries for each N, which soon gets unwieldy. Unfortunately LIMIT does not work in subqueries and there's no other arbitrary-selection-order or general windowing function. – Zindman 13/12, 2009 at 0:24

Thanks, yes that makes sense. In the case of multiple maxima it certainly will have to be an arbitrary row, so the GROUP BY seems logical. The extra joins or subqueries sound a bit dubious though, especially if n is variable. The choice of preferred answer is a toss-up between this and OMG Ponies', as they both will replicate the functionality I need, but in a somewhat hard-to-read, slightly hacky way. – Rubio 13/12, 2009 at 0:44

@bobince: There's an easy solution to get the top N rows. See #1443027 – Hassiehassin 13/12, 2009 at 1:15

@Bill Karwin: That's a nice solution. Although in this case, the column we're sorting upon isn't necessarily unique so we may get more than n values. – Rubio 13/12, 2009 at 1:42

@Bill: nifty! What's the performance like on this sort of query, generally? Seeing heavy lifting in HAVING always makes me nervous. :-) – Zindman 13/12, 2009 at 2:24

bobince, the solution became rather popular here on SO, but I have a question. The solution is basically the same as trying to find the largest id with the following query: SELECT t1.id FROM test t1 LEFT JOIN test t2 ON t1.id<t2.id WHERE t2.id IS NULL; Doesn't it require n*n/2 + n/2 IS NULL comparisons to find the single row? Are there any optimizations happening that I do not see? I tried to ask a similar question to Bill in another thread, but he seems to have ignored it. – Vanadium 10/1, 2012 at 13:16

@Rubio - To address the case where multiple rows match the max per group and you wish to grab just one, you can always add the primary key to the ON clause logic to break the tie... SELECT t0.col3 FROM Table1 AS t0 LEFT JOIN Table1 AS t1 ON t0.col1 = t1.col1 AND t0.col2 = t1.col2 AND (t1.col3, t1.pk) > (t0.col3, t0.pk) WHERE t1.col1 IS NULL ; – Flavius 19/11, 2012 at 0:47

This would be more readable as

SELECT t0.col3 FROM Table1 AS t0 WHERE NOT EXISTS (SELECT 1 FROM Table1 AS t1 WHERE t0.col1=t1.col1 AND t0.col2=t1.col2 AND t1.col3>t0.col3)

– Mouthwash 22/5, 2017 at 0:40

@wrschneider: It would be more readable, but at the time this answer was written, likely much slower. Subquery support was a relative latecomer to MySQL and initially performed poorly. I would hope today both queries would be pretty optimal, but I can't say I've been keeping track of developments... – Zindman 22/5, 2017 at 22:2

@JonArmstrong-Xgc, by the way, if you have a multi-criteria sort with mixed directions, such as ORDER BY col1 ASC, col2 ASC, pk DESC, and the criteria for one of the directions (either ASC or DESC) are all numeric (int or float), you can simply put a minus sign in front of those numeric criteria to flip their direction, e.g. (t1.col3, -t1.pk) > (t0.col3, -t0.pk). Otherwise you have to spell the comparison out manually: t1.col3 > t0.col3 OR t1.col3 = t0.col3 AND STRCMP(t1.surname, t0.surname) < 0 – Physiotherapy 30/4, 2019 at 16:39

P

249

There is no ranking functionality in MySQL 5.7 or below. (This is supported in MySQL v8.0+, see @LukaszSzozda's answer)

The closest you can get is to use a variable:

SELECT t.*, 
       @rownum := @rownum + 1 AS row_num
  FROM YOUR_TABLE t, 
       (SELECT @rownum := 0) r

“So how would that work in my case? I'd need two variables, one for each of col1 and col2? Col2 would need resetting somehow when col1 changed..?”

Yes. If it were Oracle, you could use the LEAD function to peek at the next value. Thankfully, Quassnoi covers the logic for what you need to implement in MySQL.
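For readers on MySQL 8.0 or later, where window functions are available, the query from the question can be used almost as written. The following is a minimal sketch, assuming the Table1, col1, col2 and col3 names from the question; the derived-table alias ranked is made up for illustration.

```sql
-- Requires MySQL 8.0+; window functions do not exist in 5.7 or below.
SELECT col1, col2, col3
FROM (
    SELECT col1, col2, col3,
           ROW_NUMBER() OVER (PARTITION BY col1, col2
                              ORDER BY col3 DESC) AS intRow
    FROM Table1
) AS ranked          -- hypothetical alias for the derived table
WHERE intRow = 1;    -- use intRow <= n for the top n rows per group
```

The filter has to sit in an outer query because a window function cannot be referenced in the WHERE clause of the same SELECT. If several rows tie on col3, ROW_NUMBER() still numbers them 1, 2, ... in an unspecified order, so add a further column to the ORDER BY if the choice needs to be deterministic.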

Pabulum answered 13/12, 2009 at 0:5 Comment(9)

Hmm....so how would that work in my case? I'd need two variables, one for each of col1 and col2? Col2 would need resetting somehow when col1 changed..? – Rubio 13/12, 2009 at 0:7

Assigning to and reading from user-defined variables in the same statement is not reliable. This is documented at dev.mysql.com/doc/refman/5.0/en/user-variables.html: "As a general rule, you should never assign a value to a user variable and read the value within the same statement. You might get the results you expect, but this is not guaranteed. The order of evaluation for expressions involving user variables is undefined and may change based on the elements contained within a given statement." – Bireme 11/1, 2010 at 13:51

@Roland: I've only tested on small datasets, haven't had any issue. Too bad MySQL has yet to address the functionality - the request has been in since 2008 – Pabulum 11/1, 2010 at 16:39

A nice example is here: artfulsoftware.com/infotree/queries.php?&bw=1440#104 – Spill 8/11, 2010 at 3:50

In my experience, if you use INNER JOINs in your query, put the ",(SELECT @rownum := 0) r" clause after the INNER JOINs. – Squabble 8/10, 2013 at 13:18

This seems to be undefined behavior as Roland notes. e.g. this gives totally incorrect results for a table I tried:

SELECT @row_num:=@row_num+1 AS row_number, t.id FROM (SELECT * FROM table1 WHERE col = 264 ORDER BY id) t, (SELECT @row_num:=0) var;

– Interregnum 26/4, 2017 at 16:33

Does this work on MySQL? I got a syntax error when I ran it ... – Armillary 22/9, 2020 at 5:36

For me this stopped working in MySQL 8.0.22. – Bluenose 3/3, 2021 at 8:27

For MySQL 8+, use the built-in row_number() solution instead of this one: https://mcmap.net/q/98260/-row_number-in-mysql/… – Personification 25/8, 2022 at 14:2
