SQL RANK() versus ROW_NUMBER()
Asked Answered
M

9

326

I'm confused about the differences between these. Running the following SQL gets me two idential result sets. Can someone please explain the differences?

SELECT ID, [Description], RANK()       OVER(PARTITION BY StyleID ORDER BY ID) as 'Rank'      FROM SubStyle
SELECT ID, [Description], ROW_NUMBER() OVER(PARTITION BY StyleID ORDER BY ID) as 'RowNumber' FROM SubStyle
Midriff answered 12/10, 2011 at 22:26 Comment(0)
C
295

ROW_NUMBER : Returns a unique number for each row starting with 1. For rows that have duplicate values,numbers are arbitarily assigned.

Rank : Assigns a unique number for each row starting with 1,except for rows that have duplicate values,in which case the same ranking is assigned and a gap appears in the sequence for each duplicate ranking.

Coh answered 12/10, 2011 at 22:38 Comment(0)
P
560

You will only see the difference if you have ties within a partition for a particular ordering value.

RANK and DENSE_RANK are deterministic in this case, all rows with the same value for both the ordering and partitioning columns will end up with an equal result, whereas ROW_NUMBER will arbitrarily (non deterministically) assign an incrementing result to the tied rows.

Example: (All rows have the same StyleID so are in the same partition and within that partition the first 3 rows are tied when ordered by ID)

WITH T(StyleID, ID)
     AS (SELECT 1,1 UNION ALL
         SELECT 1,1 UNION ALL
         SELECT 1,1 UNION ALL
         SELECT 1,2)
SELECT *,
       RANK() OVER(PARTITION BY StyleID ORDER BY ID)       AS [RANK],
       ROW_NUMBER() OVER(PARTITION BY StyleID ORDER BY ID) AS [ROW_NUMBER],
       DENSE_RANK() OVER(PARTITION BY StyleID ORDER BY ID) AS [DENSE_RANK]
FROM   T  

Returns

StyleID     ID       RANK      ROW_NUMBER      DENSE_RANK
----------- -------- --------- --------------- ----------
1           1        1         1               1
1           1        1         2               1
1           1        1         3               1
1           2        4         4               2

You can see that for the three identical rows the ROW_NUMBER increments, the RANK value remains the same then it leaps to 4. DENSE_RANK also assigns the same rank to all three rows but then the next distinct value is assigned a value of 2.

Penetrant answered 12/10, 2011 at 22:29 Comment(3)
Thanks for a great example. Helped me realize I've wrongly been using RANK() function when ROW_NUMBER() would have been much more appropriate.Lawhorn
Perfect! Have been using ROW_NUMBER where I should have used DENSE_RANK.Patent
For fast searchers: pay attention to PARTITIONed column in this example. It has only one unique element! If you want "row number" column for the hole table you should use MAP functions (ROW_NUMBER for example) without partition. Because in case of adding partition on column with more than 1 unique element the internal scheduler (planner) will apply MAP functions independently inside each partition (the task will become REDUCE operation and your table will be splitted by keys).Faceless
C
295

ROW_NUMBER : Returns a unique number for each row starting with 1. For rows that have duplicate values,numbers are arbitarily assigned.

Rank : Assigns a unique number for each row starting with 1,except for rows that have duplicate values,in which case the same ranking is assigned and a gap appears in the sequence for each duplicate ranking.

Coh answered 12/10, 2011 at 22:38 Comment(0)
I
50

This article covers an interesting relationship between ROW_NUMBER() and DENSE_RANK() (the RANK() function is not treated specifically). When you need a generated ROW_NUMBER() on a SELECT DISTINCT statement, the ROW_NUMBER() will produce distinct values before they are removed by the DISTINCT keyword. E.g. this query

SELECT DISTINCT
  v, 
  ROW_NUMBER() OVER (ORDER BY v) row_number
FROM t
ORDER BY v, row_number

... might produce this result (DISTINCT has no effect):

+---+------------+
| V | ROW_NUMBER |
+---+------------+
| a |          1 |
| a |          2 |
| a |          3 |
| b |          4 |
| c |          5 |
| c |          6 |
| d |          7 |
| e |          8 |
+---+------------+

Whereas this query:

SELECT DISTINCT
  v, 
  DENSE_RANK() OVER (ORDER BY v) row_number
FROM t
ORDER BY v, row_number

... produces what you probably want in this case:

+---+------------+
| V | ROW_NUMBER |
+---+------------+
| a |          1 |
| b |          2 |
| c |          3 |
| d |          4 |
| e |          5 |
+---+------------+

Note that the ORDER BY clause of the DENSE_RANK() function will need all other columns from the SELECT DISTINCT clause to work properly.

The reason for this is that logically, window functions are calculated before DISTINCT is applied.

All three functions in comparison

Using PostgreSQL / Sybase / SQL standard syntax (WINDOW clause):

SELECT
  v,
  ROW_NUMBER() OVER (window) row_number,
  RANK()       OVER (window) rank,
  DENSE_RANK() OVER (window) dense_rank
FROM t
WINDOW window AS (ORDER BY v)
ORDER BY v

... you'll get:

+---+------------+------+------------+
| V | ROW_NUMBER | RANK | DENSE_RANK |
+---+------------+------+------------+
| a |          1 |    1 |          1 |
| a |          2 |    1 |          1 |
| a |          3 |    1 |          1 |
| b |          4 |    4 |          2 |
| c |          5 |    5 |          3 |
| c |          6 |    5 |          3 |
| d |          7 |    7 |          4 |
| e |          8 |    8 |          5 |
+---+------------+------+------------+
Incidentally answered 23/5, 2014 at 6:14 Comment(4)
Both ROW_NUMBER and DENSE_RANK produce values before distinct is applied. Actually all ranking function or any function produces results before DISTINCT is applied.Sporogonium
@ThanasisIoannidis: Absolutely. I've updated my answer with a link to a blog post, where I've explained the true order of SQL operationsIncidentally
DISTINCT aside, I don't understand how does the query know to apply DENSE_RANK() based on values in the V column without PARTITION BY? Does it use the column you ordered by?Ascomycete
@Lukas: Ranking functions assing a rank according to ordering, yes. I recommend reading a good tutorial, e.g. this one here: tapoueh.org/blog/2013/08/understanding-window-functionsIncidentally
U
26

Simple query without partition clause:

select 
    sal, 
    RANK() over(order by sal desc) as Rank,
    DENSE_RANK() over(order by sal desc) as DenseRank,
    ROW_NUMBER() over(order by sal desc) as RowNumber
from employee 

Output:

    --------|-------|-----------|----------
    sal     |Rank   |DenseRank  |RowNumber
    --------|-------|-----------|----------
    5000    |1      |1          |1
    3000    |2      |2          |2
    3000    |2      |2          |3
    2975    |4      |3          |4
    2850    |5      |4          |5
    --------|-------|-----------|----------
Unfair answered 27/4, 2020 at 9:38 Comment(1)
Simplest example so far. Thank you.Ladylike
N
4

Quite a bit:

The rank of a row is one plus the number of ranks that come before the row in question.

Row_number is the distinct rank of rows, without any gap in the ranking.

http://www.bidn.com/blogs/marcoadf/bidn-blog/379/ranking-functions-row_number-vs-rank-vs-dense_rank-vs-ntile

Neuromuscular answered 12/10, 2011 at 22:31 Comment(1)
Ah, I think this is what I was missing -> Row_number is the distinct rank of rows, without any gap in the ranking.Midriff
S
-1

I haven't done anything with rank, but I discovered this today with row_number().

select item, name, sold, row_number() over(partition by item order by sold) as row from table_name

This will result in some repeating row numbers since in my case each name holds all items. Each item will be ordered by how many were sold.

+--------+------+-----+----+
|glasses |store1|  30 | 1  |
|glasses |store2|  35 | 2  |
|glasses |store3|  40 | 3  |
|shoes   |store2|  10 | 1  |
|shoes   |store1|  20 | 2  |
|shoes   |store3|  22 | 3  |
+--------+------+-----+----+
Stilliform answered 22/6, 2015 at 23:18 Comment(0)
M
-1

Note, all these windowing functions return an integer-like value.

Often the database will choose a BIGINT datatype, and this take much more space than we need. And, we will rarely need a range from -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807.

Cast the results as a BYTEINT, SMALLINT, or INTEGER.

These modern systems and hardware are so strong, so you may never see a meaningflul extra use of resources, but I think it's best-practice.

Mcgaha answered 7/10, 2022 at 13:57 Comment(0)
O
-2

Look this example.

CREATE TABLE [dbo].#TestTable(
    [id] [int] NOT NULL,
    [create_date] [date] NOT NULL,
    [info1] [varchar](50) NOT NULL,
    [info2] [varchar](50) NOT NULL,
)

Insert some data

INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (1, '1/1/09', 'Blue', 'Green')
INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (1, '1/2/09', 'Red', 'Yellow')
INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (1, '1/3/09', 'Orange', 'Purple')
INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (2, '1/1/09', 'Yellow', 'Blue')
INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (2, '1/5/09', 'Blue', 'Orange')
INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (3, '1/2/09', 'Green', 'Purple')
INSERT INTO dbo.#TestTable (id, create_date, info1, info2)
VALUES (3, '1/8/09', 'Red', 'Blue')

Repeat same Values for 1

INSERT INTO dbo.#TestTable (id, create_date, info1, info2) VALUES (1, '1/1/09', 'Blue', 'Green')

Look All

SELECT * FROM #TestTable

Look your results

SELECT Id,
    create_date,
    info1,
    info2,
    ROW_NUMBER() OVER (PARTITION BY Id ORDER BY create_date DESC) AS RowId,
    RANK() OVER(PARTITION BY Id ORDER BY create_date DESC)    AS [RANK]
FROM #TestTable

Need to understand the different

Overgrow answered 12/12, 2014 at 15:51 Comment(0)
A
-2

Also, pay attention to ORDER BY in PARTITION (Standard AdventureWorks db is used for example) when using RANK.

SELECT as1.SalesOrderID, as1.SalesOrderDetailID, RANK() OVER (PARTITION BY as1.SalesOrderID ORDER BY as1.SalesOrderID ) ranknoequal , RANK() OVER (PARTITION BY as1.SalesOrderID ORDER BY as1.SalesOrderDetailId ) ranknodiff FROM Sales.SalesOrderDetail as1 WHERE SalesOrderId = 43659 ORDER BY SalesOrderDetailId;

Gives result:

SalesOrderID SalesOrderDetailID rank_same_as_partition rank_salesorderdetailid
43659 1 1 1
43659 2 1 2
43659 3 1 3
43659 4 1 4
43659 5 1 5
43659 6 1 6
43659 7 1 7
43659 8 1 8
43659 9 1 9
43659 10 1 10
43659 11 1 11
43659 12 1 12

But if change order by to (use OrderQty :

SELECT as1.SalesOrderID, as1.OrderQty, RANK() OVER (PARTITION BY as1.SalesOrderID ORDER BY as1.SalesOrderID ) ranknoequal , RANK() OVER (PARTITION BY as1.SalesOrderID ORDER BY as1.OrderQty ) rank_orderqty FROM Sales.SalesOrderDetail as1 WHERE SalesOrderId = 43659 ORDER BY OrderQty;

Gives:

SalesOrderID OrderQty rank_salesorderid rank_orderqty
43659 1 1 1
43659 1 1 1
43659 1 1 1
43659 1 1 1
43659 1 1 1
43659 1 1 1
43659 2 1 7
43659 2 1 7
43659 3 1 9
43659 3 1 9
43659 4 1 11
43659 6 1 12

Notice how the Rank changes when we use OrderQty (rightmost column second table) in ORDER BY and how it changes when we use SalesOrderDetailID (rightmost column first table) in ORDER BY.

Aerogram answered 26/12, 2014 at 15:49 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.