How to rank within groups in R?
W

7

31

This is my dataframe:

  customer_name order_dates order_values
1          John  2010-11-01           15
2           Bob  2008-03-25           12
3          Alex  2009-11-15            5
4          John  2012-08-06           15
5          John  2015-05-07           20

Let's say I want to add a variable that ranks each customer's orders by order value, highest first, using the latest order date as the tie breaker.

So, ultimately the data should look like this:

  customer_name order_dates order_values ranked_order_values_by_max_value_date
1          John  2010-11-01           15                               3
2           Bob  2008-03-25           12                               1
3          Alex  2009-11-15            5                               1
4          John  2012-08-06           15                               2
5          John  2015-05-07           20                               1

Here, a customer with a single order gets rank 1, and a customer with several orders has them ranked by value, with the later order date taking priority in a tie. In this example, John's 8/6/2012 order gets rank 2 because it was placed after the 11/1/2010 order of the same value. The 5/7/2015 order gets rank 1 because it was the biggest. Even if that order had been placed 20 years ago, it would still be rank 1 because it was John's highest order value.

Does anyone know how I can do this in R, i.e. rank within groups defined by specified variables in a data frame?

Wellhead answered 6/8, 2015 at 14:59 Comment(5)
@akrun what about tie breaker for values? – Cloistral
Here's the code to make the data frame in case that helps: customer_name <- c("John","Bob","Alex","John","John"); order_dates <- as.Date(c('2010-11-1','2008-3-25','2009-11-15','2012-8-6','2015-5-7')); order_values <- c(15,12,5,15,20); test_data <- data.frame(customer_name,order_dates,order_values); – Wellhead
@SenorO The OP's example should be a bit more complex to test. Also, dense_rank from dplyr is one way for tie breaker – Annelleannemarie
@akrun: tie breaker for values would be the order date. So John has two $15 orders, but the one that was placed first is ranked higher. – Wellhead
Maybe setDT(df1)[, rnk := order(desc(order_values), desc(order_dates)), customer_name] using data.table – Annelleannemarie
B
24

You can do this pretty cleanly with dplyr

library(dplyr)
df %>%
    group_by(customer_name) %>%
    mutate(my_ranks = order(order(order_values, order_dates, decreasing=TRUE)))

Source: local data frame [5 x 4]
Groups: customer_name

  customer_name order_dates order_values my_ranks
1          John  2010-11-01           15        3
2           Bob  2008-03-25           12        1
3          Alex  2009-11-15            5        1
4          John  2012-08-06           15        2
5          John  2015-05-07           20        1
Blairblaire answered 6/8, 2015 at 15:8 Comment(1)
This is incorrect. The correct answer is provided by @T.Himmel. – Knothole
V
38

The top-rated answer (by cdeterman) is actually incorrect. The order function returns the positions of the 1st, 2nd, 3rd, etc. ranked values, not the ranks of the values in their current order.

Let's take a simple example where we want to rank, starting with the largest, grouping by customer name. I have included a manual ranking so we can check the values.

    > df
       customer_name order_values manual_rank
    1           John            2           5
    2           John            5           2
    3           John            9           1
    4           John            1           6
    5           John            4           3
    6           John            3           4
    7           Lucy            4           4
    8           Lucy            9           1
    9           Lucy            6           3
    10          Lucy            2           6
    11          Lucy            8           2
    12          Lucy            3           5

If I run the code suggested by cdeterman I get the following incorrect ranks:

    > df %>%
    +   group_by(customer_name) %>%
    +   mutate(my_ranks = order(order_values, decreasing=TRUE))
    Source: local data frame [12 x 4]
    Groups: customer_name [2]

       customer_name order_values manual_rank my_ranks
              <fctr>        <dbl>       <dbl>    <int>
    1           John            2           5        3
    2           John            5           2        2
    3           John            9           1        5
    4           John            1           6        6
    5           John            4           3        1
    6           John            3           4        4
    7           Lucy            4           4        2
    8           Lucy            9           1        5
    9           Lucy            6           3        3
    10          Lucy            2           6        1
    11          Lucy            8           2        6
    12          Lucy            3           5        4

order returns the permutation that sorts its input, which is what you use to re-order a data frame into decreasing or increasing order. What we actually want is to run the order function twice: the second call inverts that permutation and gives the rank of each value in its current position.

    > df %>%
    +   group_by(customer_name) %>%
    +   mutate(good_ranks = order(order(order_values, decreasing=TRUE)))
    Source: local data frame [12 x 4]
    Groups: customer_name [2]

       customer_name order_values manual_rank good_ranks
              <fctr>        <dbl>       <dbl>      <int>
    1           John            2           5          5
    2           John            5           2          2
    3           John            9           1          1
    4           John            1           6          6
    5           John            4           3          3
    6           John            3           4          4
    7           Lucy            4           4          4
    8           Lucy            9           1          1
    9           Lucy            6           3          3
    10          Lucy            2           6          6
    11          Lucy            8           2          2
    12          Lucy            3           5          5
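
To see why the double call works, here is a minimal base R sketch on a single group, using the same values as John's orders above. The variable names are only illustrative, and the rank(-x) line is an alternative added for comparison, not part of the answer's code.

```r
# John's order values from the example above
x <- c(2, 5, 9, 1, 4, 3)

# order() returns positions: the largest value sits at index 3,
# the second largest at index 2, and so on
positions <- order(x, decreasing = TRUE)

# Applying order() to that permutation inverts it, which gives
# the rank of each element in its current position
ranks_via_order <- order(positions)

# rank() on the negated values gives the same descending ranks
# when there are no ties
ranks_via_rank <- rank(-x)

print(positions)        # 3 2 5 6 1 4
print(ranks_via_order)  # 5 2 1 6 3 4
print(ranks_via_rank)   # 5 2 1 6 3 4
```

The second and third results match the manual_rank column for John, while the first reproduces the incorrect my_ranks column.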
Vadavaden answered 27/4, 2017 at 19:15 Comment(1)
This worked great for me. I had to run detach("package:plyr", unload=TRUE) prior though so it would group properly. Thanks for this solution! – Involucel
R
8

This can be achieved with ave and rank. ave passes the proper groups to rank. The result from rank is reversed due to the requested order:

with(x, ave(as.numeric(order_dates), customer_name, FUN=function(x) rev(rank(x))))
## [1] 3 1 1 2 1
Reflection answered 6/8, 2015 at 15:13 Comment(0)
D
2
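library(dplyr)
df %>% 
  group_by(customer_name) %>% 
  # sort each customer's orders: largest value first, latest date breaks ties
  arrange(customer_name, desc(order_values), desc(order_dates)) %>% 
  # rank() ranks ascending regardless of row order, so number the sorted rows instead
  mutate(rank2 = row_number())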
Diabolism answered 7/6, 2019 at 17:25 Comment(0)
T
1

In base R you can do this with the slightly unwieldy

transform(df,rank=ave(1:nrow(df),customer_name,
  FUN=function(x) order(order(order_values[x],order_dates[x],decreasing=TRUE))))
  customer_name order_dates order_values rank
1          John  2010-11-01           15    3
2           Bob  2008-03-25           12    1
3          Alex  2009-11-15            5    1
4          John  2012-08-06           15    2
5          John  2015-05-07           20    1

where the inner order is given both the primary and tie-breaker values for each group, and the outer order converts the resulting sort positions into ranks.

Talishatalisman answered 6/8, 2015 at 15:17 Comment(0)
F
0

Similar to @t-himmel's answer, you can get the ranks with data.table.

dt[ , rnk := order(order(order_values, decreasing = TRUE)), customer_name ]
Fahlband answered 6/8, 2021 at 16:19 Comment(0)
T
0

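Matthew Lundberg's solution is close, but rev(rank(x)) is not meaningful: it reverses the order of the rank vector instead of inverting the ranks, so it is only correct for special arrangements such as already sorted data. Here is a fix for that.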

x <- data.frame( customer_name = rep(c("John", "Lucy"), each=6), 
    order_values = c(2, 5, 9, 1, 4, 3, 4, 9, 6, 2, 8, 3),
    manual_rank = c(5, 2, 1, 6, 3, 4, 4, 1, 3, 6, 2, 5))

x$test <- with(x, ave(as.numeric(order_values), customer_name,
                         FUN=\(x) rank(max(x, na.rm=TRUE) - x)))

all(x$manual_rank == x$test)
#[1] TRUE
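
Note that rank averages tied values by default, so on the question's data John's two orders of 15 would each get rank 2.5. Below is a minimal base R sketch, assuming the test_data frame posted in the question's comments, that breaks ties by the later order date instead. The column name rnk and the variable avg_ranks are only illustrative.

```r
# Data frame from the question's comments
test_data <- data.frame(
  customer_name = c("John", "Bob", "Alex", "John", "John"),
  order_dates   = as.Date(c("2010-11-01", "2008-03-25", "2009-11-15",
                            "2012-08-06", "2015-05-07")),
  order_values  = c(15, 12, 5, 15, 20)
)

# Default rank(): tied values share the average of their ranks
avg_ranks <- with(test_data,
  ave(order_values, customer_name, FUN = function(v) rank(-v)))

# Ties broken by the later date: sort each group's row indices by
# descending value, then descending date, and invert with a second order()
test_data$rnk <- with(test_data,
  ave(seq_along(order_values), customer_name,
      FUN = function(i) order(order(-order_values[i],
                                    -as.numeric(order_dates[i])))))

print(avg_ranks)      # 2.5 1.0 1.0 2.5 1.0
print(test_data$rnk)  # 3 1 1 2 1
```

The second result matches the ranking requested in the question.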
Talmud answered 29/12, 2023 at 22:6 Comment(0)
