R dplyr get name of which.min rowwise
Asked Answered
T

2

5

I just answered this question with a data.table approach, and was working hard to provide a dplyr equivalent, but failed misserably.

There's a simple data frame with numerical values in its columns. I want to get the name of the column with the minimum value using dplyr.

Please notice that I'm aware of other methods to solve this problem: at the moment my only interest is in the dplyr approach

id <- c(1,2,3,4,5,6,7,8,9,10)
x1 <- c(2,4,5,3,6,4,3,6,7,7)
x2 <- c(0,1,2,6,7,6,0,8,2,2)
x3 <- c(5,3,4,5,8,3,4,2,5,6)

DF <- data.frame(id, x1,x2,x3)

I tried several variants of:

DF %>% select(2:4) %>% rowwise() %>% mutate(y = function(x) names(x)[which.min(x)])

DF %>% select(2:4) %>% rowwise() %>% mutate(y = apply(x1:x3, 1, which.min(x1:x3))

but haven't found a way to do this the dplyr-way. I'll appreciate any hint!

Expected output:

DF
##     id x1 x2 x3  y
##  1:  1  2  0  5 x2
##  2:  2  4  1  3 x2
##  3:  3  5  2  4 x2
##  4:  4  3  6  5 x1
##  5:  5  6  7  8 x1
##  6:  6  4  6  3 x3
##  7:  7  3  0  4 x2
##  8:  8  6  8  2 x3
##  9:  9  7  2  5 x2
## 10: 10  7  2  6 x2
Tessitura answered 12/5, 2016 at 20:29 Comment(1)
I don't know if it's the dplyr-way, but something like DF %>% rowwise %>% mutate(y = names(.)[2:4][which.min(c(x1, x2, x3))]) seems to work.Barina
S
4

This seems kind of (well, actually very) clunky but...

DF %>% 
  mutate(y = apply(.[,2:4], 1, function(x) names(x)[which.min(x)]))
   id x1 x2 x3  y
1   1  2  0  5 x2
2   2  4  1  3 x2
3   3  5  2  4 x2
4   4  3  6  5 x1
5   5  6  7  8 x1
6   6  4  6  3 x3
7   7  3  0  4 x2
8   8  6  8  2 x3
9   9  7  2  5 x2
10 10  7  2  6 x2
Sev answered 12/5, 2016 at 20:51 Comment(2)
system.time for @eipi10's answer was about 17.5 secs for a 1e6 rows data frame, while the answer provided by @Barina in the comments took about 94.1 secs. Thanks to both!Tessitura
I try to stay away from rowwise because it seems to be very slow in general.Sev
C
4

Here's another approach.. I didn't run any benchmark but my guess is that it would perform better than using apply or rowwise.

DF %>% 
  mutate(y = names(.)[max.col(.[2:4]*-1)+1L])
#    id x1 x2 x3  y
# 1   1  2  0  5 x2
# 2   2  4  1  3 x2
# 3   3  5  2  4 x2
# 4   4  3  6  5 x1
# 5   5  6  7  8 x1
# 6   6  4  6  3 x3
# 7   7  3  0  4 x2
# 8   8  6  8  2 x3
# 9   9  7  2  5 x2
# 10 10  7  2  6 x2

The *-1 is because we want the min instead of the max column and +1 because we only look at columns 2:4 but need the correct column names in return.

Chamberlin answered 12/5, 2016 at 21:20 Comment(1)
I just timed yours and mine on a 1 million row data frame and your code ran in about 1/12th the time (median .33 sec vs. 4.0 sec). Definitely a much faster approach.Sev

© 2022 - 2024 — McMap. All rights reserved.