Finding the column number and value of the second-highest value in a row

I am trying to write some code which identifies the greatest two values for each row and provides their column number and value.

df = data.frame(car  = c(2, 1, 1, 1, 0), bus  = c(0, 2, 0, 1, 0),
                walk = c(0, 3, 2, 0, 0), bike = c(0, 4, 0, 0, 1))

I've managed to get it to do this for the maximum value using the max and max.col functions.

df$max = max.col(df,ties.method="first")
df$val = apply(df[ ,1:4], 1, max)

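As far as I know there is no equivalent function for the second-highest value, which makes this a little trickier. The code below returns the second-highest value, but (importantly) not when there are ties: it discards every copy of the maximum, so for row 4 (1, 1, 0, 0) it returns 0 instead of 1. It also looks risky (a row whose values are all equal leaves nothing for max(), which then returns -Inf with a warning).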

sec.fun <- function (x) {
  max( x[x!=max(x)] )
}

df$val2 <- apply(df[ ,1:4], 1, sec.fun)
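A minimal sketch of a tie-aware variant (a swapped-in technique, not the asker's code): sorting each row in decreasing order keeps duplicate values, so the second element is the second-highest value even when the maximum is tied. `sec.sort` is a hypothetical name, and the sketch assumes the four original columns are `df[, 1:4]`.

```r
df <- data.frame(car  = c(2, 1, 1, 1, 0), bus  = c(0, 2, 0, 1, 0),
                 walk = c(0, 3, 2, 0, 0), bike = c(0, 4, 0, 0, 1))

# Original approach: drops every copy of the maximum before calling max()
sec.fun <- function(x) max(x[x != max(x)])

# Hypothetical tie-aware version: keep duplicates, take the 2nd element
sec.sort <- function(x) sort(x, decreasing = TRUE)[2]

old <- apply(df[, 1:4], 1, sec.fun)
new <- apply(df[, 1:4], 1, sec.sort)
old  # 0 3 1 0 0  (row 4 is 1, 1, 0, 0, so the tied 1 is lost)
new  # 0 3 1 1 0
```

This only gives the value; the column number still needs something like `order()` or `max.col()`.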

Ideally the solution would not involve removing any original data and could be used to find the third, fourth... highest value, but neither of these is an essential requirement.
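One way to extend the `max.col()` idea to the n-th largest without touching the original data is sketched below. `nth_largest` is a hypothetical helper: it works on a matrix copy, repeatedly hides each row's current maximum with `-Inf`, and calls `max.col(..., ties.method = "first")` again. It assumes finite values and `1 <= n <= ncol(d)`; ties go to the leftmost column first.

```r
df <- data.frame(car  = c(2, 1, 1, 1, 0), bus  = c(0, 2, 0, 1, 0),
                 walk = c(0, 3, 2, 0, 0), bike = c(0, 4, 0, 0, 1))

# Hypothetical helper: column index and value of the n-th largest entry per row.
# Assumes finite values and 1 <= n <= ncol(d). Ties go to the leftmost column first.
nth_largest <- function(d, n) {
  orig <- as.matrix(d)
  m <- orig                        # working copy; d itself is never modified
  rows <- seq_len(nrow(m))
  for (k in seq_len(n)) {
    j <- max.col(m, ties.method = "first")
    if (k < n) m[cbind(rows, j)] <- -Inf   # hide this row's current maximum
  }
  data.frame(col = j, val = orig[cbind(rows, j)])
}

second <- nth_largest(df[, 1:4], 2)
second$col  # 2 3 1 2 1
second$val  # 0 3 1 1 0
third <- nth_largest(df[, 1:4], 3)
third$col   # 3 2 2 3 2
```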

Conceptualism asked 24/4, 2012 at 11:21

Try this:

# a function that returns the position of n-th largest
maxn <- function(n) function(x) order(x, decreasing = TRUE)[n]

This is a closure (a function that returns a function), so you can use it like this:

> # position of the largest
> apply(df, 1, maxn(1))
[1] 1 4 3 1 4
> # position of the 2nd largest
> apply(df, 1, maxn(2))
[1] 2 3 1 2 1
> 
> # value of the largest
> apply(df, 1, function(x)x[maxn(1)(x)])
[1] 2 4 2 1 1
> # value of the 2nd largest
> apply(df, 1, function(x)x[maxn(2)(x)])
[1] 0 3 1 1 0
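If both results are wanted side by side, a small sketch (not part of the original answer) can collect positions, column names and values in one table; the values are read by indexing the matrix with (row, column) pairs instead of a second `apply()` call. The result column names (`first.col`, `second.val`, ...) are hypothetical.

```r
df <- data.frame(car  = c(2, 1, 1, 1, 0), bus  = c(0, 2, 0, 1, 0),
                 walk = c(0, 3, 2, 0, 0), bike = c(0, 4, 0, 0, 1))
maxn <- function(n) function(x) order(x, decreasing = TRUE)[n]

# Positions of the largest and 2nd-largest value in each row
pos1 <- apply(df, 1, maxn(1))
pos2 <- apply(df, 1, maxn(2))

# Indexing a matrix with (row, column) pairs returns one value per row
m <- as.matrix(df)
rows <- seq_len(nrow(m))

res <- data.frame(
  first.col   = pos1,
  first.name  = names(df)[pos1],
  first.val   = m[cbind(rows, pos1)],
  second.col  = pos2,
  second.name = names(df)[pos2],
  second.val  = m[cbind(rows, pos2)],
  stringsAsFactors = FALSE
)
res
```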

Update:

Why use a closure here?

One reason is that you can define functions such as:

max2 <- maxn(2)
max3 <- maxn(3)

and then use them:

> apply(df, 1, max2)
[1] 2 3 1 2 1
> apply(df, 1, max3)
[1] 3 2 2 3 2

I'm not sure the advantage is obvious, but I like this approach because it is more functional in style.
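For comparison, a sketch of the non-closure form: `apply()` forwards extra named arguments through its `...`, so a two-argument function gives the same positions. `maxn2` is a hypothetical name; the closure mainly saves you from repeating `n = ...` at each call site.

```r
df <- data.frame(car  = c(2, 1, 1, 1, 0), bus  = c(0, 2, 0, 1, 0),
                 walk = c(0, 3, 2, 0, 0), bike = c(0, 4, 0, 0, 1))

# Closure from the answer: fixing n returns a one-argument function
maxn <- function(n) function(x) order(x, decreasing = TRUE)[n]

# Hypothetical two-argument version: apply() forwards n through its ... argument
maxn2 <- function(x, n = 1) order(x, decreasing = TRUE)[n]

a <- apply(df, 1, maxn(2))
b <- apply(df, 1, maxn2, n = 2)
identical(a, b)  # TRUE

# Either form maps over several ranks; here with the closure.
# Column k of the 5 x 3 result holds the position of the k-th largest value.
ranks <- sapply(1:3, function(k) apply(df, 1, maxn(k)))
ranks
```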

Desrochers answered 24/4, 2012 at 11:33. Comments:
Discourse: OK, I haven't had coffee yet, but is there an advantage to your maxn over maxn<-function(x,n=1) order(x,decreasing=TRUE)[n]?
Conceptualism: Thanks, I've tried this and it seems to work really well. One note for others: when adding these values to the existing data.frame, specify the column range (as in the original example); otherwise the newly added result columns are included in the search.
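A sketch of the column-range point, assuming the original columns are selected by name (`vars` is a hypothetical variable) so that later result columns never end up in the search:

```r
df <- data.frame(car  = c(2, 1, 1, 1, 0), bus  = c(0, 2, 0, 1, 0),
                 walk = c(0, 3, 2, 0, 0), bike = c(0, 4, 0, 0, 1))
maxn <- function(n) function(x) order(x, decreasing = TRUE)[n]

# Record the original columns once, before any result columns are added
vars <- c("car", "bus", "walk", "bike")

df$max  <- apply(df[, vars], 1, maxn(1))
df$max2 <- apply(df[, vars], 1, maxn(2))  # still searches only the 4 originals

# Without the selection the new columns are searched too: in row 5 the
# 'max' column (value 4) now outranks every original column
apply(df, 1, maxn(1))[5]  # 5
```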

Here's a data.table solution to identify and record the max column, max value, 2nd largest column, and 2nd largest value for specified columns.

# Library
library(data.table)

# Data
set.seed(123)
df=data.table(V1=rnorm(10),V2=rnorm(10),V3=rnorm(10),V4=letters[1:10])

# MaxColumn
tmp=c('V1','V2','V3') # Search only in these columns
df[,MaxColumn:=apply(.SD,1,FUN=which.max),.SDcols=tmp]

# MaxValue
df[,MaxValue:=apply(.SD,1,FUN=max),.SDcols=tmp]

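# Rank2Column (2nd largest). Assumes no ties among the tmp columns: rank() averages
# tied ranks by default, so a tied row can match zero or several columns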
df[,Rank2Column:=apply(.SD,1,function(x) which(rank(x)==(length(tmp)-1))),.SDcols=tmp]

# Rank2Value
df[,Rank2Value:=apply(.SD,1,function(x) x[which(rank(x)==(length(tmp)-1))]),.SDcols=tmp]
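The `rank(x) == length(tmp) - 1` test relies on there being no ties, which holds for the `rnorm()` data but not in general. A minimal base-R sketch of what happens on a tied row, plus two possible replacements (my assumptions, not taken from the answer); either expression could stand in for the `rank()` test inside the `apply()` calls above.

```r
# One hypothetical row where the two largest values are tied
x <- c(V1 = 1, V2 = 1, V3 = 0)
n <- length(x)

rank(x)                  # 2.5 2.5 1.0: tied values get the average rank
which(rank(x) == n - 1)  # empty: no column has rank exactly 2

# Option 1: ties.method = "last" gives the later tied column the lower rank,
# so it agrees with which.max(), which picks the first maximum (V1)
which.max(x)                                   # V1 (position 1)
which(rank(x, ties.method = "last") == n - 1)  # V2 (position 2)

# Option 2: take the 2nd position in decreasing order
order(x, decreasing = TRUE)[2]                 # 2
```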
Stapler answered 1/5 at 18:48
