Closest value to a specific column in R

Asked 23/1, 2019 at 13:24 Answered 23/1, 2019 at 20:6

I would like to find the closest value to column x3 below.

data=data.frame(x1=c(24,12,76),x2=c(15,30,20),x3=c(45,27,15))
data
  x1 x2 x3
1 24 15 45
2 12 30 27
3 76 20 15

So desired output will be

Closest_Value_to_x3
   24
   30
   20

Please help. Thank you

Rubyeruch answered 23/1, 2019 at 13:24 Comment(0)

Use max.col(-abs(data[, 3] - data[, -3])) to find the column positions of the closest values and use this result as part of a matrix to extract desired values from your data. The matrix is returned by cbind

col <- 3
data[, -col][cbind(1:nrow(data),
                   max.col(-abs(data[, col] - data[, -col])))]
#[1] 24 30 20

Foreplay answered 23/1, 2019 at 13:30 Comment(2)

Nice answer. Although I don't think the outside [, 1:2] subset is necessary since you've already done that subset inside the abs() call. – Syncytium 23/1, 2019 at 20:25

@RichScriven Thanks for your comment. I guess I included it in case OP changes their mind and want's to find the closest values to, say, column 1, in which case we'd need the [, 2:3] subset. – Foreplay 23/1, 2019 at 20:32

A tidyverse solution:

data %>%
  rowid_to_column() %>%
  gather(var, val, -c(x3, rowid)) %>%
  mutate(temp = x3 - val) %>%
  group_by(rowid) %>%
  filter(abs(temp) == min(abs(temp))) %>%
  ungroup() %>%
  select(val)

    val
  <dbl>
1    24
2    30
3    20

First, it adds a row ID. Second, it transforms the data from wide to long. Third, it calculates the difference between "x3" and the other variables. Finally, it groups by the row ID and keeps the rows where the absolute difference is the smallest.

Or:

data %>%
  rowid_to_column() %>%
  gather(var, val, -c(x3, rowid)) %>%
  mutate(temp = x3 - val) %>%
  group_by(rowid) %>%
  filter(abs(temp) == min(abs(temp))) %>%
  ungroup() %>%
  pull(val)

[1] 24 30 20

Or using an approach originally proposed by @markus (it assumes that your columns are named "x"):

data %>%
 mutate(temp = paste0("x", max.col(-abs(.[, -3] - .[, 3])))) %>%
 rowwise() %>%
 summarise(val = eval(as.symbol(temp)))

    val
  <dbl>
1   24.
2   30.
3   20.

First, it is assessing the column index of the variable where the absolute difference in regard to "x3" is the smallest and combines it with "x". Then, it evaluates the combination of x and column index as a variable and returns the appropriate value.

Also borrowing the idea from @markus (not assuming that your columns are named "x"):

data %>%
 mutate(temp = max.col(-abs(.[, -3] - .[, 3]))) %>%
 rowwise %>%
 mutate(temp = names(.)[[temp]]) %>%
 summarise(val = eval(as.symbol(temp)))

First, it is assessing the column index of the variable where the absolute difference in regard to "x3" is the smallest. Second, it returns the column name based on the column index. Finally, it evaluates it as a variable and returns the appropriate value.

Or a variant where you can reference the "x3" variable by its name and not by column index (the basic idea still from @markus):

data %>%
 mutate(temp = max.col(-abs(.[, !grepl("x3", colnames(.))] - .[, grepl("x3", colnames(.))]))) %>% 
 rowwise %>%
 mutate(temp = names(.)[[temp]]) %>%
 summarise(val = eval(as.symbol(temp)))

Squeeze answered 23/1, 2019 at 14:0 Comment(8)

I like that I can always count on you for a tidyverse approach but sometimes they look so complex and intimidating. Great all the same! – Spice 23/1, 2019 at 14:19

@Spice sometimes it gets really verbose, that is true. But that is also true that the tidyverse aprroaches in general are not the ones with the shortest code. Anyway, thank you for your compliment :) – Squeeze 23/1, 2019 at 14:30

This is a tidyverse solution, not the only one, and not one I'd write. You can make the code a lot less verbose by following the natural logic of the other answer, no need to reshape the data. – Wishful 23/1, 2019 at 17:19

@Konrad Rudolph I agree with you, however, I'm not presenting this post as the tidyverse solution. Also, I'm not saying that this particular problem cannot be solved using tidyverse with a lot less verbose code. I'm just saying that the tidyverse solutions are, in general, not the ones with the least verbose code. – Squeeze 23/1, 2019 at 20:49

@Squeeze You’re completely right but just to clarify, I think in this case you’re torturing dplyr, and it confesses to anything. You could do a simpler single mutate: data %>% mutate(d = .[, -3][cbind(row_number(), max.col(- abs(.[, 3] - .[, -3])))]) — I’d be tempted to introduce a temporary column to hold the result of max.col but otherwise that’s it. – Wishful 24/1, 2019 at 9:57

"I think in this case you’re torturing dplyr, and it confesses to anything", this could be on fortunes. – Preparatory 24/1, 2019 at 10:56

@Konrad Rudolph the code in your post is a literal transcription of the code from markus. It is a single line and it does the job, yes, but, in my opinion, it is not representing the tidyverse mentality. Anyway, thank you for your remarks :) – Squeeze 24/1, 2019 at 17:3

@Squeeze “It’s not representative of the tidyverse mentality” — Well, more precisely, it’s not representative of the dplyr mentality, because that package is specifically designed to handle tidy data (i.e. long data). And OP’s problem is specifically a matrix computation problem, which dplyr is intentionally not designed to handle. And that’s kinda my point. – Wishful 24/1, 2019 at 17:37

Here is another approach using matrixStats

x <- as.matrix(data[,-3L])
y <- abs(x - .subset2(data, 3L))
x[matrixStats::rowMins(y) == y]
# [1] 24 30 20

Or in base using vapply

x <- as.matrix(data[,-3L])
y <- abs(x - .subset2(data, 3L))
vapply(1:nrow(data), 
       function(k) x[k,][which.min(y[k,])], 
       numeric(1))
# [1] 24 30 20

Mccourt answered 23/1, 2019 at 13:47 Comment(0)

Define a function closest_to_3 that operates on a vector and returns the value in the vector that's closest to the third member:

closest_to_3 <- function(v) v[-3][which.min(abs( v[-3]-v[3] ))]

(The idiom v[-3] deletes the 3rd member from v.) Then apply this function to each row of your data frame:

apply(data, 1, closest_to_3)
#[1] 24 30 20

Unbiased answered 23/1, 2019 at 20:6 Comment(0)

Recommended topics

Hot tags