Closest value to a specific column in R
Asked Answered
R

4

14

I would like to find the closest value to column x3 below.

data=data.frame(x1=c(24,12,76),x2=c(15,30,20),x3=c(45,27,15))
data
  x1 x2 x3
1 24 15 45
2 12 30 27
3 76 20 15

So desired output will be

Closest_Value_to_x3
   24
   30
   20

Please help. Thank you

Rubyeruch answered 23/1, 2019 at 13:24 Comment(0)
F
12

Use max.col(-abs(data[, 3] - data[, -3])) to find the column positions of the closest values and use this result as part of a matrix to extract desired values from your data. The matrix is returned by cbind

col <- 3
data[, -col][cbind(1:nrow(data),
                   max.col(-abs(data[, col] - data[, -col])))]
#[1] 24 30 20
Foreplay answered 23/1, 2019 at 13:30 Comment(2)
Nice answer. Although I don't think the outside [, 1:2] subset is necessary since you've already done that subset inside the abs() call.Syncytium
@RichScriven Thanks for your comment. I guess I included it in case OP changes their mind and want's to find the closest values to, say, column 1, in which case we'd need the [, 2:3] subset.Foreplay
S
4

A tidyverse solution:

data %>%
  rowid_to_column() %>%
  gather(var, val, -c(x3, rowid)) %>%
  mutate(temp = x3 - val) %>%
  group_by(rowid) %>%
  filter(abs(temp) == min(abs(temp))) %>%
  ungroup() %>%
  select(val)

    val
  <dbl>
1    24
2    30
3    20

First, it adds a row ID. Second, it transforms the data from wide to long. Third, it calculates the difference between "x3" and the other variables. Finally, it groups by the row ID and keeps the rows where the absolute difference is the smallest.

Or:

data %>%
  rowid_to_column() %>%
  gather(var, val, -c(x3, rowid)) %>%
  mutate(temp = x3 - val) %>%
  group_by(rowid) %>%
  filter(abs(temp) == min(abs(temp))) %>%
  ungroup() %>%
  pull(val)

[1] 24 30 20

Or using an approach originally proposed by @markus (it assumes that your columns are named "x"):

data %>%
 mutate(temp = paste0("x", max.col(-abs(.[, -3] - .[, 3])))) %>%
 rowwise() %>%
 summarise(val = eval(as.symbol(temp)))

    val
  <dbl>
1   24.
2   30.
3   20.

First, it is assessing the column index of the variable where the absolute difference in regard to "x3" is the smallest and combines it with "x". Then, it evaluates the combination of x and column index as a variable and returns the appropriate value.

Also borrowing the idea from @markus (not assuming that your columns are named "x"):

data %>%
 mutate(temp = max.col(-abs(.[, -3] - .[, 3]))) %>%
 rowwise %>%
 mutate(temp = names(.)[[temp]]) %>%
 summarise(val = eval(as.symbol(temp)))

First, it is assessing the column index of the variable where the absolute difference in regard to "x3" is the smallest. Second, it returns the column name based on the column index. Finally, it evaluates it as a variable and returns the appropriate value.

Or a variant where you can reference the "x3" variable by its name and not by column index (the basic idea still from @markus):

data %>%
 mutate(temp = max.col(-abs(.[, !grepl("x3", colnames(.))] - .[, grepl("x3", colnames(.))]))) %>% 
 rowwise %>%
 mutate(temp = names(.)[[temp]]) %>%
 summarise(val = eval(as.symbol(temp)))
Squeeze answered 23/1, 2019 at 14:0 Comment(8)
I like that I can always count on you for a tidyverse approach but sometimes they look so complex and intimidating. Great all the same!Spice
@Spice sometimes it gets really verbose, that is true. But that is also true that the tidyverse aprroaches in general are not the ones with the shortest code. Anyway, thank you for your compliment :)Squeeze
This is a tidyverse solution, not the only one, and not one I'd write. You can make the code a lot less verbose by following the natural logic of the other answer, no need to reshape the data.Wishful
@Konrad Rudolph I agree with you, however, I'm not presenting this post as the tidyverse solution. Also, I'm not saying that this particular problem cannot be solved using tidyverse with a lot less verbose code. I'm just saying that the tidyverse solutions are, in general, not the ones with the least verbose code.Squeeze
@Squeeze You’re completely right but just to clarify, I think in this case you’re torturing dplyr, and it confesses to anything. You could do a simpler single mutate: data %>% mutate(d = .[, -3][cbind(row_number(), max.col(- abs(.[, 3] - .[, -3])))]) — I’d be tempted to introduce a temporary column to hold the result of max.col but otherwise that’s it.Wishful
"I think in this case you’re torturing dplyr, and it confesses to anything", this could be on fortunes.Preparatory
@Konrad Rudolph the code in your post is a literal transcription of the code from markus. It is a single line and it does the job, yes, but, in my opinion, it is not representing the tidyverse mentality. Anyway, thank you for your remarks :)Squeeze
@Squeeze “It’s not representative of the tidyverse mentality” — Well, more precisely, it’s not representative of the dplyr mentality, because that package is specifically designed to handle tidy data (i.e. long data). And OP’s problem is specifically a matrix computation problem, which dplyr is intentionally not designed to handle. And that’s kinda my point.Wishful
M
3

Here is another approach using matrixStats

x <- as.matrix(data[,-3L])
y <- abs(x - .subset2(data, 3L))
x[matrixStats::rowMins(y) == y]
# [1] 24 30 20

Or in base using vapply

x <- as.matrix(data[,-3L])
y <- abs(x - .subset2(data, 3L))
vapply(1:nrow(data), 
       function(k) x[k,][which.min(y[k,])], 
       numeric(1))
# [1] 24 30 20
Mccourt answered 23/1, 2019 at 13:47 Comment(0)
U
2

Define a function closest_to_3 that operates on a vector and returns the value in the vector that's closest to the third member:

closest_to_3 <- function(v) v[-3][which.min(abs( v[-3]-v[3] ))]

(The idiom v[-3] deletes the 3rd member from v.) Then apply this function to each row of your data frame:

apply(data, 1, closest_to_3)
#[1] 24 30 20
Unbiased answered 23/1, 2019 at 20:6 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.