Find minimum value in an array that is larger than another column in R

Asked 11/6, 2021 at 16:26 Answered 12/6, 2021 at 10:39

I need to find the minimum values of three columns that are bigger than the values in another column. Say these five individuals entered a hospital in different months of the year, and they suffered several heart attacks before and after hospitalization. I need the first heart attack after hospitalization.


id<-c(100,105,108,200,205)
hosp<-c(3,5,2,6,2)
attack1<-c(1,6,3,4,1)
attack2<-c(4,7,9,10,NA)
attack3<-c(5,10,NA,NA,NA)
out<-c(7,12,11,12,9)

data <- data.frame(id,hosp,attack1,attack2,attack3,out)

   id hosp attack1 attack2 attack3 out
1 100    3       1       4       5   7
2 105    5       6       7      10  12
3 108    2       3       9      NA  11
4 200    6       4      10      NA  12
5 205    2       1      NA      NA   9

So the data should end up looking something like

   id hosp attack1 attack2 attack3 out afterh
1 100    3       1       4       5   7      4
2 105    5       6       7      10  12      6
3 108    2       3       9      NA  11      3
4 200    6       4      10      NA  12     10
5 205    2       1      NA      NA   9     NA

This is my attempt which did not work:

min_f<-function(a){
  x<-min(a[a>hosp])
}

data %>% mutate_if(vars(attack1,attack2,attack3),min_f())

Best answered 11/6, 2021 at 16:26 Comment(0)

You can use the following solution.

Here c(...) refers to all variables in each row of your data set and I chose only those that starts with attack
Then I chose only those values that are greater than the corresponding value of hosp in each row and since you were looking for the first one that is greater than the value of hosp I used first function to extract that
..2 also refers to the value of the second variable hosp in each row

library(dplyr)
library(purrr)

data %>%
  mutate(afterh = pmap_dbl(., ~ {x <- c(...)[3:5]; 
  first(sort(x[x > ..2]))}))

   id hosp attack1 attack2 attack3 out afterh
1 100    3       1       4       5   7      4
2 105    5       6       7      10  12      6
3 108    2       3       9      NA  11      3
4 200    6       4      10      NA  12     10
5 205    2       1      NA      NA   9     NA

As an alternative as mentioned by dear Mr. @Greg in a very large data set, we can use min function in place of first(sort)) combination to ensure a faster evaluation time of the following solution. In case there is no value greater than hosp like in the last row min function would return Inf so I made sure that it would return the value 0 instead you can change it with the value you prefer:

data %>%
  mutate(afterh = pmap_dbl(., ~ {x <- c(...)[3:5];
  out <- min(x[x > ..2], na.rm = TRUE);
  if(!is.finite(out)) 0 else out}))

   id hosp attack1 attack2 attack3 out afterh
1 100    3       1       4       5   7      4
2 105    5       6       7      10  12      6
3 108    2       3       9      NA  11      3
4 200    6       4      10      NA  12     10
5 205    2       1      NA      NA   9      0

Formless answered 11/6, 2021 at 16:46 Comment(4)

Do note that if the columns are out of order (attack3 | attack1 | attack2), we cannot rely on first() to find the smallest value greater than hosp. – Barbershop 11/6, 2021 at 17:2

Thank you very much dear @Greg, you have pointed to a very subtle point. I made a slight modification. – Formless 11/6, 2021 at 17:5

My pleasure! Though I do wonder if first ∘ sort would be as computationally efficient — on a lot of columns — as min (with something to ensure NA instead of Inf for an empty vector) – Barbershop 11/6, 2021 at 17:18

I made a slight modifications to my code. It was also a very good point, Thank you very much indeed. – Formless 11/6, 2021 at 17:34

data %>% 
  # Nest attack columns
  nest(attacks = starts_with('attack')) %>% 
  # Only one row at a time
  rowwise() %>% 
  # Find first instance for each row
  mutate(afterh = first(attacks[attacks > hosp])) %>% 
  # Unnest attacks
  unnest(attacks)

Near answered 11/6, 2021 at 16:49 Comment(3)

+1 Cleanest and most extensible solution so far. The nesting of all attack* columns was a prudent touch. Do note that if the columns are out of order, however, we cannot rely on first() to find the smallest value greater than hosp. – Barbershop 11/6, 2021 at 16:57

good point. Didn't consider it in my quick-and-dirty solution. – Near 11/6, 2021 at 17:7

I wouldn't consider it "dirty", per se. It keeps things pretty clean by leveraging dplyr, and it catches any and all attack* columns. – Barbershop 11/6, 2021 at 17:10

Nearly similar answer but using dplyr only

library(dplyr, warn.conflicts = F)

data %>% rowwise() %>%
  mutate(afterh = {xx <- select(cur_data(), starts_with('attack')); first(xx[xx > hosp])})

#> # A tibble: 5 x 7
#> # Rowwise: 
#>      id  hosp attack1 attack2 attack3   out afterh
#>   <dbl> <dbl>   <dbl>   <dbl>   <dbl> <dbl>  <dbl>
#> 1   100     3       1       4       5     7      4
#> 2   105     5       6       7      10    12      6
#> 3   108     2       3       9      NA    11      3
#> 4   200     6       4      10      NA    12     10
#> 5   205     2       1      NA      NA     9     NA

^{Created on 2021-06-12 by the reprex package (v2.0.0)}

Eurydice answered 12/6, 2021 at 10:39 Comment(0)

Recommended topics

Hot tags