Select last non-NA value in a row, by row


I have a data frame in which each row holds a varying number of values, with the remaining cells set to NA. I would like to create a vector containing the last non-NA value in each row.

Here is an example data frame:

df <- read.table(tc <- textConnection("
   var1    var2    var3    var4
     1       2       NA      NA
     4       4       NA      6
     2       NA      3       NA                
     4       4       4       4              
     1       NA      NA      NA"), header = TRUE); close(tc)

The vector of values I want would therefore be c(2,6,3,4,1).

I just can't figure out how to get R to identify the last value.

Any help is appreciated!

Gastrostomy answered 23/9, 2011 at 17:21 Comment(1)

+1 for creating reproducible data – Gallery 23/9, 2011 at 17:31


Do this by combining three things:

1. Identify NA values with is.na
2. Find the last value in a vector with tail
3. Use apply to apply this function to each row in the data.frame

The code:

lastValue <- function(x)   tail(x[!is.na(x)], 1)

apply(df, 1, lastValue)
# [1] 2 6 3 4 1
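
One edge case the function above does not cover is a row that is entirely NA: tail() then returns a zero-length vector, and apply() returns a list instead of a vector. A minimal sketch of a guard, assuming you want NA for such rows (lastValueOrNA and df2 are hypothetical names used for illustration, not part of the original answer):

```r
# Hypothetical variant of lastValue: returns NA when a row has no non-NA value
lastValueOrNA <- function(x) {
  keep <- x[!is.na(x)]
  if (length(keep) == 0) NA else keep[length(keep)]
}

# The question's rows plus one all-NA row (made up for illustration)
df2 <- data.frame(
  var1 = c(1, 4, 2, 4, 1, NA),
  var2 = c(2, 4, NA, 4, NA, NA),
  var3 = c(NA, NA, 3, 4, NA, NA),
  var4 = c(NA, 6, NA, 4, NA, NA)
)

res <- apply(df2, 1, lastValueOrNA)
res
# values: 2 6 3 4 1 NA
```

Because every call now returns exactly one value, apply() can simplify the result to a plain vector.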

Gallery answered 23/9, 2011 at 17:28 Comment(4)

Sweet! I knew there must be a function for this: tail. Many thanks--marked as answered! – Gastrostomy 23/9, 2011 at 17:34

or, more elegant apply(df, 1, function(x) { tail(x[!is.na(x)], 1) }) – Brina 23/9, 2011 at 18:59

Is that more elegant or just more compact? I think I like Andrie's solution better. (It's too bad tail doesn't have an na.rm argument; then you could just do apply(df, 1, tail, n = 1, na.rm = TRUE).) – Abroad 23/9, 2011 at 19:38

That doesn't seem more elegant to me. It's fewer lines but otherwise the exact same thing. I should think elegance would refer to a better algorithm, better expression of the existing one, or some function that accomplishes the task more simply. One liners like that tend to look cluttered and confusing, especially to novices... remembering back to being a novice. – Romeo 23/9, 2011 at 19:41


Here's an answer using matrix subsetting:

df[cbind( 1:nrow(df), max.col(!is.na(df),"last") )]

This max.col call returns the column position of the last non-NA value in each row. If a row is entirely NA, every position ties, so "last" picks the final column and the result for that row is NA.
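
To see what the one-liner is doing, it can help to split it into steps. The sketch below rebuilds the question's data with data.frame() (assumed equivalent to the read.table call, so the block is self-contained) and uses the illustrative names last_col and idx:

```r
df <- data.frame(
  var1 = c(1, 4, 2, 4, 1),
  var2 = c(2, 4, NA, 4, NA),
  var3 = c(NA, NA, 3, 4, NA),
  var4 = c(NA, 6, NA, 4, NA)
)

# Column position of the last non-NA entry in each row:
# TRUE beats FALSE, and ties among TRUEs go to the last column
last_col <- max.col(!is.na(df), ties.method = "last")
last_col
# [1] 2 4 3 4 1

# A two-column (row, column) index matrix selects one cell per row
idx <- cbind(seq_len(nrow(df)), last_col)
res <- df[idx]
res
# [1] 2 6 3 4 1
```

Since this avoids a per-row function call, it is likely to scale better than apply() on data frames with many rows, though that is not measured here.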

Magalymagan answered 24/6, 2015 at 5:9 Comment(0)


Here's another version that drops all infinite, NA, and NaN values before taking the first element of the reversed input:

apply(df, 1, function(x) rev(x[is.finite(x)])[1] )
# [1] 2 6 3 4 1
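
Note that is.finite() also drops Inf and -Inf, which !is.na() keeps, so the two filters can disagree on rows containing infinite values. A small illustration with made-up vectors (x and all_na are hypothetical, not from the question):

```r
x <- c(1, NA, Inf)

via_finite <- rev(x[is.finite(x)])[1]  # Inf is dropped, so the last value is 1
via_notna  <- rev(x[!is.na(x)])[1]     # Inf is kept, so the last value is Inf

# Indexing an empty vector with [1] gives NA, so an all-NA row yields NA
all_na <- c(NA_real_, NA_real_)
empty_case <- rev(all_na[is.finite(all_na)])[1]
```

The last line shows one practical difference from the tail() approach: rev(...)[1] always returns a single value, even when nothing survives the filter.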

Alithia answered 23/9, 2011 at 19:50 Comment(0)


A dplyr alternative is to use coalesce and reverse the order of the selected columns:

library(dplyr)
df |> 
  mutate(var5 = coalesce(var4, var3, var2, var1))

#   var1 var2 var3 var4 var5
# 1    1    2   NA   NA    2
# 2    4    4   NA    6    6
# 3    2   NA    3   NA    3
# 4    4    4    4    4    4
# 5    1   NA   NA   NA    1

To make use of tidyselection, one can create an auxiliary function coacross to use coalesce with across, and use rev to reverse the order of the names:

coacross <- function(...) {
  coalesce(!!!across(...))
}

df |> 
  mutate(var5 = coacross(rev(everything())))

Disadvantaged answered 8/11, 2023 at 13:58 Comment(0)
