Combine column to remove NA's yet prioritize specific replacements
Asked Answered
O

1

3

I'm learning to update column data using this previous post. However, is there a trick for specifying which column should provide the final updated value in case of a conflict. For example, I can combine columns of data as long as only one value exists per row:

data <- data.frame('a' = c('A','B','C','D','E'),
    'x' = c(NA,NA,3,NA,NA),
    'y' = c(1,2,NA,NA,NA),
    'z' = c(NA,NA,NA,4,5))
cbind.data.frame(data3[1], mycol=c(na.omit(c(t(data3[, -1])))))

How would I force the value to come from newVal in the following case?

data <- data.frame('a' = c('A','B','C','D','E','F'),
                   'x' = c(NA,NA,NA,3,NA,NA),
                   'y' = c(1,2,8,NA,NA,NA),
                   'z' = c(99,NA,4,NA,4,5))
Osy answered 30/11, 2016 at 1:17 Comment(0)
C
3

Use max.col and some matrix indexing (specifying which row/col combination to take):

cbind(1:nrow(data), max.col(!is.na(data[-1]), "last"))
#     [,1] [,2]
#[1,]    1    3
#[2,]    2    2
#[3,]    3    3
#[4,]    4    1
#[5,]    5    3
#[6,]    6    3

data[-1][cbind(1:nrow(data), max.col(!is.na(data[-1]), "last"))]
#[1] 99  2  4  3  4  5

cbind(data[1], result=data[-1][cbind(1:nrow(data), max.col(!is.na(data[-1]), "last"))])
#  a result
#1 A     99
#2 B      2
#3 C      4
#4 D      3
#5 E      4
#6 F      5

If you need a particular column to always be given precedence, make a temporary object with the columns in a particular order, and then process it:

tmp <- data[-1][c("z", setdiff(names(data[-1]), "z"))]
tmp[cbind(1:nrow(tmp), max.col(!is.na(tmp), "first"))]
#[1] 99  2  4  3  4  5
Cymene answered 30/11, 2016 at 1:21 Comment(6)
I updated my example to avoid the trivial case where we just look for max value in the last column; it could be greater or lesser than.Osy
@Osy - the code is not picking the maximum value, it's picking the last non-NA value going across the columns. It may not be the numeric maximum. Did you even try the code?Cymene
i see. thx. instead of the max column is there a way to specify the column hte value should come from?Osy
@Osy - you could make a temp dataset like data[-1][c("z", setdiff(names(data[-1]), "z"))] which puts the key column first, and then look for the "first" non-NA value.Cymene
ok... had to read it slowly to understand. why the downvote on your answer?Osy
@Osy - good to hear it works for you. Not sure about downvote... I had my other most recent question downvoted at the same time too. Probably just someone being a dill.Cymene

© 2022 - 2024 — McMap. All rights reserved.