Subtract columns in R data frame but keep values of var1 or var2 when the other is NA
Asked Answered
A

3

0

I wanted to subtract one column from the other in R and this turned out more complicated than I thought.

Suppose this is my data (columns a and b) and column c is what I want, namely a - b but keeping a when b==NA and vice versa:

   a    b    c
1  2    1    1
2  2   NA    2
3 NA    3    3
4 NA   NA   NA

Now I tried different things but most of the time it returned NA when at least one column was NA. For example:

matrixStats::rowDiffs(data, na.rm=T) # only works for matrix-format, and returns NA's

dat$c <- dat$a - dat$b + ifelse(is.na(dat$b),dat$a,0) + ifelse(is.na(dat$a),dat$b,0) # seems like a desparately basic solution, but not even this does the trick as it also returns NA's

apply(dat[,(1:2)], MARGIN = 1,FUN = diff, na.rm=T) # returns NA's

dat$b<-dat$b*(-1)
dat$c<-rowSums(dat,na.rm=T) # this kind of works but it's a really ugly workaround

Also, if you can think of a dplyr solution, please share your knowledge. I didn't even know what to try.

Will delete this question if you think it's a duplicate of an existing one, though none of the existing threads were particularly helpful.

Atahualpa answered 2/9, 2018 at 14:37 Comment(1)
What are the types of the a and b columns?Para
I
1

Try this (Base R Solution):

If df$b is NA then simply take the value of df$a else if df$a is NA then simply take the value of df$b else do df$a-df$b

df$c=ifelse(is.na(df$b),df$a,ifelse(is.na(df$a),df$b,df$a-df$b))

Output:

df
   a  b  c
1  2  1  1
2  2 NA  2
3 NA  3  3
4 NA NA NA
Irreverence answered 2/9, 2018 at 14:43 Comment(0)
P
2

You may try using the coalesce function from the dplyr package:

dat <- data.frame(a=c(2, 2, NA, NA), b=c(1, NA, 3, NA))
dat$c <- coalesce(dat$a - coalesce(dat$b, 0), dat$b)
dat$c

   a  b  c
1  2  1  1
2  2 NA  2
3 NA  3  3
4 NA NA NA

The idea here is to take a minus b, or a alone if b be NA. If that entire expression is still NA, then it implies that a is also NA, in which case we take b.

Para answered 2/9, 2018 at 14:44 Comment(2)
Thank you, this works fine... although when I first tried it an error message popped up telling me that argument 2 Argument 2 must be type integer` (which I thought it was already). Tried again using `as.integer(dat$b) and it worked.Atahualpa
Sorry, just noticed that the outcome of this is strange: it doesn't return a when b==NA and vice versa...Atahualpa
I
1

Try this (Base R Solution):

If df$b is NA then simply take the value of df$a else if df$a is NA then simply take the value of df$b else do df$a-df$b

df$c=ifelse(is.na(df$b),df$a,ifelse(is.na(df$a),df$b,df$a-df$b))

Output:

df
   a  b  c
1  2  1  1
2  2 NA  2
3 NA  3  3
4 NA NA NA
Irreverence answered 2/9, 2018 at 14:43 Comment(0)
F
0

Here is one option with base R where we replace the NA elements with 0, Reduce it to a single vector by taking the rowwise difference and change the rows that have all NA elements to NA

df1$c <- abs(Reduce(`-`, replace(df1, is.na(df1), 0))) *
               NA^ (!rowSums(!is.na(df1)) )
df1$c
#[1]  1  2  3 NA

Or using similar method with data.table

library(data.table)
setDT(df1)[!is.na(a) | !is.na(b), c := abs(Reduce(`-`, 
               replace(.SD, is.na(.SD), 0)))]

data

df1 <- structure(list(a = c(2L, 2L, NA, NA), b = c(1L, NA, 3L, NA)), 
 row.names = c("1", "2", "3", "4"), class = "data.frame")
Fancher answered 2/9, 2018 at 15:17 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.