How to treat NAs like values when comparing elementwise in R
Y

5

13

I want to compare two vectors elementwise to check whether an element in a certain position in the first vector is different from the element in the same position in the second vector.
The point is that I have NA values inside the vectors, and when doing the comparison for these values I get NA instead of TRUE or FALSE.

Reproducible example:

Here is what I get:

a<-c(1, NA, 2, 2, NA)
b<-c(1, 1, 1, NA, NA)
a!=b
[1] FALSE   TRUE   NA   NA   NA  

Here is how I would like the != operator to work (treat NA values as if they were just another "level" of the variable):

a!=b
[1] FALSE   TRUE   TRUE   TRUE   FALSE

There's a possible solution at this link, but the guy is creating a function to perform the task. I was wondering if there's a more elegant way to do that.

Yehudi answered 3/6, 2016 at 8:55 Comment(7)
How do you get TRUE for the second case? It should be FALSE, as we are comparing NA with 1Sclerotomy
Could you use a dummy value instead of NA? e.g. a[is.na(a)] <- 999.Paule
@Sclerotomy I get TRUE because NA is different from (not equal to) 1. @Bazz yes, I thought of that solution and it works too, but I would like a more elegant solution that avoids the imputation, since I would have to convert the values back to NA after the comparison (I have a very large dataset, so it's not very practical)Yehudi
Are you looking for new examples to update the post?Sclerotomy
can you update your output/explain more clearly what you try to achieve?Fruin
Which solution did you use in the end?Paule
@Sclerotomy is right that the indicated results are not reproducible. a<-c(1, NA, 2, 2, NA); b<-c(1, 1, 1, NA, NA); a!=b; [1] FALSE NA TRUE NA NA Judie
P
14

Taking advantage of the fact that:

T & NA = NA but F & NA = F

and

F | NA = NA but T | NA = T
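A quick way to check these identities at the console (a minimal sketch; the names and_na and or_na are just illustrative):

```r
# NA means "unknown": the result is NA only when the unknown
# value could change the outcome.
x <- c(TRUE, FALSE)

and_na <- x & NA   # TRUE & NA is NA, FALSE & NA is FALSE
or_na  <- x | NA   # TRUE | NA is TRUE, FALSE | NA is NA

print(and_na)
print(or_na)
```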

The following solution works, with carefully placed brackets:

(a != b | (is.na(a) & !is.na(b)) | (is.na(b) & !is.na(a))) & !(is.na(a) & is.na(b))

You could define:

`%!=na%` <- function(e1, e2) (e1 != e2 | (is.na(e1) & !is.na(e2)) | (is.na(e2) & !is.na(e1))) & !(is.na(e1) & is.na(e2))

and then use:

a %!=na% b
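For reference, here is the operator applied to the vectors from the question. The second function, ne_na, is a hypothetical shorter form that is not part of the answer above; it should give the same result under the same assumptions (vectors of equal length).

```r
a <- c(1, NA, 2, 2, NA)
b <- c(1, 1, 1, NA, NA)

# Operator as defined above, split over lines for readability
`%!=na%` <- function(e1, e2) {
  (e1 != e2 | (is.na(e1) & !is.na(e2)) | (is.na(e2) & !is.na(e1))) &
    !(is.na(e1) & is.na(e2))
}

# Hypothetical shorter form: elements differ if exactly one side is NA,
# or if both are present and unequal
ne_na <- function(e1, e2) {
  xor(is.na(e1), is.na(e2)) | (!is.na(e1) & !is.na(e2) & e1 != e2)
}

res_op <- a %!=na% b
res_short <- ne_na(a, b)
print(res_op)     # FALSE TRUE TRUE TRUE FALSE
print(res_short)  # same as res_op
```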
Paule answered 3/6, 2016 at 9:18 Comment(2)
This works, but I'll leave the question open as you're performing a variation of the function in the link I put in the description. I posted the question to find a simpler solution (if it exists).Yehudi
@Yehudi That's fine. You're right it does seem like an inelegant solution to something quite simple.Paule
H
7

I like this one, since it is pretty simple and it's easy to see that it works (source):

# This function returns TRUE wherever elements are the same, including NA's,
# and FALSE everywhere else.
compareNA <- function(v1, v2) 
{
    same <- (v1 == v2) | (is.na(v1) & is.na(v2))
    same[is.na(same)] <- FALSE
    return(same)
}
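Since compareNA() reports where the elements are the same, negating it should give the != behaviour asked for in the question. A short usage sketch with the question's vectors:

```r
compareNA <- function(v1, v2) {
  same <- (v1 == v2) | (is.na(v1) & is.na(v2))
  same[is.na(same)] <- FALSE
  return(same)
}

a <- c(1, NA, 2, 2, NA)
b <- c(1, 1, 1, NA, NA)

same <- compareNA(a, b)
differs <- !same     # negate to get "not equal, NA-aware"
print(same)          # TRUE FALSE FALSE FALSE TRUE
print(differs)       # FALSE TRUE TRUE TRUE FALSE
```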
Hedger answered 16/4, 2020 at 11:20 Comment(0)
P
4

Here is another solution. It's probably slower than my other answer because it's not vectorised, but it's certainly more elegant. I noticed the other day that %in% compares NA like other values. Thus c(1L, NA) %in% 1:4 gives TRUE FALSE rather than TRUE NA, for example.

So you can have:

!mapply(`%in%`, a, b)
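A small sketch of what this does with the question's vectors. It assumes a and b have the same length, since mapply() recycles shorter arguments:

```r
a <- c(1, NA, 2, 2, NA)
b <- c(1, 1, 1, NA, NA)

# %in% never returns NA, and NA is found in a set that contains NA
na_in   <- c(1L, NA) %in% 1:4   # TRUE FALSE
both_na <- NA %in% NA           # TRUE

# Pairwise: is a[i] "in" the single value b[i]? Negate to get "differs".
differs <- !mapply(`%in%`, a, b)
print(differs)                  # FALSE TRUE TRUE TRUE FALSE
```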
Paule answered 15/6, 2016 at 8:10 Comment(0)
S
1

We could replace the NA values on the fly with a value v1 that is not present in either vector, and then do the !=

f1 <- function(x, y) {
  v1 <- setdiff(1:1000, na.omit(unique(c(x,y))))[1]
  replace(x, is.na(x), v1) != replace(y, is.na(y), v1)
}

f1(a,b)
#[1] FALSE  TRUE  TRUE  TRUE FALSE
f1(a1,b1)
#[1] TRUE TRUE TRUE
f1(a2,b2)
#[1] FALSE  TRUE  TRUE FALSE

data

a <- c(1, NA, 2, 2, NA)
b<-c(1, 1, 1, NA, NA)
a1 <- c(NA, 1, NA)
b1 <- c(2, NA, 3) 
a2<-c(1,NA,2,NA)
b2<-c(1,1,3,NA)
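The 1:1000 pool runs out if the vectors already contain every value from 1 to 1000. Here is a rough sketch of one way to pick the placeholder without a fixed pool. f2 is a hypothetical name, not part of the answer above, and the sketch assumes numeric vectors with finite values small enough that adding 1 gives a different number:

```r
# Hypothetical variant of f1: use a placeholder just above the largest
# observed value. Assumes numeric input with finite, moderate magnitudes.
f2 <- function(x, y) {
  vals <- c(x, y)
  vals <- vals[!is.na(vals)]
  # With no non-NA values at all, any placeholder works
  sentinel <- if (length(vals) == 0) 0 else max(vals) + 1
  replace(x, is.na(x), sentinel) != replace(y, is.na(y), sentinel)
}

a <- c(1, NA, 2, 2, NA)
b <- c(1, 1, 1, NA, NA)
res <- f2(a, b)
print(res)   # FALSE TRUE TRUE TRUE FALSE

# Vectors that use up every value in 1:1000
big <- f2(c(1:1000, NA), c(1:1000, 5))
print(sum(big))   # 1: only the last pair differs
```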
Sclerotomy answered 3/6, 2016 at 8:55 Comment(19)
I forgot to add the case in which I compare two NA values: in that case, I want the comparison to return FALSE. Moreover, I would like a solution that works whether the NAs are in the first vector or in the second. I edited the question.Yehudi
This returns FALSE in the second position when it should be TRUE.Paule
@Bazz Why should it be TRUE??Sclerotomy
Because a[2] is NA and b[2] is 1, so they're different.Paule
@Bazz That is my point. They are different, so it should return FALSESclerotomy
As @Bazz pointed out, the updated solution a!= b & !is.na(a) & !is.na(b) doesn't work, as it returns FALSE for the second element. I'd like to treat NAs as if they were integers, so the comparison of the second element should return TRUE.Yehudi
@Yehudi So, if the value in a <- c(NA, 1, NA); b <- c(2, NA, 3) what would be the result?Sclerotomy
@Sclerotomy it should be TRUE TRUE TRUEYehudi
@helter Can you check now.Sclerotomy
@Bazz Please remove your comments from my post as it is not relevant nowSclerotomy
This would not work, as replace(a, is.na(a), FALSE) substitutes NA values with 0. This could be a problem in the case where a<-0; b<-NA, which would return FALSE instead of TRUE (0 is different from NA).Yehudi
@Yehudi In that case you can replace it with some other value. I have showed 3 cases where it worksSclerotomy
@Sclerotomy yes you're right, but first I should look for a value which is NOT included in either vector to avoid mistakes (running unique() on both vectors or something like that)Yehudi
@Yehudi something like setdiff(1:1000, na.omit(unique(c(a, b))))[1]Sclerotomy
Exactly. So the solution would still be to do a simple pairwise comparison but making an on-the-fly imputation of NA values, which only exists for the comparison. Could you implement the setdiff() part inside the function? This seems to me like quite a good way to solve the problemYehudi
@helter added that partSclerotomy
1:1000?? What kind of dirty hack is that? This will not work.Hedger
@Sclerotomy so how will that work for vectors longer than 1000?Hedger
I don't remember the context on which this was answered. It is close to ~ 4 years back.Sclerotomy
P
1

I'm not sure about it being the most elegant, but

paste(a) != paste(b)

(convert all elements of both vectors to strings)

This gives the desired output and is simpler than most of the other answers.
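A short sketch of the result on the question's vectors, plus one caveat worth knowing: paste() turns NA into the string "NA", so a character vector that contains the literal text "NA" would be treated as equal to a missing value. The vectors x and y below are made up for illustration.

```r
a <- c(1, NA, 2, 2, NA)
b <- c(1, 1, 1, NA, NA)

res <- paste(a) != paste(b)
print(res)       # FALSE TRUE TRUE TRUE FALSE

# Caveat: the literal string "NA" collides with a real missing value
x <- c("NA", "b")
y <- c(NA, "b")
collide <- paste(x) != paste(y)
print(collide)   # FALSE FALSE, although x[1] is not missing and y[1] is
```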

Previdi answered 9/7, 2024 at 7:45 Comment(0)
