How to treat NAs like values when comparing elementwise in R

Asked 3/6, 2016 at 8:55 Answered 9/7, 2024 at 7:45

I want to compare two vectors elementwise to check whether an element in a certain position in the first vector is different from the element in the same position in the second vector.
The point is that I have NA values inside the vectors, and when doing the comparison for these values I get NA instead of TRUE or FALSE.

Reproducible example:

Here is what I get:

a<-c(1, NA, 2, 2, NA)
b<-c(1, 1, 1, NA, NA)
a!=b
[1] FALSE   TRUE   NA   NA   NA

Here is how I would like the != operator to work (treat NA values as if they were just another "level" of the variable):

a!=b
[1] FALSE   TRUE   TRUE   TRUE   FALSE

There's a possible solution at this link, but its author defines a dedicated function to perform the task. I was wondering if there's a more elegant way to do that.

Yehudi answered 3/6, 2016 at 8:55 Comment(7)

How do you get TRUE values for the second case? It should be FALSE, as we are comparing NA with 1 – Sclerotomy 3/6, 2016 at 9:3

Could you use a dummy value instead of NA? e.g. a[is.na(a)] <- 999. – Paule 3/6, 2016 at 9:8

@Sclerotomy I get TRUE because NA is different from (not equal to) 1. @Bazz yes, I thought of that solution and it works too, but I would like a more elegant solution that avoids the imputation, since I would have to convert the values back to NA after the comparison (I have a very large dataset, so it's not very practical) – Yehudi 3/6, 2016 at 9:13

Are you looking for new examples to update the post? – Sclerotomy 3/6, 2016 at 9:18

Can you update your output and explain more clearly what you are trying to achieve? – Fruin 3/6, 2016 at 10:1

Which solution did you use in the end? – Paule 25/8, 2016 at 12:40

@Sclerotomy is right that the indicated results are not reproducible. a<-c(1, NA, 2, 2, NA); b<-c(1, 1, 1, NA, NA); a!=b; [1] FALSE NA TRUE NA NA – Judie 13/12, 2021 at 23:22

Taking advantage of the fact that:

T & NA = NA but F & NA = F

and

F | NA = NA but T | NA = T

The following solution works, with carefully placed brackets:

(a != b | (is.na(a) & !is.na(b)) | (is.na(b) & !is.na(a))) & !(is.na(a) & is.na(b))

You could define:

`%!=na%` <- function(e1, e2) (e1 != e2 | (is.na(e1) & !is.na(e2)) | (is.na(e2) & !is.na(e1))) & !(is.na(e1) & is.na(e2))

and then use:

a %!=na% b
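
As a quick check, here is a minimal sketch that shows the logic rules the expression relies on and then runs the operator on the vectors from the question. It uses only base R and repeats the definition of `%!=na%` so that it runs on its own.

```r
# Three-valued logic in base R: a known FALSE (for &) or TRUE (for |)
# decides the result even when the other operand is NA.
FALSE & NA   # FALSE
TRUE  | NA   # TRUE
TRUE  & NA   # NA
FALSE | NA   # NA

# Same operator as above, split over two lines for readability.
`%!=na%` <- function(e1, e2) {
  (e1 != e2 | (is.na(e1) & !is.na(e2)) | (is.na(e2) & !is.na(e1))) &
    !(is.na(e1) & is.na(e2))
}

a <- c(1, NA, 2, 2, NA)
b <- c(1, 1, 1, NA, NA)

res <- a %!=na% b
print(res)
# [1] FALSE  TRUE  TRUE  TRUE FALSE
```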

Paule answered 3/6, 2016 at 9:18 Comment(2)

This works, but I'll leave the question open as you're performing a variation of the function in the link I put in the description. I posted the question to find a simpler solution (if it exists). – Yehudi 3/6, 2016 at 9:38

@Yehudi That's fine. You're right it does seem like an inelegant solution to something quite simple. – Paule 3/6, 2016 at 9:40

I like this one, since it is pretty simple and it's easy to see that it works (source):

# This function returns TRUE wherever elements are the same, including NA's,
# and FALSE everywhere else.
compareNA <- function(v1, v2) 
{
    same <- (v1 == v2) | (is.na(v1) & is.na(v2))
    same[is.na(same)] <- FALSE
    return(same)
}
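
Since compareNA() reports where elements are the same, the result has to be negated to answer the question as asked. A short usage sketch with the vectors from the question (the variable names `same` and `differs` are only illustrative):

```r
# Same function as above, repeated so the example runs on its own.
compareNA <- function(v1, v2) {
  same <- (v1 == v2) | (is.na(v1) & is.na(v2))
  same[is.na(same)] <- FALSE   # NA vs. non-NA counts as "not the same"
  same
}

a <- c(1, NA, 2, 2, NA)
b <- c(1, 1, 1, NA, NA)

same <- compareNA(a, b)
print(same)
# [1]  TRUE FALSE FALSE FALSE  TRUE

# Negate to get the NA-aware "!=" from the question.
differs <- !same
print(differs)
# [1] FALSE  TRUE  TRUE  TRUE FALSE
```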

Hedger answered 16/4, 2020 at 11:20 Comment(0)

Here is another solution. It's probably slower than my other answer because it's not vectorised, but it's certainly more elegant. I noticed the other day that %in% compares NA like other values. Thus c(1L, NA) %in% 1:4 gives TRUE FALSE rather than TRUE NA, for example.

So you can have:

!mapply(`%in%`, a, b)
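
A small sketch of what this does on the vectors from the question. mapply() calls `%in%` once per pair of elements, and because `%in%` matches NA against NA, each call returns TRUE or FALSE and never NA. The name `differs` is only illustrative.

```r
a <- c(1, NA, 2, 2, NA)
b <- c(1, 1, 1, NA, NA)

# %in% treats NA as an ordinary value to be matched.
c(1L, NA) %in% 1:4   # TRUE FALSE
NA %in% NA           # TRUE

# One %in% call per position: TRUE where a[i] matches b[i], NA included.
matches <- mapply(`%in%`, a, b)

# Negate to get "different, with NA treated as a value".
differs <- !matches
print(differs)
# [1] FALSE  TRUE  TRUE  TRUE FALSE
```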

Paule answered 15/6, 2016 at 8:10 Comment(0)

We could perform an on-the-fly replacement of the NA values with a value v1 that is not present in either vector, and then apply !=

f1 <- function(x, y) {
  v1 <- setdiff(1:1000, na.omit(unique(c(x,y))))[1]
  replace(x, is.na(x), v1) != replace(y, is.na(y), v1)
}

f1(a,b)
#[1] FALSE  TRUE  TRUE  TRUE FALSE
f1(a1,b1)
#[1] TRUE TRUE TRUE
f1(a2,b2)
#[1] FALSE  TRUE  TRUE FALSE

data

a <- c(1, NA, 2, 2, NA)
b<-c(1, 1, 1, NA, NA)
a1 <- c(NA, 1, NA)
b1 <- c(2, NA, 3) 
a2<-c(1,NA,2,NA)
b2<-c(1,1,3,NA)
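
For comparison, here is a sketch of a variant that needs no replacement value at all, so it does not depend on finding an unused number in 1:1000. This is a different technique from f1 above: it compares the missingness of the two vectors first and only compares values where both are present. The helper name `neq_na` is hypothetical, and the sketch assumes both vectors have the same length.

```r
# Hypothetical helper, not part of the answer above.
neq_na <- function(x, y) {
  x_na <- is.na(x)
  y_na <- is.na(y)
  out  <- x_na != y_na              # exactly one side missing -> different
  both <- !x_na & !y_na             # both present -> ordinary comparison
  out[both] <- x[both] != y[both]
  out                               # both missing stays FALSE
}

a  <- c(1, NA, 2, 2, NA); b  <- c(1, 1, 1, NA, NA)
a1 <- c(NA, 1, NA);       b1 <- c(2, NA, 3)
a2 <- c(1, NA, 2, NA);    b2 <- c(1, 1, 3, NA)

neq_na(a, b)
# [1] FALSE  TRUE  TRUE  TRUE FALSE
neq_na(a1, b1)
# [1] TRUE TRUE TRUE
neq_na(a2, b2)
# [1] FALSE  TRUE  TRUE FALSE
```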

Sclerotomy answered 3/6, 2016 at 8:55 Comment(19)

I forgot to add the case in which I compare two NA values: in that case, I want the comparison to return FALSE. Moreover, I would like a solution that works whether the NAs are in the first vector or in the second. I edited the question. – Yehudi 3/6, 2016 at 9:1

This returns FALSE in the second position when it should be TRUE. – Paule 3/6, 2016 at 9:6

@Bazz Why should it be TRUE?? – Sclerotomy 3/6, 2016 at 9:6

Because a[2] is NA and b[2] is 1, so they're different. – Paule 3/6, 2016 at 9:7

@Bazz That is my point. They are different, so it should return FALSE – Sclerotomy 3/6, 2016 at 9:7

As @Bazz pointed out, the updated solution a!= b & !is.na(a) & !is.na(b) doesn't work, as it returns FALSE for the second element: in fact, I'd like to treat NAs as if they were integers, returning TRUE for the comparison of the second element. – Yehudi 3/6, 2016 at 9:8

@Yehudi So, if the value in a <- c(NA, 1, NA); b <- c(2, NA, 3) what would be the result? – Sclerotomy 3/6, 2016 at 9:9

@Sclerotomy it should be TRUE TRUE TRUE – Yehudi 3/6, 2016 at 9:10

@helter Can you check now. – Sclerotomy 3/6, 2016 at 9:28

@Bazz Please remove your comments from my post as it is not relevant now – Sclerotomy 3/6, 2016 at 9:30

This would not work, as replace(a, is.na(a), FALSE) substitutes NA values with 0. This could be a problem in the case where a<-0; b<-NA, which would return FALSE instead of TRUE (0 is different from NA). – Yehudi 3/6, 2016 at 10:13

@Yehudi In that case you can replace it with some other value. I have showed 3 cases where it works – Sclerotomy 3/6, 2016 at 10:13

@Sclerotomy yes you're right, but first I should look for a value which is NOT included in either vector to avoid mistakes (running unique() on both vectors or something like that) – Yehudi 3/6, 2016 at 10:18

@Yehudi something like setdiff(1:1000, na.omit(unique(c(a, b))))[1] – Sclerotomy 3/6, 2016 at 10:22

Exactly. So the solution would still be to do a simple pairwise comparison but making an on-the-fly imputation of NA values, which only exists for the comparison. Could you implement the setdiff() part inside the function? This seems to me like quite a good way to solve the problem – Yehudi 3/6, 2016 at 10:25

@helter added that part – Sclerotomy 3/6, 2016 at 10:29

1:1000?? What kind of dirty hack is that? This will not work. – Hedger 16/4, 2020 at 11:14

@Sclerotomy so how will that work for vectors longer than 1000? – Hedger 16/4, 2020 at 19:46

I don't remember the context in which this was answered. It was about 4 years ago. – Sclerotomy 16/4, 2020 at 19:49

I'm not sure about it being the most elegant, but

paste(a) != paste(b)

(which converts all elements of both vectors to strings)

has the desired output and is simpler than most of the answers.
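
A short sketch of this approach on the vectors from the question. paste() turns a missing value into the string "NA", which is why the comparison never returns NA. The last lines show one consequence for character vectors that is worth knowing: a missing value and the literal string "NA" compare as equal after conversion.

```r
a <- c(1, NA, 2, 2, NA)
b <- c(1, 1, 1, NA, NA)

paste(a)
# [1] "1"  "NA" "2"  "2"  "NA"

differs <- paste(a) != paste(b)
print(differs)
# [1] FALSE  TRUE  TRUE  TRUE FALSE

# With character input, NA and the string "NA" cannot be told apart.
collision <- paste(NA_character_) != paste("NA")
print(collision)
# [1] FALSE
```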

Previdi answered 9/7, 2024 at 7:45 Comment(0)
