Comparing numpy arrays containing NaN

For my unittest, I want to check if two arrays are identical. Reduced example:

import numpy as np

a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])

if np.all(a == b):
    print('arrays are equal')

This does not work because nan != nan. What is the best way to proceed?
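A minimal sketch of why the check fails: under IEEE 754 semantics NaN compares unequal to everything, itself included, so the elementwise comparison is False at the NaN position and np.all returns False.

```python
import numpy as np

a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])

elementwise = a == b          # NaN == NaN is False under IEEE 754
print(elementwise)            # [ True  True False]
print(np.all(elementwise))    # False
print(np.nan == np.nan)       # False
```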

Erelia asked 22/5, 2012 at 21:18 Comment(0)

Alternatively, you can use numpy.testing.assert_equal or numpy.testing.assert_array_equal with a try/except:

In : import numpy as np

In : def nan_equal(a,b):
...:     try:
...:         np.testing.assert_equal(a,b)
...:     except AssertionError:
...:         return False
...:     return True

In : a=np.array([1, 2, np.nan])

In : b=np.array([1, 2, np.nan])

In : nan_equal(a,b)
Out: True

In : a=np.array([1, 2, np.nan])

In : b=np.array([3, 2, np.nan])

In : nan_equal(a,b)
Out: False

Edit

Since you are using this for unit testing, calling the assertion directly (instead of wrapping it to get True/False) might be more natural.
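A minimal sketch of the direct-assertion style, assuming the standard-library unittest runner; the class and method names are hypothetical. np.testing.assert_array_equal treats NaNs in matching positions as equal and raises AssertionError with a mismatch report otherwise, which unittest records as a test failure.

```python
import unittest

import numpy as np


class ArrayComparisonTest(unittest.TestCase):
    def test_arrays_with_nan_are_equal(self):
        a = np.array([1, 2, np.nan])
        b = np.array([1, 2, np.nan])
        # No exception: NaNs sit in the same positions in both arrays
        np.testing.assert_array_equal(a, b)

    def test_different_values_fail(self):
        a = np.array([1, 2, np.nan])
        b = np.array([3, 2, np.nan])
        # The first element differs, so the numpy assertion raises
        with self.assertRaises(AssertionError):
            np.testing.assert_array_equal(a, b)
```

Run it with python -m unittest as usual; no True/False wrapper is needed.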

Sitnik answered 22/5, 2012 at 21:42 Comment(5)

Excellent, this is the most elegant and built-in solution. I just added np.testing.assert_equal(a,b) in my unittest, and if it raises the exception, the test fails (no error), and I even get a nice print with the differences and the mismatch. Thanks. – Erelia 23/5, 2012 at 22:42

Please note that this solution works because numpy.testing.assert_* do not follow the same semantics as Python's assert statement. In plain Python, assert statements raise AssertionError only if __debug__ is True, i.e. if the script is run unoptimized (no -O flag); see the docs. For this reason I would strongly discourage catching AssertionError for flow control. Of course, since we are in a test suite, the best solution is to leave the numpy.testing.assert_* call alone. – Wendelina 14/6, 2013 at 10:14

The documentation of numpy.testing.assert_equal() does not explicitly state that it considers NaN equal to NaN (whereas that of numpy.testing.assert_array_equal() does): is this documented somewhere else? – Rubinrubina 8/8, 2018 at 13:57

@EricOLebigot Does numpy.testing.assert_equal() really consider nan == nan? I'm getting AssertionError: Arrays are not equal even though the arrays are identical, including the dtype. – Corduroys 7/7, 2020 at 9:15

Both the current official documentation and the examples above show that it does treat NaN as equal to NaN. I think it would be best for you to ask a new StackOverflow question with the details. – Rubinrubina 21/7, 2020 at 14:43

For versions of numpy prior to 1.19, this is probably the best approach in situations that don't specifically involve unit tests:

>>> ((a == b) | (numpy.isnan(a) & numpy.isnan(b))).all()
True

However, modern versions provide the array_equal function with a new keyword argument, equal_nan, which fits the bill exactly.

This was first pointed out by flyingdutchman; see his answer below for details.
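One caveat with the one-liner: it relies on broadcasting, so arrays of different but compatible shapes can pass (a one-element NaN array against a longer all-NaN array). A sketch that adds a shape check, shown next to np.array_equal(..., equal_nan=True) from NumPy 1.19 on; the helper name nan_equal_manual is hypothetical, and it assumes numeric input since np.isnan rejects string dtypes.

```python
import numpy as np


def nan_equal_manual(a, b):
    """True if a and b have the same shape and equal elements,
    treating NaNs in the same positions as equal (pre-1.19 approach)."""
    a, b = np.asarray(a), np.asarray(b)
    if a.shape != b.shape:
        return False  # refuse to broadcast mismatched shapes
    return bool(((a == b) | (np.isnan(a) & np.isnan(b))).all())


a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])
print(nan_equal_manual(a, b))                  # True
print(np.array_equal(a, b, equal_nan=True))    # True (NumPy >= 1.19)

x = np.array([np.nan])
y = np.array([np.nan, np.nan])
# The bare one-liner broadcasts x against y and reports equality
print(((x == y) | (np.isnan(x) & np.isnan(y))).all())  # True
print(nan_equal_manual(x, y))                          # False
```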

Naturalist answered 22/5, 2012 at 21:24 Comment(5)

+1 This solution seems to be a bit faster than the solution I posted with masked arrays, although if you were creating the mask for use in other parts of your code, the overhead from creating the mask would become less of a factor in the overall efficiency of the ma strategy. – Coeternity 22/5, 2012 at 21:34

Thanks. Your solution works indeed, but I prefer the built-in test in numpy as suggested by Avaris – Erelia 23/5, 2012 at 22:43

I really like the simplicity of this. Also, it seems faster than @Avaris' solution. Turning this into a lambda function and testing with IPython's %timeit yields 23.7 µs vs 1.01 ms. – Bronchi 2/3, 2014 at 14:25

@NovicePhysicist, interesting timing! I wonder if it has to do with the use of exception handling. Did you test positive vs. negative results? The speed will probably vary significantly depending on whether the exception is thrown or not. – Naturalist 2/3, 2014 at 15:10

Nope, just did a simple test, with some broadcasting relevant to my problem at hand (I compared a 2D array with a 1D vector, so I guess it was a row-wise comparison). But I guess one could pretty easily do a lot of testing in an IPython notebook. Also, I used a lambda function for your solution, but I think it would have been a little faster had I used a regular function (that often seems to be the case). – Bronchi 2/3, 2014 at 16:46

The easiest way is to use numpy.allclose(), which lets you specify how NaN values are handled via its equal_nan parameter. Your example would then look like this:

import numpy as np

a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])

if np.allclose(a, b, equal_nan=True):
    print('arrays are equal')

Then arrays are equal will be printed.

See the numpy.allclose documentation for details.

Manufactory answered 14/8, 2017 at 13:5 Comment(4)

+1 because your solution doesn't reinvent the wheel. However, this only works with number-like items. Otherwise, you get the nasty

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

– Jeth 6/3, 2018 at 14:51

This is a great answer in many contexts! It's worth adding the caveat that, because of its default tolerances, this will return True even if the arrays aren't strictly equal. Much of the time it won't matter, though. – Naturalist 10/12, 2018 at 23:41

+1, since this returns a bool instead of raising an AssertionError. I needed this for implementing __eq__(...) of a class with an array attribute. – Undress 24/5, 2020 at 17:47

Just as a pointer to a later answer: https://mcmap.net/q/24495/-comparing-numpy-arrays-containing-nan. Add rtol=0, atol=0 to avoid the issue that it considers close values equal (as mentioned by @senderle). So: np.allclose(a, b, equal_nan=True, rtol=0, atol=0). – Prostration 7/1, 2021 at 15:12

The numpy function array_equal fits the question's requirements perfectly with the equal_nan parameter added in 1.19. The example would look as follows:

import numpy as np

a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])
assert np.array_equal(a, b, equal_nan=True)

But be aware that this won't work if the arrays have dtype object. I'm not sure whether this is a bug.

Magocsi answered 9/12, 2020 at 14:57 Comment(0)

You could use numpy masked arrays, mask the NaN values and then use numpy.ma.all or numpy.ma.allclose:

For example:

a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])
np.ma.all(np.ma.masked_invalid(a) == np.ma.masked_invalid(b))  # True
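As the comments below point out, masked_invalid hides NaNs without checking where they are, and it also masks ±inf. A sketch of a stricter variant, assuming only NaN (not inf) should be ignored: it first requires the NaN masks to match and then compares the unmasked values. The name masked_nan_equal is hypothetical.

```python
import numpy as np


def masked_nan_equal(a, b):
    """True if a and b have the same shape, NaNs in the same places,
    and equal values everywhere else."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if a.shape != b.shape:
        return False
    nan_a, nan_b = np.isnan(a), np.isnan(b)
    if not np.array_equal(nan_a, nan_b):
        return False  # NaNs occur in different positions
    # Mask only NaNs (masked_invalid would also mask +/-inf)
    ma = np.ma.masked_where(nan_a, a)
    mb = np.ma.masked_where(nan_b, b)
    # Count masked positions as equal; filled(True) also covers all-NaN
    # input, where np.ma.all would return the masked constant instead
    return bool((ma == mb).filled(True).all())


a = np.array([1, 2, np.nan])
b = np.array([1, np.nan, 2])
# The original one-liner ignores where the NaNs are:
print(np.ma.all(np.ma.masked_invalid(a) == np.ma.masked_invalid(b)))  # True
print(masked_nan_equal(a, b))                                       # False
```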

Coeternity answered 22/5, 2012 at 21:23 Comment(5)

Thanks for making me aware of masked arrays. I prefer Avaris' solution, however. – Erelia 23/5, 2012 at 22:43

You should use np.ma.masked_where(np.isnan(a), a) else you fail to compare infinite values. – Grenoble 24/9, 2014 at 2:33

I tested with a=np.array([1, 2, np.NaN]) and b=np.array([1, np.NaN, 2]) which are clearly not equal and np.ma.all(np.ma.masked_invalid(a) == np.ma.masked_invalid(b)) still returns True, so be aware of that if you use this method. – Trilogy 5/1, 2017 at 14:41

This method only tests whether the two arrays without the NaN values are the same, but does NOT test if NaNs occurred in the same places... Can be dangerous to use. – Fantasy 29/5, 2019 at 22:4

It can be dangerous to use, that is a valid point. However... this is the only solution that works for me out of all suggestions mentioned herein. This is a nice approach if you are looking to compare data that may be masked differently but otherwise contain generally identical information. – Shaer 21/11, 2022 at 17:13

Just to complete @Luis Albert Centeno’s answer, you may rather use:

np.allclose(a, b, rtol=0, atol=0, equal_nan=True)

rtol and atol control the tolerance of the equality test. In short, allclose() returns:

all(abs(a - b) <= atol + rtol * abs(b))

By default they are not 0 (rtol=1e-05, atol=1e-08), so the function can return True if your numbers are close but not exactly equal.
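A small illustration of the tolerance point, using an arbitrarily chosen difference of about 1e-9: the default tolerances accept it, while rtol=0, atol=0 only accept exact equality (plus NaN against NaN when equal_nan=True).

```python
import numpy as np

a = np.array([1.0, np.nan])
b = np.array([1.0 + 1e-9, np.nan])   # first element differs by about 1e-9

# Default tolerances: 1e-9 <= 1e-08 + 1e-05 * |b|, so this passes
print(np.allclose(a, b, equal_nan=True))                          # True
# Zero tolerances: |a - b| must be exactly 0
print(np.allclose(a, b, rtol=0, atol=0, equal_nan=True))          # False
print(np.allclose(a, a.copy(), rtol=0, atol=0, equal_nan=True))   # True
```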

PS: "I want to check if two arrays are identical": actually, you are looking for equality rather than identity. These are not the same in Python, and I think it's better for everyone to understand the difference so that we share the same vocabulary. (https://www.blog.pythonlibrary.org/2017/02/28/python-101-equality-vs-identity/)

You would test identity with the is keyword:

a is b
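A quick sketch of the distinction for the arrays in the question: two separately created arrays are different objects even when their contents are equal.

```python
import numpy as np

a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])
c = a  # a second name bound to the same object

print(a is b)                                # False: two distinct objects
print(a is c)                                # True: same object
print(np.array_equal(a, b, equal_nan=True))  # True: equal contents
```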

Milson answered 5/11, 2019 at 10:13 Comment(0)

When I used the above answer:

 ((a == b) | (numpy.isnan(a) & numpy.isnan(b))).all()

It gave me errors when evaluating arrays of strings (np.isnan does not support string dtypes).

This is more type-generic:

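def EQUAL(a, b):
    # NaN is the only value not equal to itself, so (x != x) flags NaNs
    # without np.isnan; the result is element-wise, call .all() for one bool
    return ((a == b) | ((a != a) & (b != b)))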
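A sketch that wraps the same x != x idea with a shape check and reduces it to a single bool; the name nan_equal_generic is hypothetical, and the examples below only exercise float and unicode-string arrays.

```python
import numpy as np


def nan_equal_generic(a, b):
    """Element-wise equality treating NaN == NaN, reduced to one bool.
    Detects NaN with (x != x), so string arrays work too."""
    a, b = np.asarray(a), np.asarray(b)
    if a.shape != b.shape:
        return False
    return bool(((a == b) | ((a != a) & (b != b))).all())


print(nan_equal_generic(np.array([1, 2, np.nan]), np.array([1, 2, np.nan])))  # True
print(nan_equal_generic(np.array(["x", "y"]), np.array(["x", "y"])))          # True
print(nan_equal_generic(np.array(["x", "y"]), np.array(["x", "z"])))          # False
```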

Teeny answered 10/10, 2016 at 12:7 Comment(0)

As of v1.19, numpy's array_equal function supports an equal_nan argument:

assert np.array_equal(a, b, equal_nan=True)

Overweary answered 24/3, 2021 at 22:58 Comment(1)

flyingdutchman already posted this. I just added the version number for completeness. (and fixed the version number in your answer btw) – Thyrsus 26/3, 2022 at 0:37

For me this worked fine:

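import numpy

a = numpy.array([float('nan'), 1, 2])
b = numpy.array([2, float('nan'), 2])
# Compare only where neither array has a NaN; out= pre-fills the skipped
# positions with True (with where= alone they would be left uninitialized)
numpy.equal(a, b,
            where=numpy.logical_not(numpy.logical_or(
                numpy.isnan(a),
                numpy.isnan(b)
            )),
            out=numpy.ones(a.shape, dtype=bool)
).all()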

PS: This ignores every position where either array contains a NaN, so it does not check that the NaNs are in the same places.

Writer answered 2/5, 2021 at 22:47 Comment(0)

If you do this for things like unit tests, so you don't care much about performance and "correct" behaviour with all types, you can use this to have something that works with all types of arrays, not just numeric:

a = np.array(['a', 'b', None])
b = np.array(['a', 'b', None])
assert list(a) == list(b)

Casting ndarrays to lists can sometimes be useful to get the behaviour you want in some test. (But don't use this in production code, or with larger arrays!)
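A small check of where the list comparison helps and where it does not, consistent with the comment below: list == compares elements pairwise, which works for strings and None, but each list() call creates fresh NumPy float scalars, so NaN entries still compare unequal.

```python
import numpy as np

s1 = np.array(['a', 'b', None])
s2 = np.array(['a', 'b', None])
print(list(s1) == list(s2))   # True: 'a' == 'a', None == None

f1 = np.array([1, np.nan])
f2 = np.array([1, np.nan])
# Distinct float64 scalar objects, and nan == nan is False
print(list(f1) == list(f2))   # False
```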

Trevelyan answered 16/1, 2019 at 23:8 Comment(1)

This doesn't actually work for numerics. For example, try setting a and b to np.array([1, np.nan]). – Thyrsus 26/3, 2022 at 0:47
