Array comparison not matching elementwise comparison in numpy

I have a numpy array arr. It's a numpy.ndarray with shape (5553110,) and dtype=float32.

When I do:

(arr > np.pi)[3154950]
False
arr[3154950] > np.pi
True

Why is the first comparison getting it wrong? And how can I fix it?

The values:

arr[3154950] = 3.1415927
np.pi = 3.141592653589793

Is the problem with precision?
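
For reference, here is a minimal, self-contained reproduction with a one-element array (3.1415927 is just the nearest float32 to pi, not my real data; the outputs shown are what I see on NumPy 1.x):

import numpy as np

# the nearest float32 to pi is slightly LARGER than the true value of pi
arr = np.array([3.1415927], dtype=np.float32)

print((arr > np.pi)[0])   # False
print(arr[0] > np.pi)     # True (with NumPy 1.x promotion rules)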

Doubleton answered 26/4, 2018 at 15:46 Comment(2)
I'd be curious to see the values at the 3 or so indices on each side of that index (and the value at the index itself). – Marven
I've gone ahead and opened an issue on the numpy issue tracker as well, to see if we can get a clearer answer about what is happening in the backend of numpy. github.com/numpy/numpy/issues/10982 – Marven

The problem is due to the limited precision of np.float32 compared to np.float64.

Use np.float64 and you will not see a problem:

import numpy as np

arr = np.array([3.1415927], dtype=np.float64)

print((arr > np.pi)[0])  # True

print(arr[0] > np.pi)    # True
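
If rebuilding the big array in float64 is not practical, casting the existing float32 array just for the comparison should behave the same way (a sketch, using the arr and index from the question):

# do the comparison in float64 so np.pi is not rounded down to float32
mask = arr.astype(np.float64) > np.pi

print(mask[3154950])  # True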

As @WarrenWeckesser comments:

It involves how numpy decides to cast the arguments of its operations. Apparently, with arr > scalar, the scalar is converted to the same type as the array arr, which in this case is np.float32. On the other hand, with something like arr > arr2, with both arguments nonscalar arrays, they will use a common data type. That's why (arr > np.array([np.pi]))[3154950] returns True.
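
To see that casting behaviour in isolation, compare a scalar operand against a one-element array operand; the dtypes of arithmetic results show the same promotion rules at work (a small sketch illustrating the rule described above):

import numpy as np

arr32 = np.array([3.1415927], dtype=np.float32)

# scalar comparand: np.pi is cast down to float32 before comparing
print((arr32 > np.pi)[0])                  # False

# array comparand: both operands are promoted to a common dtype (float64)
print((arr32 > np.array([np.pi]))[0])      # True

# the result dtypes confirm which promotion happened
print((arr32 + np.pi).dtype)               # float32
print((arr32 + np.array([np.pi])).dtype)   # float64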

Related github issue

Aeolotropic answered 26/4, 2018 at 15:54 Comment(9)
Do the two indexing schemes change how the conversion is applied? – Marven
@GrantWilliams, that is my instinct. I believe the conversion to np.float32 is applied inconsistently in the two scenarios, which leads to the difference. I think we need a better answer... OP, don't accept this one yet! – Aeolotropic
This is correct, thanks (but I shall wait)! I expected float32 to be upgraded to float64 even for big arrays. – Doubleton
This is the most interesting problem I've seen all day. I'm really curious why they would be different. When converting the entire array into a boolean array and then indexing, I wonder whether it is done in np.float32 for efficiency reasons, whereas array[index] > value might not need to be converted to the lower precision because it's a single comparison. Definitely going to have to dig deep into the docs to figure it out. – Marven
It involves how numpy decides to cast the arguments of its operations. Apparently, with arr > scalar, the scalar is converted to the same type as the array arr, which in this case is np.float32. On the other hand, with something like arr > arr2, with both arguments nonscalar arrays, they will use a common data type. That's why (arr > np.array([np.pi]))[3154950] returns True. – Secco
Indeed, I added some data points to ponder. But I'd appreciate it if someone found the answer (i.e. numpy's rationale / priority logic) and posted it... mine just provides a fix, not an explanation. – Aeolotropic
np.float64(3.1415927) > np.float32(np.pi) # False is consistent, because once the number has been rounded up, increasing the precision doesn't bring the value back down. – Doubleton
@WarrenWeckesser Same as this issue (maybe non-issue), right? – Slapstick
@Slapstick Yep, that looks like the same issue. – Secco
