Array comparison not matching elementwise comparison in numpy

I have a numpy array arr. It's a numpy.ndarray with shape (5553110,) and dtype=float32.

When I do:

(arr > np.pi)[3154950]
False
arr[3154950] > np.pi
True

Why is the first comparison getting it wrong? And how can I fix it?

The values:

arr[3154950] = 3.1415927
np.pi = 3.141592653589793

Is the problem with precision?
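
For reference, here is a minimal, self-contained reproduction with a one-element array (3.1415927 is just the nearest float32 to pi, not my real data; the outputs shown are what I see on NumPy 1.x):

import numpy as np

# the nearest float32 to pi is slightly LARGER than the true value of pi
arr = np.array([3.1415927], dtype=np.float32)

print((arr > np.pi)[0])   # False
print(arr[0] > np.pi)     # True (with NumPy 1.x promotion rules)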

Doubleton answered 26/4, 2018 at 15:46 Comment(2)
I'd be curious to see the values at the 3 or so indices on each side of that index (and the value at the index itself). – Marven
I've gone ahead and opened an issue on the numpy issue tracker as well, to see if we can get a clearer answer about what is happening in the backend of numpy. github.com/numpy/numpy/issues/10982 – Marven

The problem is due to the limited precision of np.float32 compared to np.float64.

Use np.float64 and you will not see a problem:

import numpy as np

arr = np.array([3.1415927], dtype=np.float64)

print((arr > np.pi)[0])  # True

print(arr[0] > np.pi)    # True
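
If rebuilding the big array in float64 is not practical, casting the existing float32 array just for the comparison should behave the same way (a sketch, using the arr and index from the question):

# do the comparison in float64 so np.pi is not rounded down to float32
mask = arr.astype(np.float64) > np.pi

print(mask[3154950])  # True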

As @WarrenWeckesser comments:

It involves how numpy decides to cast the arguments of its operations. Apparently, with arr > scalar, the scalar is converted to the same type as the array arr, which in this case is np.float32. On the other hand, with something like arr > arr2, with both arguments nonscalar arrays, they will use a common data type. That's why (arr > np.array([np.pi]))[3154950] returns True.
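
To see that casting behaviour in isolation, compare a scalar operand against a one-element array operand; the dtypes of arithmetic results show the same promotion rules at work (a small sketch illustrating the rule described above):

import numpy as np

arr32 = np.array([3.1415927], dtype=np.float32)

# scalar comparand: np.pi is cast down to float32 before comparing
print((arr32 > np.pi)[0])                  # False

# array comparand: both operands are promoted to a common dtype (float64)
print((arr32 > np.array([np.pi]))[0])      # True

# the result dtypes confirm which promotion happened
print((arr32 + np.pi).dtype)               # float32
print((arr32 + np.array([np.pi])).dtype)   # float64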

Related github issue

Aeolotropic answered 26/4, 2018 at 15:54 Comment(9)
Do the two indexing schemes change how the conversion is applied? – Marven
@GrantWilliams, that is my instinct. I believe the conversion to np.float32 is applied inconsistently in the two scenarios, which leads to the difference. I think we need a better answer... OP, don't accept this one yet! – Aeolotropic
This is correct, thanks (but I shall wait)! I expected float32 to be upgraded to float64 even for big arrays. – Doubleton
This is the most interesting problem I've seen all day. I'm really curious why they would be different. When converting the entire array into a boolean array and then indexing, I wonder whether it is done in np.float32 for efficiency reasons, whereas array[index] > value might not need to be converted to the lower precision because it's a single comparison. Definitely going to have to dig deep into the docs to figure it out. – Marven
It involves how numpy decides to cast the arguments of its operations. Apparently, with arr > scalar, the scalar is converted to the same type as the array arr, which in this case is np.float32. On the other hand, with something like arr > arr2, with both arguments nonscalar arrays, they will use a common data type. That's why (arr > np.array([np.pi]))[3154950] returns True. – Secco
Indeed, I added some data points to ponder. But I'd appreciate it if someone found the answer (i.e. numpy's rationale / priority logic) and posted it... mine just provides a fix, not an explanation. – Aeolotropic
np.float64(3.1415927) > np.float32(np.pi) # False is consistent, because once the number has been rounded up, increasing the precision doesn't bring the value back down. – Doubleton
@WarrenWeckesser Same as this issue (maybe non-issue), right? – Slapstick
@Slapstick Yep, that looks like the same issue. – Secco
