NumPy dtypes are strict, so it doesn't produce an array like np.array([False, True, np.nan]); instead it returns array([ 0., 1., nan]), which is a float array.
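Just to make the upcast explicit (nothing here beyond plain NumPy):

import numpy as np

x = np.array([False, True, np.nan])
x.dtype   # dtype('float64'): the bools were upcast to floats to make room for nan
x[:2]     # array([0., 1.])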
If you try to change a value in a bool array like:
x = np.array([False, False, False])
x[0] = 5
then x becomes array([ True, False, False]); the 5 is silently cast to True.
... wow
But I think 5 > np.nan cannot be False; it should be nan. False would mean that a real comparison was made and produced a result, the way 3 > 5 does, which I think is a disaster: NumPy produces data that we actually don't have. If it returned nan instead, we could handle it with ease.
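To show the behaviour I am complaining about (whether the array case also prints a RuntimeWarning depends on the NumPy version, hence the errstate in the function below):

import numpy as np

5 > np.nan                    # False, although no meaningful comparison is possible
np.nan > 5                    # also False
np.array([np.nan, 1.0]) > 0   # array([False,  True]); some NumPy versions also warn
                              # "invalid value encountered in greater"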
So I tried to modify the behavior with a function.
import numpy as np

def ngrater(x, y):
    # element-wise x > y, but nan wherever either operand is nan
    with np.errstate(invalid='ignore'):   # silence the "invalid value" warning on some versions
        c = x > y
    c = c.astype(object)                  # np.object is deprecated; plain object works
    c[np.isnan(x)] = np.nan
    c[np.isnan(y)] = np.nan
    return c
a = np.array([np.nan, 1, 2, 3, 4, 5, np.nan, np.nan, np.nan])  # 9 elements
b = np.array([0, 1, -2, -3, -4, -5, -5, -5, -5])               # 9 elements
ngrater(a, b)
returns:
array([nan, False, True, True, True, True, nan, nan, nan], dtype=object)
But I think the whole memory structure is changed that way. Instead of a memory block of uniform units, an object array is a block of pointers, with the real data somewhere else, so the function may perform slower, and that's probably why NumPy doesn't do this. We need a superBool dtype that can also hold np.nan, or we just have to use float arrays with the convention +1: True, -1: False, nan: nan.
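A minimal sketch of that float-array alternative (the +1/-1/nan encoding and the helper name fgreater are just my illustration, not anything standard in NumPy):

import numpy as np

def fgreater(x, y):
    # encode x > y as floats: 1.0 = True, -1.0 = False, nan = unknown
    with np.errstate(invalid='ignore'):
        c = np.where(x > y, 1.0, -1.0)        # contiguous float64 block, no object pointers
    c[np.isnan(x) | np.isnan(y)] = np.nan     # restore the missing comparisons
    return c

a = np.array([np.nan, 1, 2, 3, 4, 5, np.nan, np.nan, np.nan])
b = np.array([0, 1, -2, -3, -4, -5, -5, -5, -5])
fgreater(a, b)   # array([nan, -1.,  1.,  1.,  1.,  1., nan, nan, nan])

The result stays a plain float array, so fgreater(a, b) > 0 picks out the True positions and np.isnan(fgreater(a, b)) picks out the unknowns.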