fast numpy addnan
S

5

9

I would like to add thousands of 4D arrays element-wise while accounting for NaNs. A simple example using 1D arrays would be:

X = array([4, 7, 89, nan, 89, 65, nan])
Y = array([0, 5, 4, 9, 8, 100, nan])
# desired result of adding X and Y while ignoring NaNs:
z = array([4, 12, 93, 9, 97, 165, nan])

I've written a simple for loop around this, but it takes forever, so it is not a smart solution. Another option would be to build one larger array and use bottleneck's nansum, but that would take too much memory on my laptop. I need a running sum over 11000 cases.

Does anyone have a smart and fast way to do this?

Secrete answered 23/8, 2012 at 17:1 Comment(0)
C
10

Here is one possibility:

>>> x = np.array([1, 2, np.nan, 3, np.nan, 4])
>>> y = np.array([1, np.nan, 2, 5, np.nan, 8])
>>> x = np.ma.masked_array(np.nan_to_num(x), mask=np.isnan(x) & np.isnan(y))
>>> y = np.ma.masked_array(np.nan_to_num(y), mask=x.mask)
>>> (x+y).filled(np.nan)
array([  2.,   2.,   2.,   8.,  nan,  12.])

The real difficulty is that you seem to want nan to be interpreted as zero unless all values at a particular position are nan. This means that you must look at both x and y to determine which nans to replace. If you are okay with having all nan values replaced, then you can simply do np.nan_to_num(x) + np.nan_to_num(y).
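To make the two interpretations above concrete, here is a minimal sketch that computes both on the same inputs. The variable names all_zeroed, both_nan and keep_nan are only illustrative and do not come from the answer.

```python
import numpy as np

x = np.array([1, 2, np.nan, 3, np.nan, 4])
y = np.array([1, np.nan, 2, 5, np.nan, 8])

# Interpretation A: every NaN counts as zero, so a position that is
# NaN in both inputs ends up as 0.
all_zeroed = np.nan_to_num(x) + np.nan_to_num(y)

# Interpretation B: NaN counts as zero unless both inputs are NaN there,
# in which case the result stays NaN.
both_nan = np.isnan(x) & np.isnan(y)
keep_nan = np.where(both_nan, np.nan, all_zeroed)

print(all_zeroed)
print(keep_nan)
```

The two results differ only at index 4, the one position where both x and y are NaN.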

Cocoon answered 23/8, 2012 at 17:20 Comment(5)
Masked arrays are the way to go here if your numpy implementation is new enough to support it (mine isn't -- maybe it's time for an upgrade) (+1).Cognizable
@mgilson: Heh, probably is time! I think masked arrays have been in numpy for a few years now.Cocoon
Well my computer's a few years old ;^)Cognizable
The masked array method would mean a "for" loop and be much slower for my problem. The generator expression works frighteningly fast and the results are accurate.Secrete
@Cognizable masked arrays were always part of numpy as objects of their own. Since version 1.2, they became a subclass of standard ndarrays. @Secrete Where do you see a for loop? @Cocoon The use of np.nan_to_num is overkill, have a look at the solution I posted below (that I would very humbly call the "way to go" if you can use masked arrays...)Reverse
K
3

You could do something like:

import numpy as np

arr1 = np.array([1.0, 1.0, np.nan, 1.0, 1.0, np.nan])
arr2 = np.array([1.0, 1.0, 1.0, 1.0, 1.0, np.nan])
# positions where both inputs are NaN
flags = np.isnan(arr1) & np.isnan(arr2)
copy1 = arr1.copy()
copy2 = arr2.copy()
copy1[np.isnan(copy1)] = 0.0
copy2[np.isnan(copy2)] = 0.0
out = copy1 + copy2
out[flags] = np.nan
print(out)
[ 2.  2.  1.  2.  2. nan]

This first flags the positions where both arrays have a NaN. It then does essentially what @mgilson suggested: make copies, replace the NaNs with 0.0, add the two arrays together, and finally set the flagged positions back to np.nan.

Kalasky answered 23/8, 2012 at 17:23 Comment(4)
@mgilson: I'm trying to write a generator expression as it consumes less memory, but I'm a bit confused as to how this works when dealing with very large numbers and reading a netcdf file slice by slice: for i in cases: array = np.array(netcdfvar[i]), then sum these slices accounting for nan. I'm not sure how this generator would look.Secrete
@Secrete -- I think you posted this on the wrong answer ;-). Anyway, I'm not familiar with reading slices from a netcdf file, but, you might try the following: sum( nan_to_zero(np.array(netcdfvar[i])) for i in cases ), or as BrenBarn points out: sum( np.nan_to_num(netcdfvar[i]) for i in cases )Cognizable
@mgilson: yes, you are right. I'm still learning how to use this site. Thanks. I've been trying several variations with varying success. Your solution is a bit counter intuitive. I'll test it.Secrete
@mgilson: works like a charm and super fast. Thanks for teaching me this (and the generator expression). Much obliged.Secrete
N
3
import numpy as np
z = np.nansum([X, Y], axis=0)
Nich answered 24/9, 2013 at 12:29 Comment(1)
This almost works. The issue is that this solution does not produce the desired output. The output should include NaNs where both input vectors have NaNs in the same positions. We can put the NaNs back by adding a third line, z[np.isnan(X) & np.isnan(Y)] = np.nan, to this solution.Steady
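The nansum approach together with the fix from the comment can be wrapped up as follows. This is only a sketch: nansum_keep_nan is a hypothetical helper name, and because it stacks its inputs into one larger array it suits small batches rather than thousands of large 4D arrays.

```python
import numpy as np

def nansum_keep_nan(*arrays):
    """Element-wise sum that treats NaN as 0, except where every input is NaN.

    Hypothetical helper: np.nansum gives 0 at all-NaN positions in current
    NumPy releases, so those positions are set back to NaN afterwards.
    """
    stacked = np.stack(arrays)                  # shape: (n_arrays, ...)
    total = np.nansum(stacked, axis=0)          # NaNs treated as zero
    total[np.isnan(stacked).all(axis=0)] = np.nan
    return total

X = np.array([4, 7, 89, np.nan, 89, 65, np.nan])
Y = np.array([0, 5, 4, 9, 8, 100, np.nan])
z = nansum_keep_nan(X, Y)
print(z)
```

With the arrays from the question, the last element of z is NaN and every other element is the NaN-ignoring sum.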
C
1

Not sure how this would perform, but it's worth a shot :)

def nan_to_zero(array):
    new_arr = array.copy()
    new_arr[np.isnan(array)] = 0.
    return new_arr

sum( nan_to_zero(arr) for arr in array_generator )

This doesn't result in a NaN in the last place of your array though. It results in a 0 ...
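If the NaN in all-NaN positions matters, one possible extension is to keep a boolean accumulator next to the running sum. The sketch below is not from the answer: running_nansum is a hypothetical name, and it assumes all arrays share the same shape and can be converted to float. Only two arrays are held in memory no matter how many the iterable yields.

```python
import numpy as np

def running_nansum(arrays):
    """Sum an iterable of same-shaped arrays, treating NaN as 0.

    Positions that are NaN in every array stay NaN in the result.
    (Hypothetical helper, sketched under the assumptions stated above.)
    """
    total = None     # running sum with NaNs replaced by 0
    all_nan = None   # True where every array seen so far was NaN
    for arr in arrays:
        arr = np.asarray(arr, dtype=float)
        nan_here = np.isnan(arr)
        if total is None:
            total = np.where(nan_here, 0.0, arr)
            all_nan = nan_here
        else:
            total += np.where(nan_here, 0.0, arr)
            all_nan &= nan_here
    if total is None:
        raise ValueError("need at least one array")
    total[all_nan] = np.nan
    return total

# Works for any number of dimensions, for example 4D:
a = np.full((2, 2, 2, 2), 1.0)
b = np.full((2, 2, 2, 2), np.nan)
a[0, 0, 0, 0] = np.nan
result = running_nansum(arr for arr in (a, b, a))
print(result[0, 0, 0, 0], result[1, 1, 1, 1])
```

In the usage example, the position that is NaN in all three arrays stays NaN, while the others sum the two non-NaN contributions.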

Cognizable answered 23/8, 2012 at 17:12 Comment(8)
@mgilson: a list comprehension after removing the nans. I never thought about the list comprehension part. But I suspect this assumes a 1D array. Can't see how I could code this method for a 4D array.Secrete
@Secrete -- It's actually a generator expression, but works similarly. I don't see any reason why this couldn't be used with 4D arrays though. Really, 4D arrays are just 1D arrays in memory anyway (Unless you really have view objects, but it should still work with those as well)Cognizable
@Shejo284, sum just calls __add__, however that's defined. Since __add__ is defined for 4D arrays as for 1D arrays, it works.Davina
@Cocoon -- You're correct (I didn't know that existed, neat). And, looking at the source, it looks like it does almost exactly what I have coded up above (except that it needs to do a whole lot more type-checking).Cognizable
@mgilson: giving this try and will come back with the results soon.Secrete
@Davina -- Thanks for explaining that for me. I think you did a better job than I would have done.Cognizable
I'm always happy to improve my answers if you leave a comment saying how this is either incorrect, or could be better. Thanks!Cognizable
@BrenBarn: works like a charm and super fast. Thanks for your great contribution. Much obliged.Secrete
R
1

I see several simpler solutions:

  • (EDITED) Using np.ma

    mX = np.ma.masked_array(X, mask=np.isnan(X))
    mY = np.ma.masked_array(Y, mask=np.isnan(Y))
    mZ = np.ma.masked_array(mX.filled(0) + mY.filled(0),
                            mask=mX.mask & mY.mask)
    Z = mZ.filled(np.nan)
    
  • (EDITED) Not using np.ma

    mx = np.isnan(x)
    my = np.isnan(y)
    z = np.where(mx,0,x) + np.where(my,0,y)
    z[mx&my] = np.nan
    
Reverse answered 24/8, 2012 at 20:42 Comment(3)
These solutions do not produce the desired output. He wants the non-nan terms to be added, with nan appearing in the result only if all values at a particular position are nan. Your solutions produce additional nans at positions where only one of the two input vectors has a nan.Cocoon
OK, fixed. Thanks for keeping me on my toes.Reverse
Also note that your last solution is something the OP explicitly said he didn't want to do (create a larger array containing both). The second solution looks nice, though.Cocoon
