fast numpy addnan

Asked 23/8, 2012 at 17:1 Answered 24/9, 2013 at 12:29

Solved python optimization numpy nan scientific-computing

I would like to add thousands of 4D arrays element-wise while accounting for NaNs. A simple example using 1D arrays would be:

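import numpy as np
X = np.array([4, 7, 89, np.nan, 89, 65, np.nan])
Y = np.array([0, 5, 4, 9, 8, 100, np.nan])
# Desired element-wise sum Z, with NaN treated as 0 unless both inputs are NaN:
# Z = array([4, 12, 93, 9, 97, 165, nan])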

I've written a simple for loop around this, but it takes forever, so it is not a smart solution. Another solution could be to create a larger array and use bottleneck's nansum, but this would take too much memory for my laptop. I need a running sum over 11000 cases.

Does anyone have a smart and fast way to do this?

Secrete answered 23/8, 2012 at 17:1 Comment(0)

Here is one possibility:

>>> x = np.array([1, 2, np.nan, 3, np.nan, 4])
>>> y = np.array([1, np.nan, 2, 5, np.nan, 8])
>>> x = np.ma.masked_array(np.nan_to_num(x), mask=np.isnan(x) & np.isnan(y))
>>> y = np.ma.masked_array(np.nan_to_num(y), mask=x.mask)
>>> (x+y).filled(np.nan)
array([  2.,   2.,   2.,   8.,  nan,  12.])

The real difficulty is that you seem to want nan to be interpreted as zero unless all values at a particular position are nan. This means that you must look at both x and y to determine which nans to replace. If you are okay with having all nan values replaced, then you can simply do np.nan_to_num(x) + np.nan_to_num(y).
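To make the difference between the two behaviours concrete, here is a minimal sketch using the same x and y as above. The names all_replaced, both_nan and nan_preserved are made up for illustration, and np.where is used here in place of the masked-array route shown in the answer.

```python
import numpy as np

x = np.array([1, 2, np.nan, 3, np.nan, 4])
y = np.array([1, np.nan, 2, 5, np.nan, 8])

# Every NaN becomes 0, including the position where both inputs are NaN.
all_replaced = np.nan_to_num(x) + np.nan_to_num(y)

# Put NaN back only where both inputs were NaN.
both_nan = np.isnan(x) & np.isnan(y)
nan_preserved = np.where(both_nan, np.nan, all_replaced)

print(all_replaced)    # values: 2, 2, 2, 8, 0, 12
print(nan_preserved)   # values: 2, 2, 2, 8, nan, 12
```

The only position that differs is index 4, where both inputs are NaN.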

Cocoon answered 23/8, 2012 at 17:20 Comment(5)

Masked arrays are the way to go here if your numpy implementation is new enough to support it (mine isn't -- maybe it's time for an upgrade) (+1). – Cognizable 23/8, 2012 at 17:25

@mgilson: Heh, probably is time! I think masked arrays have been in numpy for a few years now. – Cocoon 23/8, 2012 at 17:29

Well my computer's a few years old ;^) – Cognizable 23/8, 2012 at 17:35

The masked array method would mean a "for" loop and be much slower for my problem. The generator expression works frighteningly fast and the results are accurate. – Secrete 23/8, 2012 at 19:31

@Cognizable masked arrays were always part of numpy as objects of their own. As of version 1.2, they became a subclass of standard ndarrays. @Secrete Where do you see a for loop? @Cocoon The use of np.nan_to_num is overkill; have a look at the solution I posted below (which I would very humbly call the "way to go" if you can use masked arrays...) – Reverse 24/8, 2012 at 20:47

You could do something like:

arr1 = np.array([1.0, 1.0, np.nan, 1.0, 1.0, np.nan])
arr2 = np.array([1.0, 1.0, 1.0, 1.0, 1.0, np.nan])
flags = np.isnan(arr1) & np.isnan(arr2)
copy1 = arr1.copy()
copy2 = arr2.copy()
copy1[np.isnan(copy1)] = 0.0
copy2[np.isnan(copy2)] = 0.0
out = copy1 + copy2
out[flags] = np.nan
print(out)
# values: 2, 2, 1, 2, 2, nan

The flags array marks the positions where both arrays have a NaN. The rest is essentially what @mgilson suggested: make copies, replace the NaNs with 0.0, add the two arrays together, and then set the flagged positions back to NaN.

Kalasky answered 23/8, 2012 at 17:23 Comment(4)

@mgilson: I'm trying to write a generator expression as it consumes less memory, but I'm a bit confused as to how this works when dealing with very large numbers and reading a netcdf file slice by slice: for i in cases: array = np.array(netcdfvar[i]), then sum these slices accounting for NaN. I'm not sure how this generator would look. – Secrete 23/8, 2012 at 18:32

@Secrete -- I think you posted this on the wrong answer ;-). Anyway, I'm not familiar with reading slices from a netcdf file, but, you might try the following: sum( nan_to_zero(np.array(netcdfvar[i])) for i in cases ), or as BrenBarn points out: sum( np.nan_to_num(netcdfvar[i]) for i in cases ) – Cognizable 23/8, 2012 at 18:45

@mgilson: yes, you are right. I'm still learning how to use this site. Thanks. I've been trying several variations with varying success. Your solution is a bit counter intuitive. I'll test it. – Secrete 23/8, 2012 at 19:1

@mgilson: works like a charm and super fast. Thanks for teaching me this (and the generator expression). Much obliged. – Secrete 23/8, 2012 at 19:14

import numpy as np
z=np.nansum([X,Y],axis=0)

Nich answered 24/9, 2013 at 12:29 Comment(1)

This almost works. The issue is that this solution does not produce the desired output. The output should include NaNs where both input vectors have NaNs in the same positions. We can put the NaNs back by adding a third line to this solution: z[np.isnan(X) & np.isnan(Y)] = np.nan – Steady 18/12, 2014 at 11:3
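Putting the nansum answer and the correction from the comment together, and extending it from two inputs to any number, might look like the sketch below. It assumes all inputs have the same shape and fit in memory at once, which the question says is not the case for the full 11000-case problem, so treat it as an illustration for small batches only.

```python
import numpy as np

X = np.array([4, 7, 89, np.nan, 89, 65, np.nan])
Y = np.array([0, 5, 4, 9, 8, 100, np.nan])

# Stack the inputs along a new leading axis; this holds every input at once.
stack = np.stack([X, Y])

# NaN-ignoring sum across the inputs.
Z = np.nansum(stack, axis=0)

# Restore NaN where every input was NaN at that position.
Z[np.isnan(stack).all(axis=0)] = np.nan

print(Z)   # values: 4, 12, 93, 9, 97, 165, nan
```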

Not sure how this would perform, but it's worth a shot :)

def nan_to_zero(array):
    new_arr = array.copy()
    new_arr[np.isnan(array)] = 0.
    return new_arr

sum( nan_to_zero(arr) for arr in array_generator )

This doesn't result in a NaN in the last place of your array though. It results in a 0 ...
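If the NaN in the all-NaN positions matters, one possible variation is to keep a second accumulator that records where every array so far has been NaN. The sketch below is an assumption-laden illustration, not part of the original answer: running_nansum is a hypothetical name, and the inputs are assumed to be equally shaped float arrays coming from any iterable (for instance a generator over slices, as discussed in the comments).

```python
import numpy as np

def running_nansum(arrays):
    """Sum an iterable of equally shaped arrays, treating NaN as 0.

    Positions that are NaN in every input stay NaN in the result.
    Only the current array and two accumulators are held in memory.
    """
    total = None
    all_nan = None
    for arr in arrays:
        arr = np.asarray(arr, dtype=float)
        nan_here = np.isnan(arr)
        if total is None:
            # First array: start both accumulators from fresh arrays.
            total = np.where(nan_here, 0.0, arr)
            all_nan = nan_here
        else:
            total += np.where(nan_here, 0.0, arr)
            all_nan &= nan_here
    if total is None:
        raise ValueError("no arrays to sum")
    total[all_nan] = np.nan
    return total

X = np.array([4, 7, 89, np.nan, 89, 65, np.nan])
Y = np.array([0, 5, 4, 9, 8, 100, np.nan])
Z = running_nansum(arr for arr in (X, Y))
print(Z)   # values: 4, 12, 93, 9, 97, 165, nan
```

Nothing in the function depends on the number of dimensions, so the same call should work for 4D arrays.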

Cognizable answered 23/8, 2012 at 17:12 Comment(8)

@mgilson: a list comprehension after removing the nans. I never thought about the list comprehension part. But I suspect this assumes a 1D array. Can't see how I could code this method for a 4D array. – Secrete 23/8, 2012 at 17:16

@Secrete -- It's actually a generator expression, but works similarly. I don't see any reason why this couldn't be used with 4D arrays though. Really, 4D arrays are just 1D arrays in memory anyway (Unless you really have view objects, but it should still work with those as well) – Cognizable 23/8, 2012 at 17:20

@Shejo284, sum just calls __add__, however that's defined. Since __add__ is defined for 4D arrays as for 1D arrays, it works. – Davina 23/8, 2012 at 17:24

@Cocoon -- You're correct (I didn't know that existed, neat). And, looking at the source, it looks like it does almost exactly what I have coded up above (except that it needs to do a whole lot more type-checking). – Cognizable 23/8, 2012 at 17:24

@mgilson: giving this a try and will come back with the results soon. – Secrete 23/8, 2012 at 17:27

@Davina -- Thanks for explaining that for me. I think you did a better job than I would have done. – Cognizable 23/8, 2012 at 17:27

I'm always happy to improve my answers if you leave a comment saying how this is either incorrect, or could be better. Thanks! – Cognizable 23/8, 2012 at 18:42

@BrenBarn: works like a charm and super fast. Thanks for your great contribution. Much obliged. – Secrete 23/8, 2012 at 19:16

I see several simpler solutions:

(EDITED) Using np.ma

mX = np.ma.masked_array(X, mask=np.isnan(X))
mY = np.ma.masked_array(Y, mask=np.isnan(Y))
mZ = np.ma.masked_array(mX.filled(0) + mY.filled(0),
                        mask=mX.mask & mY.mask)
Z = mZ.filled(np.nan)

(EDITED) Not using np.ma

mx = np.isnan(x)
my = np.isnan(y)
z = np.where(mx,0,x) + np.where(my,0,y)
z[mx&my] = np.nan

Reverse answered 24/8, 2012 at 20:42 Comment(3)

These solutions do not produce the desired output. He wants the non-nan terms to be added, with nan appearing in the result only if all values at a particular position are nan. Your solutions produce additional nans at positions where only one of the two input vectors has a nan. – Cocoon 24/8, 2012 at 20:55

OK, fixed. Thanks for keeping me on my toes – Reverse 24/8, 2012 at 21:26

Also note that your last solution is something the OP explicitly said he didn't want to do (create a larger array containing both). The second solution looks nice, though. – Cocoon 24/8, 2012 at 21:31
