I'm trying to resize a 2D numpy array of a given factor, obtaining a smaller array in output.
The array is read from an image file and some of the values should be NaN (Not a Number, np.nan from numpy): it is the result of remote sensing measurements from satellite and simply some pixels weren't measured.
The suitable package I found for this is scypy.misc.imresize, but each pixel in the output array containing a NaN is set to NaN, even if there are some valid data in the original pixels interpolated together.
My solution is appended here, what I've done is essentially :
- create a new array based on the original array shape and the desired reduction factor
- create an index array to address all the pixels of the original array to be averaged for each pixel in the new
- cycle through the new array pixels and average all the not-NaN pixel to obtain the new array pixel value; it there are only NaN, the output will be NaN.
I'm planning to add keyword to choice between different output (average, median, standard deviation of the input pixels and so on).
It is working as expected, but on a ~1Mpx image it takes around 3 seconds. Due to my lack of experience in python I'm searching for improvements.
Do anyone have suggestion how to do it better and more efficiently?
Do anyone know a library that already implements all that stuff?
Thanks.
Here you have an example output for random pixel input generated with the code here below:
import numpy as np
import pylab as plt
from scipy import misc
def resize_2d_nonan(array,factor):
"""
Resize a 2D array by different factor on two axis sipping NaN values.
If a new pixel contains only NaN, it will be set to NaN
Parameters
----------
array : 2D np array
factor : int or tuple. If int x and y factor wil be the same
Returns
-------
array : 2D np array scaled by factor
Created on Mon Jan 27 15:21:25 2014
@author: damo_ma
"""
xsize, ysize = array.shape
if isinstance(factor,int):
factor_x = factor
factor_y = factor
elif isinstance(factor,tuple):
factor_x , factor_y = factor[0], factor[1]
else:
raise NameError('Factor must be a tuple (x,y) or an integer')
if not (xsize %factor_x == 0 or ysize % factor_y == 0) :
raise NameError('Factors must be intger multiple of array shape')
new_xsize, new_ysize = xsize/factor_x, ysize/factor_y
new_array = np.empty([new_xsize, new_ysize])
new_array[:] = np.nan # this saves us an assignment in the loop below
# submatrix indexes : is the average box on the original matrix
subrow, subcol = np.indices((factor_x, factor_y))
# new matrix indexs
row, col = np.indices((new_xsize, new_ysize))
# some output for testing
#for i, j, ind in zip(row.reshape(-1), col.reshape(-1),range(row.size)) :
# print '----------------------------------------------'
# print 'i: %i, j: %i, ind: %i ' % (i, j, ind)
# print 'subrow+i*new_ysize, subcol+j*new_xsize :'
# print i,'*',new_xsize,'=',i*factor_x
# print j,'*',new_ysize,'=',j*factor_y
# print subrow+i*factor_x,subcol+j*factor_y
# print '---'
# print 'array[subrow+i*factor_x,subcol+j*factor_y] : '
# print array[subrow+i*factor_x,subcol+j*factor_y]
for i, j, ind in zip(row.reshape(-1), col.reshape(-1),range(row.size)) :
# define the small sub_matrix as view of input matrix subset
sub_matrix = array[subrow+i*factor_x,subcol+j*factor_y]
# modified from any(a) and all(a) to a.any() and a.all()
# see https://mcmap.net/q/86425/-valueerror-the-truth-value-of-an-array-with-more-than-one-element-is-ambiguous-use-a-any-or-a-all
if not (np.isnan(sub_matrix)).all(): # if we haven't all NaN
if (np.isnan(sub_matrix)).any(): # if we haven no NaN at all
msub_matrix = np.ma.masked_array(sub_matrix,np.isnan(sub_matrix))
(new_array.reshape(-1))[ind] = np.mean(msub_matrix)
else: # if we haven some NaN
(new_array.reshape(-1))[ind] = np.mean(sub_matrix)
# the case assign NaN if we have all NaN is missing due
# to the standard values of new_array
return new_array
row , cols = 6, 4
a = 10*np.random.random_sample((row , cols))
a[0:3,0:2] = np.nan
a[0,2] = np.nan
factor_x = 2
factor_y = 2
a_misc = misc.imresize(a, .5, interp='nearest', mode='F')
a_2d_nonan = resize_2d_nonan(a,(factor_x,factor_y))
print a
print
print a_misc
print
print a_2d_nonan
plt.subplot(131)
plt.imshow(a,interpolation='nearest')
plt.title('original')
plt.xticks(arange(a.shape[1]))
plt.yticks(arange(a.shape[0]))
plt.subplot(132)
plt.imshow(a_misc,interpolation='nearest')
plt.title('scipy.misc')
plt.xticks(arange(a_misc.shape[1]))
plt.yticks(arange(a_misc.shape[0]))
plt.subplot(133)
plt.imshow(a_2d_nonan,interpolation='nearest')
plt.title('my.func')
plt.xticks(arange(a_2d_nonan.shape[1]))
plt.yticks(arange(a_2d_nonan.shape[0]))
EDIT
I add some modification to address ChrisProsser comment.
If I substitute the NaN with some other value, let say the average of the not-NaN pixels, it will affect all the subsequent calculation: the difference between the resampled original array and the resampled array with NaN substituted shows that 2 pixels changed their values.
My goal is simply skip all the NaN pixels.
# substitute NaN with the average value
ind_nonan , ind_nan = np.where(np.isnan(a) == False), np.where(np.isnan(a) == True)
a_substitute = np.copy(a)
a_substitute[ind_nan] = np.mean(a_substitute[ind_nonan]) # substitute the NaN with average on the not-Nan
a_substitute_misc = misc.imresize(a_substitute, .5, interp='nearest', mode='F')
a_substitute_2d_nonan = resize_2d_nonan(a_substitute,(factor_x,factor_y))
print a_2d_nonan-a_substitute_2d_nonan
[[ nan -0.02296697]
[ 0.23143208 0. ]
[ 0. 0. ]]
** 2nd EDIT**
To address the Hooked's answer I put some additional code. It is an iteresting idea, sadly it interpolates new values over pixels that should be "empty" (NaN) and for my small example generate more NaN than good values.
X , Y = np.indices((row , cols))
X_new , Y_new = np.indices((row/factor_x , cols/factor_y))
from scipy.interpolate import CloughTocher2DInterpolator as intp
C = intp((X[ind_nonan],Y[ind_nonan]),a[ind_nonan])
a_interp = C(X_new , Y_new)
print a
print
print a_interp
[[ nan, nan],
[ nan, nan],
[ nan, 6.32826577]])