Conditional numpy.cumsum?
Asked Answered
R

4

9

I'm very new to python and numpy, so sorry if I misuse some terminology.

I have converted a raster to a 2D numpy array in the hopes of doing calculations on it quickly and efficiently.

  • I need to get the cumulative sum across a numpy array such that, for each value, I generate the sum of all values that are less than or equal to that, and write that value to a new array. I need to cycle through the entire array that way.

  • I also need to scale the output between 1 and 100, but that seems
    more straightforward.

Attempt to clarify by example:

array([[ 4,  1  ,  3   ,  2] dtype=float32)

I would want the output values (just doing first row by hand) to read:

array([[ 10,  1  ,  6   ,  3], etc.

Any ideas on how to do that?

Thanks in advance!


The near finished script for anyone who's interested:

#Generate Cumulative Thresholds
#5/15/14

import os
import sys
import arcpy
import numpy as np

#Enable overwriting output data
arcpy.env.overwriteOutput=True

#Set working directory
os.chdir("E:/NSF Project/Salamander_Data/Continuous_Rasters/Canadian_GCM/2020/A2A/")

#Set geoprocessing variables
inRaster = "zero_eurycea_cirrigera_CA2A2020.tif"
des = arcpy.Describe(inRaster)
sr = des.SpatialReference
ext = des.Extent
ll = arcpy.Point(ext.XMin,ext.YMin)

#Convert GeoTIFF to numpy array
a = arcpy.RasterToNumPyArray(inRaster)

#Flatten for calculations
a.flatten()

#Find unique values, and record their indices to a separate object
a_unq, a_inv = np.unique(a, return_inverse=True)

#Count occurences of array indices
a_cnt = np.bincount(a_inv)

#Cumulatively sum the unique values multiplied by the number of
#occurences, arrange sums as initial array
b = np.cumsum(a_unq * a_cnt)[a_inv]

#Divide all values by 10 (reverses earlier multiplication done to
#facilitate accurate translation of ASCII scientific notation
#values < 1 to array)
b /= 10

#Rescale values between 1 and 100
maxval = np.amax(b)
b /= maxval
b *= 100

#Restore flattened array to shape of initial array
c = b.reshape(a.shape)

#Convert the array back to raster format
outRaster = arcpy.NumPyArrayToRaster(c,ll,des.meanCellWidth,des.meanCellHeight)

#Set output projection to match input
arcpy.DefineProjection_management(outRaster, sr)

#Save the raster as a TIFF
outRaster.save("C:/Users/mkcarte2/Desktop/TestData/outRaster.tif")

sys.exit()
Renz answered 14/5, 2014 at 18:47 Comment(9)
Could you give an example on e.g. a small 1D or 2D array of what you would expect as the output? As your question is currently, it is not clear what you're asking.Vonvona
Welcome to StackOverflow. Your question would be clearer if you provide a simple input and desired output.Economize
It's still not particularly clear, since your example data is sorted. What should happen for array([[4, 1], [3, 2]])?Consubstantial
Ah! You're right. I went off your example this time.Renz
There is an extra square bracket in your example. Is your input actually 1D or 2D?Woodworth
@Renz How does [4, 1, 3, 2] become [10, 1, 6, 3]? I don't see the pattern there.Chimp
@WarrenWeckesser It's a raster so i assume the real data would be 2DChimp
Correct, it's 2D. The pattern is: for each value, sum all values over entire array that are less than or equal it. Write those values to a new array.Renz
something like [sum(x for x in arr if x <= a) for a in arr] ?Thrower
S
11

Depending on how you want to handle repeats, this could work:

In [40]: a
Out[40]: array([4, 4, 2, 1, 0, 3, 3, 1, 0, 2])

In [41]: a_unq, a_inv = np.unique(a, return_inverse=True)

In [42]: a_cnt = np.bincount(a_inv)

In [44]: np.cumsum(a_unq * a_cnt)[a_inv]
Out[44]: array([20, 20,  6,  2,  0, 12, 12,  2,  0,  6], dtype=int64)

Where of course a is your array flattened, that you would then have to reshape to the original shape.


And of course once numpy 1.9 is out, you can condense lines 41 and 42 above into the single, faster:

a_unq, a_inv, a_cnt = np.unique(a, return_inverse=True, return_counts=True)
Seditious answered 14/5, 2014 at 20:2 Comment(4)
Heh, I just worked out exactly the same answer. Figures @Jaime, the master of np.unique, would get there first.Woodworth
I have a feeling that this a little cleaner and more efficient than my master-piece :).Fanaticism
Very nice code. I'm still working out whether I can successfully run it. Thanks for your help!Renz
Can you post your exact code, and the full traceback of the error somewhere? I have just rerun the code above and haven't seen any issue, can you double check there is no typo or missing parenthesis in your code?Seditious
F
1

Edit:

This is ugly, but I think it finally works:

import numpy as np

def cond_cum_sum(my_array):
    my_list = []
    prev = -np.inf
    prev_sum = 0
    for ele in my_array:
        if prev != ele:
            prev_sum += ele
        my_list.append(prev_sum)
        prev = ele
    return np.array(my_list)

a = np.array([[4,2,2,3],
              [9,0,5,2]], dtype=np.float32)

flat_a = a.flatten()
flat_a.sort() 

temp = np.argsort(a.ravel())   

cum_sums = cond_cum_sum(flat_a)

result_1 = np.zeros(len(flat_a))
result_1[temp] = cum_sums

result = result_1.reshape(a.shape)

Result:

>>> result
array([[  9.,   2.,   2.,   5.],
       [ 23.,   0.,  14.,   2.]])
Fanaticism answered 14/5, 2014 at 19:10 Comment(11)
Thanks for the input. I believe my example was misleading, since I arbitrarily ranked the data. The actual data would not be ranked.Renz
@Fanaticism Thought along the same lines, but i think this fails if any values are equal.Chimp
@M4rtini, Yes it does not work if any values are equal. If I can't come up with something fast I'll delete my answer.Fanaticism
shouldn't it be [11, 4, 4, 7]? or did I missed the part where only unique values must be summed?Thrower
@njzk2, Where did OP say that only unique values must be summed? My understanding was that each value would be only added once.Fanaticism
@Akavall: my formulation is wrong, but my meaning is the same as yours. I don't see where each value must be added only once.Thrower
@njzk2, we only add if the next value is larger, so [0,2,2,3] would be [0,2,2,5], we don't add the second 2 because it is not larger than the previous value.Fanaticism
@Akavall: I read all values that are less than or equal to. One way or the other, my feeling is that this question lacks precision and leaves to much room for interpretationThrower
@njzk2, I totally agree about lack of precision, which is a shame because this is an interesting question no matter how duplicates are handled.Fanaticism
Sorry for the lack of precision. Essentially, what I'm trying to do is sum all values across the entire raster (which has been converted to a 2D array) that are less than or equal to the value/pixel in question. Whether or not they are unique values. I hope that clarifies it. I then need to scale all the values between 1 and 100, but that's a different question.Renz
@Vergentorix, I still don't think that I am clear. Can you provide a small complete example of your input and desired output? Providing a small example is generally a good idea; it helps people understand your question better.Fanaticism
T
1

With less numpy and more python:

a = np.array([[4,2,2,3],
              [9,0,5,2]], dtype=np.float32)

np.array([[sum(x for x in arr if x <= subarr) for subarr in arr] for arr in a])
# array([[ 11.,   4.,   4.,   7.],
#        [ 16.,   0.,   7.,   2.]])

If the sum only considers items once no matter how much they appear, then,

np.array([[sum(set(x for x in arr if x <= subarr)) for subarr in arr] for arr in a]) 
# array([[  9.,   2.,   2.,   5.],
#        [ 16.,   0.,   7.,   2.]])
Thrower answered 14/5, 2014 at 20:16 Comment(1)
My understanding (maybe incorrect) was that OP was interested in cumsum over the entire array, not the rows separately.Fanaticism
D
1

How about this:

a=np.array([ 4,  1  ,  3   ,  2])

np.array([np.sum(a[a<=x])for x in a])

Gives

array([10,  1,  6,  3])

For a 2D array (assuming you want the sum of the whole array, not just the row):

a=np.array([[ 4,  1  ,  3   ,  2],[ 5,  1  ,  3   ,  2]])

np.array([[np.sum(a[a<=x])for x in a[y,:]]for y in range(a.shape[0])])

Gvies

array([[16,  2, 12,  6],
       [21,  2, 12,  6]])
Downs answered 14/5, 2014 at 21:22 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.