I am trying to implement a series of statistical operations, and I need help vectorizing my code.
The idea is to extract NxN patches from two images, then compute a distance metric between each pair of patches. To do this, I first construct the patches with the following loops:
import numpy as np
from time import time

t0 = time()
params = []
# Slide an N x N window over both images, one pixel at a time.
for i in range(imga.shape[0]):
    for j in range(imga.shape[1]):
        window1 = np.copy(imga[i:i+N, j:j+N]).flatten()
        window2 = np.copy(imgb[i:i+N, j:j+N]).flatten()
        params.append((window1, window2))
print(f"We took {time() - t0:2.2f} seconds to prepare {len(params)/1e6} million patches.")
This takes about 10 seconds to complete, and I'm not overly concerned with the pre-processing time; the steps that follow are the ones I want to optimize.
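For what it's worth, I believe the same windows could be built without the Python loop using NumPy's sliding_window_view (a sketch only, assuming NumPy >= 1.20; note it drops the truncated windows at the image borders that the loop above keeps, and materializing every flattened patch of a 10K by 10K image takes a lot of memory):

from numpy.lib.stride_tricks import sliding_window_view

# All full N x N windows, shape (H - N + 1, W - N + 1, N, N); these are views, no copy yet.
wins_a = sliding_window_view(imga, (N, N))
wins_b = sliding_window_view(imgb, (N, N))

# One flattened patch per row, shape (num_windows, N * N); this reshape does copy the data.
wins_a = wins_a.reshape(-1, N * N)
wins_b = wins_b.reshape(-1, N * N)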
After this, in an attempt to speed up processing, I used a multiprocessing Pool to compute the actual results. The function that contains the actual computation is as follows:
import numpy as np
from numba import njit

@njit
def cauchy_schwartz(imga, imgb):
    # Normalized 10-bin histograms of the two patches.
    p, _ = np.histogram(imga, bins=10)
    p = p / np.sum(p)
    q, _ = np.histogram(imgb, bins=10)
    q = q / np.sum(q)
    # Numerator and denominator of the divergence, then the log of their ratio.
    n_d = np.sum(p * q)
    d_d = np.sum(np.power(p, 2) * np.power(q, 2))
    return -1.0 * np.log10(n_d / d_d)
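A minimal call on toy data, just to illustrate the expected input (each argument is one flattened patch; the arrays here are made up for the example):

import numpy as np

rng = np.random.default_rng(0)
patch_a = rng.random(9)  # e.g. a flattened 3 x 3 patch
patch_b = rng.random(9)
print(cauchy_schwartz(patch_a, patch_b))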
I use this structure to process all the patches:
from multiprocessing import Pool
import tqdm

def f(param):
    return cauchy_schwartz(*param)

with Pool(4) as p:
    r = list(tqdm.tqdm(p.imap(f, params), total=len(params)))
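One thing I have not tried yet is imap's chunksize argument, which I understand batches work items per worker and should reduce inter-process overhead (a sketch; the value 1000 is an arbitrary guess):

with Pool(4) as p:
    r = list(tqdm.tqdm(p.imap(f, params, chunksize=1000), total=len(params)))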
I am sure there must be something much more elegant than this: if I send the whole 10Kpx by 10Kpx images to the cauchy_schwartz function, it processes everything in under a second, but with my approach it takes a long time even on 4 cores. My mental model is how blockproc in MATLAB works, and I ended up writing this code in that pattern. I would appreciate any advice on improving the performance of this code.
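To make the blockproc analogy concrete, this is roughly the shape I have in mind, with the per-patch loop pushed into compiled code (only a sketch, not benchmarked; it assumes the wins_a / wins_b arrays from the sliding_window_view snippet above and that numba accepts the nested call):

import numpy as np
from numba import njit, prange

@njit(parallel=True)
def all_distances(wins_a, wins_b):
    # wins_a, wins_b: (num_windows, N * N) arrays of flattened patches.
    out = np.empty(wins_a.shape[0])
    for k in prange(wins_a.shape[0]):
        out[k] = cauchy_schwartz(wins_a[k], wins_b[k])
    return out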
random.rand(3,2,100). Let me experiment and think and try to understand this a little, but if you could help explain it would save me guessing: in your solution, is N (the window size) 3, the depth 2 (for the 2 patches), and 100 the number of samples/blocks? – Prebendary