numpy cross-correlation - vectorizing
Asked Answered
H

1

6

I have a large number of cross-correlations to calculate and I'm looking for the fastest way to do it. I'm assuming vectorizing the problem would help rather than doing it with loops

I have a 3D array labelled as electrode x timepoint x trial (shape: 64x256x913). I want to calculate the max cross-correlation of the timepoints for every pair of electrodes, for every trial.

Specifically: for every trial, I want to take each of the pair combination of electrodes and calculate the max cross-correlation value for every pair. That will result in 4096 (64*64) max cross-correlation values in a single row/vector. That will be done for every trial, stacking each of the rows/vectors on top of each other resulting in a final 2D array of shape 913*4096 containing max cross-correlation values

Thats a lot of computation but I want to try to find the fastest method to do it. I mocked up some proto-code using lists as containers that may help explain the problem a bit better. There may be some logic errors in there, but either way the code doesn't run on my computer because theres so much to calculate that python just freezes. Here is it:

#allData is a 64x256x913 array

all_experiment_trials = []
for trial in range(allData.shape[2]):
    all_trial_electrodes = []
    for electrode in range(allData.shape[0]):
        for other_electrode in range(allData.shape[0]):
            if electrode == other_electrode:
                pass
            else:
                single_xcorr = max(np.correlate(allData[electrode,:,trial], allData[other_electrode,:,trial], "full"))
                all_trial_electrodes.append(single_xcorr)
    all_experiment_trials.append(all_trial_electrodes)

Obviously loops are really slow for this type of thing. Is there a vectorized solution to this using numpy arrays?

I've checked out things like correlate2d() and the like, but I don't think they really work in my case as I'm not multiplying 2 matrices together

Hafiz answered 31/7, 2015 at 7:3 Comment(0)
L
3

Here's one vectorized approach based on np.einsum -

def vectorized_approach(allData):
    # Get shape
    M,N,R = allData.shape

    # Valid mask based on condition: "if electrode == other_electrode"
    valid_mask = np.mod(np.arange(M*M),M+1)!=0

    # Elementwise multiplications across all elements in axis=0, 
    # and then summation along axis=1
    out = np.einsum('ijkl,ijkl->lij',allData[:,None,:,:],allData[None,:,:,:])

    # Use valid mask to skip columns and have the final output
    return out.reshape(R,-1)[:,valid_mask]

Runtime test and verify results -

In [10]: allData = np.random.rand(20,80,200)

In [11]: def org_approach(allData):
    ...:     all_experiment_trials = []
    ...:     for trial in range(allData.shape[2]):
    ...:         all_trial_electrodes = []
    ...:         for electrode in range(allData.shape[0]):
    ...:             for other_electrode in range(allData.shape[0]):
    ...:                 if electrode == other_electrode:
    ...:                     pass
    ...:                 else:
    ...:                     single_xcorr = max(np.correlate(allData[electrode,:,trial], allData[other_electrode,:,trial]))
    ...:                     all_trial_electrodes.append(single_xcorr)
    ...:         all_experiment_trials.append(all_trial_electrodes)
    ...:     return all_experiment_trials
    ...: 

In [12]: %timeit org_approach(allData)
1 loops, best of 3: 1.04 s per loop

In [13]: %timeit vectorized_approach(allData)
100 loops, best of 3: 15.1 ms per loop

In [14]: np.allclose(vectorized_approach(allData),np.asarray(org_approach(allData)))
Out[14]: True
Loveless answered 31/7, 2015 at 8:53 Comment(5)
I'll have to take a closer look at this tomorrow - I've never seen this form of notation before! But from what I've just (briefly) read, the einstein summation method can convey sum and multiplication operations, which is fine for a 0 lag cross-correlation, but there is also a sliding window element as well...how does the function realign the vectors?Hafiz
@Nem Since you are performing correlation between two equal-length 1D arrays, so according to the implementation of np.correlate, it doesn't have the sliding "freedom". So, essentially it's elementwise multiplication and addition along the entire length. That is basically "abused" here with np.einsum. This is all based upon the code listed in the question. Hope that made sense though!Loveless
ah you're right, my code above is incorrect. What I'm seeking is a cross-correlation between 2 vectors. So for any pair of electrodes, I need to shift one vector in time, do the calculation you suggested, shift again, recalculate. Which will return a vector containing correlations at every possible time shift. I then need just the max of these correlation values. It is this time shifting that is giving me a headache when trying to figure out how to vectorize. This does a better job of explaining: #6285176Hafiz
@Nem Could you edit your question and the codes to reflect all of that, so that I could verify with those?Loveless
fixed. added "full" to np.correlate(). My code still contains a lot of double calculations though (e.g. we cross correlate 1 with 2, and then 2 with 1 later, which are the same thing), so I'll need to figure out a way to remove those later...Hafiz

© 2022 - 2024 — McMap. All rights reserved.