Similarity between two signals: looking for a simple measure

Asked 17/12, 2013 at 21:2 Answered 2/7, 2021 at 3:24

I have 20 signals (time courses) in group A and 20 signals in group B. I want to find a measure that shows group A is different from group B. For example, I ran xcorr on the signals within each group, but now I need to compare the results somehow. I tried taking the maximum amplitude of each xcorr pair, which is a sort of measure of maximal similarity. Then I compared all these values between the two groups, but there was no difference. What else can I do? I could also compare frequency spectra, but then again I do not know which frequency bin to take. Any suggestions or references are highly appreciated!

I have about 20 signals in each group; those are my samples. I do not know a priori what the difference might be. Here I show 9 sample signals from each group, their autocorrelations, and the cross-correlations for a subset of signals (group 1 vs. group 1, group 2 vs. group 2, group 1 vs. group 2). I do not see any evident difference. I also do not understand how you propose to compare cross-correlations: which peaks should I take? All the signals were detrended and z-scored.

[Image: sample signals from each group, with their autocorrelations and cross-correlations]
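One way around not knowing which frequency bin to use is to summarize each signal's spectrum into a handful of bands and compare every band between the groups. The Python/NumPy sketch below does this with a plain FFT periodogram; the function names, the band edges and the sampling rate `fs` are assumptions for illustration, not part of the original question. With several bands you are testing several hypotheses, so the per-band differences still need a significance test with a multiple-comparison correction.

```python
import numpy as np


def band_powers(x, fs, edges):
    """Mean periodogram power of x in each band [edges[i], edges[i+1]) Hz.

    Assumes x is a 1-D, detrended signal sampled uniformly at fs Hz and that
    every band contains at least one FFT bin.
    """
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2 / len(x)      # one-sided periodogram (unscaled)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)       # bin frequencies in Hz
    return np.array([power[(freqs >= lo) & (freqs < hi)].mean()
                     for lo, hi in zip(edges[:-1], edges[1:])])


def band_log_power_difference(group_a, group_b, fs, edges):
    """Group-mean log band power of A minus that of B, one value per band."""
    log_a = np.log([band_powers(x, fs, edges) for x in group_a])
    log_b = np.log([band_powers(x, fs, edges) for x in group_b])
    return log_a.mean(axis=0) - log_b.mean(axis=0)
```

Log power is used so that a band's difference is a ratio, which keeps bands with very different absolute power comparable.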

Delftware asked 17/12, 2013 at 21:2

Could you give us an idea of the number of samples in each waveform? Is there any obvious periodicity? The autocorrelation of a signal (equivalently its power spectrum, which is the Fourier transform of the autocorrelation) is usually a good indication of the kinds of signals that are present. And the cross-correlation between different signals in A (which are "similar") may be higher than the corresponding A-B correlations. In what way do you expect them to differ? Ultimately the test should be: take two random signals (each may be from A or from B) and perform the test; if the test value is below some threshold, they are the same, otherwise they are different. Then show that the difference is statistically significant. – Kaye 17/12, 2013 at 21:9
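To turn this suggestion into one number per pair, a common choice is the peak of the normalized cross-correlation, which is bounded by 1 in absolute value and can therefore be compared across pairs and groups. A minimal NumPy sketch follows; the function name is hypothetical and it assumes equal-length signals. The lag convention matches `np.correlate`: a positive lag means `a` is delayed relative to `b`.

```python
import numpy as np


def peak_normalized_xcorr(a, b):
    """Return (peak, lag) of the normalized cross-correlation of equal-length a and b.

    Both signals are z-scored and the full cross-correlation is divided by the
    length, so the value at lag 0 is exactly Pearson's r and every value lies
    in [-1, 1]. A positive lag means a is delayed relative to b.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("a and b must have the same length")
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    c = np.correlate(a, b, mode="full") / len(a)      # lags -(n-1) .. n-1
    lags = np.arange(-(len(b) - 1), len(a))
    i = int(np.argmax(c))
    return float(c[i]), int(lags[i])
```

Both the peak value and its lag can be kept as per-pair features; the lag is useful if similar signals are expected to be aligned.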

Here I added the details to my original post. – Delftware 18/12, 2013 at 15:26

You said you detrended and z-scored them. Perhaps the slope of the trend line, or the mean and variance of the signal are what you need. Right now they all just look like noise. – Capel 18/12, 2013 at 17:39

I have to detrend because the trend is introduced by the measurement device. When I compare the simple std between the two groups, it does not differ either. Clearly, the signals are similar; otherwise it would have been too simple :) The question is what other measures I can try. – Delftware 18/12, 2013 at 19:5

I think your images may be labeled incorrectly? The autocorrelation of a real signal is always symmetric about zero lag, so the asymmetric curves shouldn't be autocorrelations. – Sniper 18/12, 2013 at 22:29

Try a wavelet (spectral) decomposition to break the signals into approximation and detail levels. Extract the amplitude and phase coefficients of A·cos(ωt + φ) components of the wave function? Deconvolve the wavefunction with the signals? Look into the Lomb-Scargle periodogram. Also check out geophysics/seismology books; they cover this. – Salina 5/4 at 3:20
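As an illustration of the wavelet idea, the sketch below computes a hand-rolled orthonormal Haar decomposition in NumPy (a stand-in for a full wavelet library, which the comment does not name) and returns the energy at each detail level plus the remaining approximation. Comparing these per-level energies between the groups is one scale-by-scale alternative to picking individual frequency bins. The function name and the choice of the Haar wavelet are assumptions.

```python
import numpy as np


def haar_level_energies(x, levels):
    """Energy of Haar detail coefficients per level, plus final approximation energy.

    Hand-rolled orthonormal Haar DWT; len(x) must be divisible by 2**levels.
    Because the transform is orthonormal, the energies add up to sum(x**2).
    """
    approx = np.asarray(x, dtype=float)
    if len(approx) % (2 ** levels):
        raise ValueError("len(x) must be divisible by 2**levels")
    detail_energies = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2)     # high-pass half: fine-scale changes
        approx = (even + odd) / np.sqrt(2)     # low-pass half: carried to next level
        detail_energies.append(float(np.sum(detail ** 2)))
    return detail_energies, float(np.sum(approx ** 2))
```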

Well, this may be too simplistic an answer, and too complex a measure, but maybe it's worth something.

In order to compare signals, we really have to establish some criterion by which we compare them. This could be so many things. If we want signals that look visually similar, we perform time domain analysis. If we are talking about audio signals that sound similar, we care about frequency or time-frequency analysis. If the signals are supposed to represent noise, then signal variance should be a good measure. In general we may want to use a combination of all sorts of measures. We can do this with a weighted index.

First let's establish what we have: there are two sets of signals: set A and set B. We want some measure that shows set A is different from set B. The signals are detrended.

We take signal a in A and signal b in B. The list of things we can compare:

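Similarity in time domain (static): Multiply element-wise and sum (i.e., the dot product sum(a .* b)).
Similarity in time domain (with shift*): Take the fft of each signal, multiply the first by the complex conjugate of the second, and take the ifft. (Multiplying without the conjugate gives convolution, not correlation.) If both signals are zero-padded to length 2n-1, the result contains the same 2n-1 lags as MATLAB's xcorr(a,b), in rotated order; without padding it is the circular cross-correlation.
Similarity in frequency domain (static**): Take the fft of each signal, multiply one by the complex conjugate of the other, and sum. By Parseval's theorem this equals n times the time-domain dot product, so it only adds information once part of the spectrum is masked out; multiplying the magnitude spectra abs(fft(a)) .* abs(fft(b)) instead ignores phase (and hence circular time shifts).
Similarity in frequency domain (with shift*): Multiply the two signals and take the fft. For real signals, fft(a .* b) equals 1/n times the circular cross-correlation of the two spectra over frequency shifts, so it shows whether the signals share similar spectral shapes at some offset.
Similarity in energy (or power, if the lengths differ): Square each signal and sum (and divide by the signal length for power). Since the signals were detrended, power equals signal variance. Then subtract and take the absolute value for a measure of variance similarity. Note that z-scoring sets every variance to 1, so this measure is only informative before z-scoring.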
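The five comparisons in the list can be sketched in Python/NumPy as follows. This is an illustrative translation, not the answerer's code; the function name and dictionary keys are hypothetical, and both signals are assumed to have the same length. For the static frequency-domain measure it multiplies magnitude spectra, which ignores phase; with a conjugate product instead, Parseval's theorem would make it n times the time-domain dot product.

```python
import numpy as np


def pairwise_measures(a, b):
    """Hypothetical helper computing the five comparisons for equal-length a, b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(a)
    A, B = np.fft.fft(a), np.fft.fft(b)

    # 1. Time domain, static: element-wise product summed (dot product).
    time_static = float(np.dot(a, b))

    # 2. Time domain, with shift: linear cross-correlation via zero-padded FFTs.
    #    Padding to 2n-1 avoids wrap-around; np.roll puts lag -(n-1) first.
    m = 2 * n - 1
    xc = np.fft.ifft(np.fft.fft(a, m) * np.conj(np.fft.fft(b, m))).real
    time_shifted = np.roll(xc, n - 1)             # lags -(n-1) .. n-1

    # 3. Frequency domain, static: product of magnitude spectra, summed.
    freq_static = float(np.sum(np.abs(A) * np.abs(B)))

    # 4. Frequency domain, with shift: fft(a*b) is (1/n) times the circular
    #    cross-correlation of the spectra A and B over frequency shifts.
    freq_shifted = np.fft.fft(a * b)

    # 5. Power: mean square of each signal, then absolute difference.
    power_diff = float(abs(np.mean(a ** 2) - np.mean(b ** 2)))

    return {"time_static": time_static, "time_shifted": time_shifted,
            "freq_static": freq_static, "freq_shifted": freq_shifted,
            "power_diff": power_diff}
```

The `time_shifted` vector comes out in the same lag order as `np.correlate(a, b, mode="full")`, which is convenient for checking.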

* (with shift) -- You could choose to sum over the entire correlation vector to measure total general correlation, you could choose to sum only values in the correlation vector that surpass a certain threshold value (as if you expect echoes of one signal in the other), or just take the maximum value from the correlation vector (where its index is the shift in the second signal that results in maximal correlation with the first signal). Also, if the amount of shift that it takes to reach maximal correlation is important (i.e. if signals are similar only if it takes relatively small shift to reach the point of maximal correlation), then you can incorporate a measure of the index displacement.
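The footnote's options for reducing a correlation vector to one number can be written as a small NumPy helper. This is a sketch under assumptions: the name, the `threshold` argument, and the linear `lag_penalty` (one simple way to incorporate a measure of the index displacement) are illustrative choices, not something the answer specifies.

```python
import numpy as np


def summarize_xcorr(c, lags, threshold, lag_penalty=0.0):
    """Reduce a cross-correlation vector c (with matching lags) to scalar summaries."""
    c = np.asarray(c, dtype=float)
    lags = np.asarray(lags)
    i = int(np.argmax(c))
    return {
        "total": float(c.sum()),                           # overall correlation
        "above_threshold": float(c[c > threshold].sum()),  # only strong "echoes"
        "peak": float(c[i]),                               # maximal correlation
        "peak_lag": int(lags[i]),                          # shift that achieves it
        # peak shrunk linearly with distance from zero lag (hypothetical weighting)
        "penalized_peak": float(c[i] - lag_penalty * abs(lags[i])),
    }
```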

** (frequency domain similarity) -- You may want to mask the part of the spectrum you're not concerned with. For instance, if you only care about the higher-frequency structures (fs/4 and up), you could do:

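n = length(a);                        % assumes n is divisible by 4
mask = zeros(1,n);
mask(n/4+1 : 3*n/4+1) = 1;            % bins with |f| >= fs/4, in both halves of the two-sided spectrum
freq_static = real(sum(fft(a) .* conj(fft(b)) .* mask));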
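A NumPy version of the masking idea, as a sketch: the function name and the cutoff argument `f_min` are hypothetical, and `fs` is the sampling rate. `np.fft.fftfreq` gives each bin's signed frequency, which makes it easy to keep both the positive- and negative-frequency halves of the two-sided spectrum; keeping only one half would in general give a complex value. With `f_min = 0` the result reduces, by Parseval's theorem, to n times the dot product of the signals.

```python
import numpy as np


def masked_spectral_similarity(a, b, fs, f_min):
    """Sum of fft(a) * conj(fft(b)) over bins with |frequency| >= f_min (Hz)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    freqs = np.fft.fftfreq(len(a), d=1.0 / fs)    # signed bin frequencies in Hz
    mask = np.abs(freqs) >= f_min                  # keep both halves of the spectrum
    A, B = np.fft.fft(a), np.fft.fft(b)
    # The masked bins come in conjugate pairs, so the sum is real up to rounding.
    return float(np.real(np.sum(A[mask] * np.conj(B[mask]))))
```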

Also, we may want to implement a circular correlation like so:

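function c = circular_xcorr(a,b)
% Circular cross-correlation of equal-length vectors for lags 0..n-1, made by
% folding the 2n-1 lags that xcorr returns (lags -(n-1):(n-1)) onto n lags.
% Equivalent to real(ifft(fft(a) .* conj(fft(b)))).
n = length(a);
c_lin = xcorr(a,b);
c = c_lin(n:end);                      % lags 0..n-1
c(2:end) = c(2:end) + c_lin(1:n-1);    % lag k also collects lag k-n
end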
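For readers working in Python, the same circular cross-correlation can be sketched with NumPy in two equivalent ways: directly from FFTs, and by folding the 2n-1 lags of the linear cross-correlation onto n circular lags. Both function names are hypothetical; the convention is c[k] = sum over m of a[(m + k) mod n] * b[m] for k = 0..n-1, the same one `np.correlate` uses for its positive lags.

```python
import numpy as np


def circular_xcorr_fft(a, b):
    """Circular cross-correlation c[k] = sum_m a[(m + k) % n] * b[m], k = 0..n-1."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Correlation theorem: multiply by the conjugate spectrum (the plain
    # spectrum would give convolution) and transform back.
    return np.fft.ifft(np.fft.fft(a) * np.conj(np.fft.fft(b))).real


def circular_xcorr_fold(a, b):
    """Same result, folding the linear lags -(n-1)..n-1 onto circular lags 0..n-1."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(a)
    lin = np.correlate(a, b, mode="full")    # index i holds lag i - (n - 1)
    circ = lin[n - 1:].copy()                # lags 0 .. n-1
    circ[1:] += lin[:n - 1]                  # lag k also collects lag k - n
    return circ
```

The FFT route costs O(n log n) and needs no toolbox, which is usually the reason to prefer it over folding.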

Finally, we choose the characteristics that are important or relevant, and create a weighted index. Example:

n = 100;
a = rand(1,n); b = rand(1,n);
time_corr_thresh = .8 * n; freq_corr_thresh = .6 * n;   % illustrative thresholds; tune to your data
time_static = sum(a .* b);                               % dot product
time_shifted = circular_xcorr(a,b);
time_shifted = sum(time_shifted(time_shifted > time_corr_thresh));
freq_static = real(sum(fft(a) .* conj(fft(b))));
freq_shifted = abs(fft(a .* b));                         % magnitudes: > on complex values compares real parts only
freq_shifted = sum(freq_shifted(freq_shifted > freq_corr_thresh));
w1 = 0; w2 = 1; w3 = .7; w4 = 0;
index = w1 * time_static + w2 * time_shifted + w3 * freq_static + w4 * freq_shifted;

We compute this index for each pair of signals.
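Because these measures live on very different scales (the FFT-domain sums, for example, grow with n and with signal amplitude), one reasonable option before weighting is to standardize each measure across all pairs. The Python sketch below computes a weighted index for every pair of signals that way; the helper name, the z-scoring step and the example measures are assumptions, not part of the original answer.

```python
import itertools

import numpy as np


def weighted_index(signals, measures, weights):
    """Weighted index for every unordered pair (i, j), i < j, of signals.

    measures: list of functions f(a, b) -> float; weights: one per measure.
    Each measure is z-scored across pairs so that raw scale does not act as
    a hidden weight.
    """
    pairs = list(itertools.combinations(range(len(signals)), 2))
    raw = np.array([[f(signals[i], signals[j]) for f in measures]
                    for i, j in pairs], dtype=float)
    std = raw.std(axis=0)
    std[std == 0] = 1.0                        # a constant measure contributes 0
    z = (raw - raw.mean(axis=0)) / std
    index = z @ np.asarray(weights, dtype=float)
    return dict(zip(pairs, index.tolist()))
```

With this standardization, the weights express relative importance directly instead of also having to compensate for each measure's units.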

I hope that this outline of signal characterization helps. Comment if anything is unclear.

Sniper answered 19/12, 2013 at 2:24

Thanks a lot for this answer! I will work on it and update you here. Just with regard to the general framework: as far as I understand, there are three possible ways to establish the effect: 1. Compare all possible pairs within each group and use a t-test to show that my similarity measure differs significantly between the two groups. 2. Compare all possible pairs within each group and between the two groups, and show that my similarity measure is larger within groups than between groups. – Delftware 19/12, 2013 at 8:30

3. If I have two measurements (two exemplars) for each signal, I can calculate the similarity between these two exemplars for each sample of each group and then compare the result between groups using a t-test (like in [1]). This will also give me a measure of reproducibility. Makes sense? – Delftware 19/12, 2013 at 8:31

Oh, yeah! I think that will work well. I'm glad you asked this because I might do a similar group similarity test for my scintillation research soon :) – Sniper 19/12, 2013 at 17:51

This post is ancient. Are you still following it? The approach you describe is very close to what I need to do, but I am less familiar with signal analysis and with MATLAB than either of you. By any chance, could you point me to a more complete explanation of how to weight cross-correlation and frequencies from the FFTs, preferably with some code in MATLAB, R or Python? – Bolt 20/1, 2019 at 14:43

@JohnStrong did you ever get anywhere with this? – Penang 26/2, 2019 at 16:53
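For option 2 in Delftware's comments (similarity within groups larger than between groups), one caveat is that the pairwise similarities are not independent, since every signal appears in many pairs, so a plain t-test over all pairs tends to overstate significance. A label-permutation test avoids that assumption. The NumPy sketch below assumes you already have a symmetric matrix `S` of pairwise similarities (from whichever measure you choose) and a group label per signal; the function names and the number of permutations are illustrative choices.

```python
import numpy as np


def within_minus_between(S, labels):
    """Mean within-group similarity minus mean between-group similarity."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)     # ignore self-similarity
    return S[same & off_diag].mean() - S[~same].mean()


def permutation_test(S, labels, n_perm=1000, seed=0):
    """One-sided p-value for 'within > between' by shuffling group labels."""
    S = np.asarray(S, dtype=float)
    rng = np.random.default_rng(seed)
    observed = within_minus_between(S, labels)
    hits = sum(within_minus_between(S, rng.permutation(labels)) >= observed
               for _ in range(n_perm))
    # Add-one correction so the p-value is never exactly zero.
    return float(observed), (hits + 1) / (n_perm + 1)
```

Each permutation keeps the similarity matrix fixed and only reshuffles which signals count as "same group", so the dependence between pairs is preserved in the null distribution.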

With reference to Brian's answer above, I've written a Python function to compute the similarity of time-series signals, as below:

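import numpy as np

def compute_similarity(ref_rec, input_rec, weightage=(0.33, 0.33, 0.33)):
    # Returns a weighted *difference*: 0 for identical records, larger = less similar.
    # Assumes ref_rec and input_rec are 1-D arrays of equal length.
    ref_rec = np.asarray(ref_rec, dtype=float)
    input_rec = np.asarray(input_rec, dtype=float)

    ## Time domain similarity ('valid' mode on equal lengths gives one value: the dot product)
    ref_time = np.correlate(ref_rec, ref_rec)[0]
    inp_time = np.correlate(ref_rec, input_rec)[0]
    diff_time = abs(ref_time - inp_time)

    ## Freq domain similarity
    # np.correlate conjugates its second argument, so by Parseval's theorem
    # these equal len(ref_rec) times the time-domain values above.
    ref_freq = np.correlate(np.fft.fft(ref_rec), np.fft.fft(ref_rec))[0]
    inp_freq = np.correlate(np.fft.fft(ref_rec), np.fft.fft(input_rec))[0]
    diff_freq = abs(ref_freq - inp_freq)

    ## Power similarity
    ref_power = np.sum(ref_rec**2)
    inp_power = np.sum(input_rec**2)
    diff_power = abs(ref_power - inp_power)

    return float(weightage[0]*diff_time + weightage[1]*diff_freq + weightage[2]*diff_power)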

Delorenzo answered 2/7, 2021 at 3:24
