I want to find the offset between two arrays of timestamps. They could represent, let's say, the onset of beeps in two audio tracks.
Note: There may be extra or missing onsets in either track.
I found some information about cross-correlation (e.g. https://dsp.stackexchange.com/questions/736/how-do-i-implement-cross-correlation-to-prove-two-audio-files-are-similar) which looked promising.
I assumed that each audio track is 10 seconds in duration, and represented the beep onsets as the peaks of a "square wave" with a sample rate of 44.1 kHz:
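```python
import numpy as np

rfft = np.fft.rfft
irfft = np.fft.irfft

# Beep onset times in seconds ("..." stands for onsets elided here)
track_1 = np.array([..., 5.2, 5.5, 7.0, ...])
# The onset in track_2 at 8.0 is "extra": it has no
# corresponding onset in track_1
track_2 = np.array([..., 7.2, 7.45, 8.0, 9.0, ...])

frequency = 44100             # sample rate in Hz
num_samples = 10 * frequency  # each track is 10 s long

# One unit impulse per onset; round (not truncate) to the nearest sample
wave_1 = np.zeros(num_samples)
wave_1[np.round(track_1 * frequency).astype(int)] = 1
wave_2 = np.zeros(num_samples)
wave_2[np.round(track_2 * frequency).astype(int)] = 1

# Circular cross-correlation: xcor[k] = sum_n wave_1[(n + k) % num_samples] * wave_2[n]
xcor = irfft(rfft(wave_1) * np.conj(rfft(wave_2)), n=num_samples)
offset = xcor.argmax()             # lag in samples, in [0, num_samples)
if offset > num_samples // 2:      # upper half of the range is a negative lag
    offset -= num_samples
offset_seconds = offset / frequency  # track_1 ≈ track_2 + offset_seconds
```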
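The snippet above computes a circular cross-correlation over the 10-second window, so a negative lag (track_2 later than track_1) shows up as an index close to num_samples rather than as a negative number. A minimal self-contained sketch of the same impulse-train approach, which zero-pads to twice the length so the correlation is linear and returns a signed lag in seconds, might look like this. The function name estimate_offset_fft, the 1 kHz default rate and the example onsets are illustrative assumptions, not part of the question.

```python
import numpy as np


def estimate_offset_fft(track_1, track_2, rate=1000, duration=10.0):
    """Estimate the lag (seconds) such that track_1 ≈ track_2 + lag.

    Onsets (in seconds, assumed to lie in [0, duration)) are rendered as
    unit impulses at `rate` samples per second and cross-correlated via FFT.
    """
    num_samples = int(round(duration * rate))
    n_fft = 2 * num_samples  # zero-padding: linear, not circular, correlation

    wave_1 = np.zeros(num_samples)
    wave_2 = np.zeros(num_samples)
    # Round to the nearest sample so float error cannot floor 5200 to 5199
    wave_1[np.round(np.asarray(track_1) * rate).astype(int)] = 1
    wave_2[np.round(np.asarray(track_2) * rate).astype(int)] = 1

    # xcor[k] = sum_n wave_1[n + k] * wave_2[n], with k taken modulo n_fft
    spectrum = np.fft.rfft(wave_1, n_fft) * np.conj(np.fft.rfft(wave_2, n_fft))
    xcor = np.fft.irfft(spectrum, n_fft)

    k = int(np.argmax(xcor))
    if k >= num_samples:  # indices in the upper half encode negative lags
        k -= n_fft
    return k / rate
```

With the hypothetical onsets track_1 = [1.0, 5.2, 5.5, 7.0] and track_2 = [3.0, 7.2, 7.45, 8.0, 9.0], three pairs line up exactly at a difference of -2 s and the function returns -2.0, meaning track_2 is about 2 s later. The 5.5/7.45 pair is 50 ms off and contributes nothing to that peak, because exact-sample impulses have no tolerance for timing jitter.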
This approach isn't particularly fast, but I was able to get fairly consistent results even with quite low sample rates. However... I have no idea if this is a good idea! Is there a better way to find this offset than cross-correlation?
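One alternative, sketched here under my own assumptions rather than as a known answer: skip the large sample arrays and vote directly on the pairwise differences between the two onset lists. This is the same cross-correlation evaluated only at lags where some pair of onsets exists, with the bin width `tolerance` playing the role of the sample period, which may also be why lower sample rates gave consistent results (coarser samples absorb small timing jitter). The function name estimate_offset_pairs and the 50 ms default tolerance are hypothetical.

```python
import numpy as np


def estimate_offset_pairs(track_1, track_2, tolerance=0.05):
    """Estimate the lag (seconds) such that track_1 ≈ track_2 + lag.

    Every difference track_1[i] - track_2[j] is a candidate lag. Candidates
    are binned with width `tolerance`; the fullest bin wins, so extra or
    missing onsets only add scattered votes to other bins.
    """
    t1 = np.asarray(track_1, dtype=float)
    t2 = np.asarray(track_2, dtype=float)
    diffs = (t1[:, None] - t2[None, :]).ravel()  # all n1 * n2 candidate lags

    bins = np.round(diffs / tolerance).astype(int)
    values, counts = np.unique(bins, return_counts=True)
    best = values[np.argmax(counts)]

    # Refine: average the candidate lags that fell into the winning bin
    return float(diffs[bins == best].mean())
```

The cost is O(n1 · n2) in time and memory, which is small for hundreds of onsets. On the same hypothetical onsets as above it returns about -2.0. A true lag lying near a bin edge can split its votes between two neighbouring bins; adding each bin's count to its neighbours before taking the maximum is one simple guard against that.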
Edit: added note about missing and extra onsets.
… track_1 and track_2 as irregularly spaced, then you multiply them by frequency when building wave_1 and wave_2. Are track_1 and track_2 supposed to be the timestamps you're trying to correlate, or are they supposed to be the audio waveforms without the beeps added? Or are they the onset times of the "beep"? – Microprint

track_1 and track_2 are onset times of each beep. wave_1 and wave_2 are each, if you like, a summation of Dirac delta functions for the purposes of finding the cross-correlation. – Bengali
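Taking that description literally, the cross-correlation of two delta trains reduces to a sum over every pairwise difference of onset times. The notation below is assumed for illustration: a_i are the onsets in track_1, b_j those in track_2, and w_1, w_2 the corresponding delta trains.

```latex
\begin{aligned}
w_1(t) &= \sum_i \delta(t - a_i), \qquad w_2(t) = \sum_j \delta(t - b_j), \\
(w_1 \star w_2)(\tau) &= \int w_1(t + \tau)\, w_2(t)\, dt
  = \sum_{i,j} \int \delta(t + \tau - a_i)\, \delta(t - b_j)\, dt
  = \sum_{i,j} \delta\bigl(\tau - (a_i - b_j)\bigr).
\end{aligned}
```

So the correlation has a spike at every difference a_i - b_j, and its height at a given lag counts how many pairs share that difference. Extra or missing onsets mostly add scattered spikes elsewhere, which is why the argmax tends to pick the true offset as long as most onsets have a partner. The sign convention matches irfft(rfft(wave_1) * np.conj(rfft(wave_2))), whose entry k is the sum over n of wave_1[n + k] * wave_2[n].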