I was looking up something similar and wrote a hasty answer here that got deleted. I had some ideas but hadn't written them up properly. The deletion bruised my pride a bit, so I decided to actually try the problem out, and I think it worked!
Doing a real localisation a la Adam Davis' answer is very difficult, but doing a human-style localisation (taking the first arrival, ignoring echoes, or treating them as sources) is not too bad, I think, though I'm not a signal processing expert by any means.
I read this and this, which made me realise that the problem is really one of finding the time shift (cross-correlation) between the two signals. From there you can calculate the angle using the speed of sound. Note that you'll get two solutions (front and back).
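For the geometry: if the microphones are a distance `width` apart and the wavefront is roughly planar, the delay `dt` between the channels gives the bearing via sin(a) = v*dt/width. A minimal sketch of just that conversion (the helper name `angle_from_delay` is mine, and v = 340 m/s is assumed):

from math import asin, degrees

def angle_from_delay(dt, width, v=340.0):
    """Bearing in degrees from the inter-channel delay dt (seconds),
    for microphones `width` metres apart. The sign tells you the side;
    front/back stays ambiguous."""
    # raises ValueError if dt is longer than width/v (see the echo note below)
    return degrees(asin(v * dt / width))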
The key information I read was in this answer and others on the same page, which talk about how to do fast Fourier transforms in scipy to find the cross-correlation curve.
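As a standalone illustration of that trick, here's a toy example (made-up click signals): convolving left with the time-reversed right gives the cross-correlation, and the peak's offset from the zero-lag index, len(right) - 1, is the delay in frames.

import numpy as np
from scipy.signal import fftconvolve

left = np.array([0., 0., 1., 0., 0., 0.])
right = np.array([0., 0., 0., 0., 1., 0.])  # same click, two frames later

cor = np.abs(fftconvolve(left, right[::-1], mode='full'))
delay = np.argmax(cor) - (len(right) - 1)
print(delay)  # -2: the click reaches the left mic two frames early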
Basically, you need to import the wave file into Python. See this.
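For reference, a bare-bones version of that import step, assuming 16-bit PCM stereo (the filename is a placeholder):

import wave
import struct
from numpy import array

wav = wave.open('stereo.wav', 'r')
nchannels = wav.getnchannels()
nframes = wav.getnframes()
out = struct.unpack_from("%dh" % (nframes * nchannels),
                         wav.readframes(nframes))
left = array(out[0::2])   # even-indexed samples: left channel
right = array(out[1::2])  # odd-indexed samples: right channel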
If your input is a tuple of two numpy arrays (left, right), each zero-padded with at least its own length of zeros (to stop the correlation wrapping around circularly, as far as I understand), the code follows from Gustavo's answer. You also need to recognise that FFTs assume time-invariance, which means that if you want any kind of time-based tracking of signals you need to 'bite off' small chunks of data.
I brought the following code together from the sources mentioned above. It will produce a graph showing the estimated time delay, in frames, from left to right (negative/positive). To convert to actual time, divide by the sample rate. If you want to know the angle, you need to:
- assume everything is on a plane (no height factor)
- forget the difference between sounds in front and those behind (you can't differentiate them)
You would also want to use the distance between the two microphones to make sure you aren't picking up echoes: time delays greater than width/v, the delay for a source at 90 degrees, are physically impossible for a direct path (a quick check is sketched below).
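A sketch of that sanity check (my own helper, not from the sources): the longest physically possible direct-path delay is width / v seconds, so anything bigger can be treated as an echo.

def is_direct_path(delay_frames, framerate, width, v=340.0):
    """True if a delay (in frames) could come from a direct path
    between two microphones `width` metres apart."""
    return abs(delay_frames) <= width / v * framerate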
I realise that I've borrowed a lot here, so thanks to all of those who inadvertently contributed!
import wave
import struct
from numpy import array, concatenate, argmax
from numpy import abs as nabs
from scipy.signal import fftconvolve
from matplotlib.pyplot import plot, show
from math import asin, pi

def crossco(wav):
    """Returns the cross-correlation of the left and right audio. It
    uses a convolution of left with the right reversed, which is
    equivalent to a cross-correlation.
    """
    return nabs(fftconvolve(wav[0], wav[1][::-1]))

def trackTD(fname, width, chunksize=5000):
    track = []
    # open the wave file using python's built-in wave library
    wav = wave.open(fname, 'r')
    # get the info from the file; this is kind of ugly and non-PEPish
    (nchannels, sampwidth, framerate, nframes,
     comptype, compname) = wav.getparams()
    # only loop while there are enough whole chunks left in the file
    while wav.tell() < nframes - chunksize:
        # read one chunk of audio frames as a sequence of bytes
        frames = wav.readframes(chunksize)
        # unpack that byte sequence into 16-bit samples
        out = struct.unpack_from("%dh" % (chunksize * nchannels), frames)
        # convert the two channels to numpy arrays
        if nchannels == 2:
            # the left channel is the even-numbered elements
            left = array(out[0::2])
            # the right channel is the odd-numbered elements
            right = array(out[1::2])
        else:
            left = array(out)
            right = left
        # zero-pad each channel with as many zeros as it is long
        # (to stop the correlation wrapping around circularly)
        left = concatenate((left, [0] * chunksize))
        right = concatenate((right, [0] * chunksize))
        chunk = (left, right)
        # if the volume is very low (800 or less), assume 0 degrees
        if nabs(left).max() < 800:
            a = 0.0
        else:
            # otherwise compute the delay in frames for this chunk;
            # the zero-lag position is at index len(right) - 1
            cor = argmax(crossco(chunk)) - (chunksize * 2 - 1)
            # calculate the time delay in seconds
            t = cor / float(framerate)
            # get the angle, assuming v = 340 m/s and sin(a) = (t*v)/width;
            # clamp in case noise or echoes push it past +/-1
            sina = max(-1.0, min(1.0, t * 340 / width))
            a = asin(sina) * 180 / pi
        # add this chunk's angle estimate to the list
        track.append(a)
    # plot the list
    plot(track)
    show()
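Calling it would look something like this (the filename and the 0.2 m microphone spacing are placeholders):

trackTD('car.wav', 0.2)  # width = microphone spacing in metres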
I tried this out using some stereo audio I found at equilogy. I used the car example (stereo file). It produced this.
To do this on-the-fly, I guess you'd need an incoming stereo source that you could 'listen to' for a short window (I used 1000 frames = 0.0208 s) and then calculate and repeat.
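For example, with the sounddevice package (my assumption; any capture API that hands you interleaved stereo frames would do), the listen-calculate-repeat loop might look like:

import sounddevice as sd

fs = 48000
frames = 1000  # about 0.0208 s at 48 kHz

while True:
    # record a short stereo snippet; sd.rec returns a (frames, 2) array
    snippet = sd.rec(frames, samplerate=fs, channels=2)
    sd.wait()  # block until the recording is finished
    left, right = snippet[:, 0], snippet[:, 1]
    # ...run the same zero-pad + crossco steps on (left, right)...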
[edit: found you can easily use scipy's fftconvolve function, feeding in the time-reversed series of one of the two signals, to get a correlation]