I need to do voice activity detection (VAD) as a step in classifying audio files.
Basically, I need to know with certainty whether a given audio file contains spoken language.
I am using py-webrtcvad, which I found on GitHub and which is sparsely documented:
https://github.com/wiseman/py-webrtcvad
The problem is that when I try it on my own audio files, it works fine on the ones that contain speech, but it keeps yielding false positives on other types of audio (such as music or birdsong), even with the aggressiveness set to 3.
The audio files are sampled at 8000 Hz.
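For reference, the py-webrtcvad README states that the VAD only accepts 16-bit mono PCM audio sampled at 8000, 16000, 32000 or 48000 Hz. A minimal sketch of how a file's header could be checked against those requirements, using only the standard `wave` module (the helper names `describe_wav` and `is_vad_compatible` are hypothetical and not part of py-webrtcvad):

```python
import wave

# Sample rates listed as supported in the py-webrtcvad README.
SUPPORTED_RATES = (8000, 16000, 32000, 48000)


def describe_wav(path):
    """Return (channels, sample_width_bytes, sample_rate) read from the WAV header."""
    with wave.open(path, "rb") as wf:
        return wf.getnchannels(), wf.getsampwidth(), wf.getframerate()


def is_vad_compatible(channels, sample_width, sample_rate):
    """True if the audio is mono, 16-bit (2 bytes per sample), at a supported rate."""
    return channels == 1 and sample_width == 2 and sample_rate in SUPPORTED_RATES
```

Calling `is_vad_compatible(*describe_wav('myfilename.wav'))` would then show whether the file matches the expected format before it is handed to the VAD.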
The only thing I changed in the source code is how I pass the arguments to the main function (directly, rather than via sys.argv).
def main(file, agresividad):
    audio, sample_rate = read_wave(file)
    vad = webrtcvad.Vad(int(agresividad))
    frames = frame_generator(30, audio, sample_rate)
    frames = list(frames)
    segments = vad_collector(sample_rate, 30, 300, vad, frames)
    for i, segment in enumerate(segments):
        path = 'chunk-%002d.wav' % (i,)
        print(' Writing %s' % (path,))
        write_wave(path, segment, sample_rate)


if __name__ == '__main__':
    file = 'myfilename.wav'
    agresividad = 3  # aggressiveness
    main(file, agresividad)
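To put a number on the false positives, one option is to count how many individual frames the VAD flags as speech, instead of only looking at the written chunks. The following is a rough sketch under some assumptions: `split_frames` and `speech_ratio` are hypothetical helpers, `vad` is expected to be a `webrtcvad.Vad` instance (or any object with the same `is_speech(frame, sample_rate)` method), and `pcm` is the raw 16-bit mono PCM data.

```python
def split_frames(pcm, sample_rate, frame_ms=30):
    """Split 16-bit mono PCM bytes into fixed-length frames, dropping a short tail."""
    frame_bytes = int(sample_rate * frame_ms / 1000) * 2  # 2 bytes per sample
    return [pcm[i:i + frame_bytes]
            for i in range(0, len(pcm) - frame_bytes + 1, frame_bytes)]


def speech_ratio(vad, pcm, sample_rate, frame_ms=30):
    """Return the fraction of frames for which vad.is_speech() is true."""
    frames = split_frames(pcm, sample_rate, frame_ms)
    if not frames:
        return 0.0
    voiced = sum(1 for frame in frames if vad.is_speech(frame, sample_rate))
    return voiced / len(frames)
```

With the code above, this could be called as `speech_ratio(webrtcvad.Vad(3), audio, sample_rate)`. Comparing the ratio for a speech file with the ratio for a music or birdsong file would show how far apart the two actually are at the frame level.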