Any simple VAD implementation?

A

3

17

I'm looking for some C/C++ code for VAD (Voice Activity Detection).

Basically, my application is reading PCM frames from the device. I would like to know when the user is talking. I'm not looking for any speech recognition algorithm but only for voice detection.

I would like to know when the user is talking and when he finishes:

bool isVAD(short* pcm,size_t count);

Afloat answered 20/3, 2011 at 7:7 Comment(0)

R

8

There are open source implementations in the Sphinx and Freeswitch projects. I think they are all energy-based detectors, so they won't need any kind of model.

Sphinx 4 (Java but it should be easy to port to C/C++)

PocketSphinx

Freeswitch
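
For reference, the energy-based idea can be sketched in a few lines of C. This is only a minimal illustration, not the code used by Sphinx or Freeswitch: the function name is_voiced_energy and the ENERGY_THRESHOLD value are assumptions made up for this example, and a real detector would normally adapt the threshold to the background noise level.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical threshold on mean-square energy (assumed value, roughly an
 * RMS amplitude of 500 out of 32767). Tune it for your device and noise floor. */
#define ENERGY_THRESHOLD 250000.0

/* Returns true when the mean-square energy of one frame of 16-bit PCM
 * samples exceeds ENERGY_THRESHOLD, false otherwise (or on empty input). */
bool is_voiced_energy(const int16_t *pcm, size_t count)
{
    if (pcm == NULL || count == 0) {
        return false;
    }

    double sum_sq = 0.0;
    for (size_t i = 0; i < count; ++i) {
        double s = (double)pcm[i];
        sum_sq += s * s;   /* accumulate squared amplitude */
    }

    return (sum_sq / (double)count) > ENERGY_THRESHOLD;
}
```

Calling this once per frame gives a per-frame voiced/unvoiced decision. To find where speech starts and ends, you would typically require several consecutive voiced (or unvoiced) frames before switching state.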

Resurgent answered 20/3, 2011 at 8:4 Comment(1)

Thanks for the links. Freeswitch seems like the better approach for me (C), but it still seems to be tied into a more complex framework. I've found a Python implementation (github.com/shriphani/Listener/blob/master/VAD.py) which seems simpler, but since my Python skills are below average, I'm still looking for a C/C++ implementation. – Afloat 23/3, 2011 at 6:7

R

25

Google's open-source WebRTC code has a VAD module written in C. It uses a Gaussian Mixture Model (GMM), which is typically much more effective than a simple energy-threshold detector, especially in a situation with dynamic levels and types of background noise. In my experience it's also much more effective than the Moattar-Homayounpour VAD that Gilad mentions in their comment.

The VAD code is part of the much, much larger WebRTC repository, but it's very easy to pull it out and compile it on its own. E.g. the webrtcvad Python wrapper includes just the VAD C source.

The WebRTC VAD API is very easy to use. First, the audio must be mono 16-bit PCM, with an 8 kHz, 16 kHz or 32 kHz sample rate. Each frame of audio that you send to the VAD must be 10, 20 or 30 milliseconds long.

Here's an outline of an example that assumes audio_frame is 10 ms (320 bytes) of audio at 16000 Hz:

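#include "webrtc/common_audio/vad/include/webrtc_vad.h"
// ...
VadInst *vad;
WebRtcVad_Create(&vad);  // allocate the VAD instance
WebRtcVad_Init(vad);     // must be called before processing any audio
// audio_frame holds 160 16-bit samples (10 ms at 16000 Hz).
// The result is 1 for voice, 0 for no voice, and -1 on error.
int is_voiced = WebRtcVad_Process(vad, 16000, audio_frame, 160);
// ...
WebRtcVad_Free(vad);     // release the instance when done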
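
To make the frame sizes above concrete, here is a small sketch of the arithmetic. The helpers vad_frame_samples and vad_frame_bytes are hypothetical names invented for this example and are not part of the WebRTC API; they only compute how many 16-bit samples (and bytes) a 10, 20 or 30 ms frame contains at a given sample rate.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of 16-bit samples in one frame.
 * Returns 0 when the duration is not 10, 20 or 30 ms or the rate is invalid. */
size_t vad_frame_samples(int sample_rate_hz, int frame_ms)
{
    if (frame_ms != 10 && frame_ms != 20 && frame_ms != 30) {
        return 0;
    }
    if (sample_rate_hz <= 0) {
        return 0;
    }
    return (size_t)sample_rate_hz * (size_t)frame_ms / 1000;
}

/* Hypothetical helper: size of that frame in bytes for mono 16-bit PCM. */
size_t vad_frame_bytes(int sample_rate_hz, int frame_ms)
{
    return vad_frame_samples(sample_rate_hz, frame_ms) * sizeof(int16_t);
}
```

For 16000 Hz and 10 ms this gives 160 samples, or 320 bytes, which matches the frame length passed to WebRtcVad_Process in the outline.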

Rapparee answered 24/4, 2016 at 17:29 Comment(8)

I ported it to iOS, but no matter what audio I feed it (different recordings, with a lot of background noise and voices), it always returns 1, and I don't know why. – Bondswoman 18/9, 2016 at 11:0

My problem is solved. It turns out that this code only works on little-endian LPCM; after changing to that, everything works. Thanks! – Bondswoman 20/9, 2016 at 17:15

@Bondswoman have you posted your iOS port anywhere? – Duct 31/1, 2017 at 6:47

WebRTC VAD doesn't really work the way it should: it detects loud noises as voice as well. I recorded the sound of a book dropped on a table and it was detected as voice. Do you know how to work around this? I'd like to tune it somehow to real voices, maybe using vowel sound frequencies, or maybe some other way. Any thoughts? – Boehm 17/11, 2017 at 15:9

The WebRTC VAD is designed to detect real voices, but sometimes it needs time/examples to adapt. Try talking to it first, then dropping a book and see if it classifies the drop noise as voice. – Rapparee 20/11, 2017 at 20:50

+1 to the little endian point. Make your recordFormat: kAudioFormatLinearPCM and mFormatFlags &= ~kLinearPCMFormatFlagIsBigEndian then set your flags as normal mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked. Finally, use a little endian audio format. WAVE works great. – Urbanity 15/1, 2019 at 5:36

@JohnWiseman are you saying the GMM keeps state between each call to WebRtcVad_Process? for how long does state have an effect? (doesn't that make it rather useless for finding the initial activation?) – Offering 11/4, 2019 at 18:52

Hi John, I am adding the WebRTC VAD module from github.com/wiseman/py-webrtcvad, and I am now getting a compilation issue in my existing application. I am targeting iOS 12+ and my application is written in Objective-C. Do you have any sample code for using it on iOS? – Hiles 9/10, 2020 at 12:9

E

1

How about LibVAD? www.libvad.com

Seems like that does exactly what you're describing.

Disclosure: I'm the developer behind LibVAD

Ex answered 15/1, 2015 at 16:0 Comment(2)

Charles, I've tried to get in touch with you but no luck so far. – Sponge 4/9, 2017 at 14:28

@Ex The site www.libvad.com has been down for a long time? – Retriever 21/10, 2019 at 17:15
