Voice Activity Detection from mic input on iOS

I'm developing an iOS app for voice-based AI: it takes voice input from the microphone, converts it to text, sends the text to an AI agent, and plays the returned text through the speaker. Everything works, but I currently use a button to start and stop recording the speech (SpeechKit for voice recognition, API.AI for the AI, Amazon's Polly for the output).

What I still need is for the microphone to be always on, and for recording to start and stop automatically as the user begins and finishes talking. The app is being developed for an unorthodox context in which the user will have no access to the screen (but they will have a high-end shotgun mic for recording their speech).

My research suggests this piece of the puzzle is known as 'Voice Activity Detection' and seems to be one of the hardest steps in the whole voice-based AI system.
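
To make the problem concrete, a minimal sketch of the simplest form of voice activity detection, an energy threshold with a short "hangover" so brief pauses don't end the utterance, might look like the following. The `EnergyVAD` type, the threshold value, and the hangover length are all hypothetical and would need tuning for a given microphone; this is not taken from any SDK and is unlikely to be robust in noisy environments.

```swift
/// Hypothetical energy-threshold voice activity detector (illustrative only).
struct EnergyVAD {
    enum Event: Equatable { case speechStarted, speechEnded }

    let threshold: Float      // RMS level treated as speech (assumed value; tune per mic)
    let hangoverFrames: Int   // consecutive quiet frames tolerated before ending speech

    private(set) var isSpeaking = false
    private var quietFrames = 0

    init(threshold: Float = 0.02, hangoverFrames: Int = 3) {
        self.threshold = threshold
        self.hangoverFrames = hangoverFrames
    }

    /// Root-mean-square level of one frame of samples in the range -1...1.
    static func rms(_ frame: [Float]) -> Float {
        guard !frame.isEmpty else { return 0 }
        let sumOfSquares = frame.reduce(Float(0)) { $0 + $1 * $1 }
        return (sumOfSquares / Float(frame.count)).squareRoot()
    }

    /// Feed one frame of audio; returns an event only when the speaking state changes.
    mutating func process(_ frame: [Float]) -> Event? {
        let loud = EnergyVAD.rms(frame) >= threshold
        if loud {
            quietFrames = 0
            if !isSpeaking {
                isSpeaking = true
                return .speechStarted
            }
        } else if isSpeaking {
            quietFrames += 1
            if quietFrames >= hangoverFrames {
                isSpeaking = false
                quietFrames = 0
                return .speechEnded
            }
        }
        return nil
    }
}
```

In an app, the frames would presumably come from the microphone (for example, from a tap installed on an `AVAudioEngine` input node), with `.speechStarted` and `.speechEnded` taking the place of the button presses.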

I'm hoping someone can either supply some straightforward (Swift) code so I can implement this myself, or point me in the direction of some decent libraries / SDKs that I can integrate into this project.
