Voice Activity Detection from mic input on iOS

I'm developing an iOS app that does voice-based AI: it takes voice input from the microphone, turns it into text, sends it to an AI agent, and then outputs the returned text through the speaker. Everything works, but I currently use a button to start and stop recording the speech (SpeechKit for voice recognition, API.AI for the AI, Amazon's Polly for the output).

The piece I still need is to keep the microphone always on and automatically start and stop recording the user's voice as they begin and end talking. The app is being developed for an unorthodox context in which the user has no access to the screen (but will have a high-end shotgun mic for recording their speech).

My research suggests this piece of the puzzle is known as 'Voice Activity Detection' and seems to be one of the hardest steps in the whole voice-based AI system.

I'm hoping someone can either supply some straightforward (Swift) code so I can implement this myself, or point me in the direction of some decent libraries / SDKs that I can use in this project.

Pepper answered 6/8, 2017 at 5:38 Comment(0)

For a good VAD implementation, you can use py-webrtcvad.

It is a Python interface to C code, so you can import the C files from the project and call them from Swift.
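As a rough sketch of how microphone buffers could be fed to such a frame-based VAD: the WebRTC VAD expects 16-bit mono PCM in fixed frames of 10, 20 or 30 ms, while mic callbacks usually deliver buffers of arbitrary length. The `FrameChunker` type below is a hypothetical helper (not part of py-webrtcvad or any SDK) that accumulates incoming samples and hands back complete frames. The `isSpeech` closure is a placeholder for the real per-frame call into the C code, which would most likely be `WebRtcVad_Process` exposed through a bridging header.

```swift
import Foundation

/// Hypothetical helper: splits arbitrarily sized buffers of 16-bit mono PCM
/// into fixed-size frames suitable for a frame-based VAD.
struct FrameChunker {
    /// Samples per frame, e.g. 480 for 30 ms at 16 kHz.
    let frameLength: Int
    /// Samples received so far that do not yet fill a whole frame.
    private var pending: [Int16] = []

    init(sampleRate: Int, frameMilliseconds: Int) {
        frameLength = sampleRate * frameMilliseconds / 1000
        precondition(frameLength > 0, "frame must contain at least one sample")
    }

    /// Appends new samples and returns every complete frame now available.
    /// Leftover samples are kept until the next call.
    mutating func push(_ samples: [Int16]) -> [[Int16]] {
        pending.append(contentsOf: samples)
        var frames: [[Int16]] = []
        var start = 0
        while pending.count - start >= frameLength {
            frames.append(Array(pending[start..<start + frameLength]))
            start += frameLength
        }
        pending.removeFirst(start)
        return frames
    }
}

// Placeholder classifier (assumption): replace the body with the call into
// the bridged C VAD. Here it only flags frames containing loud samples.
let isSpeech: ([Int16]) -> Bool = { frame in
    frame.contains { abs(Int($0)) > 1000 }
}

var chunker = FrameChunker(sampleRate: 16_000, frameMilliseconds: 30)
let micBuffer = [Int16](repeating: 0, count: 1_024) // stand-in for a mic callback buffer
for frame in chunker.push(micBuffer) {
    _ = isSpeech(frame) // one decision per 30 ms frame
}
```

In a real app the samples would probably come from an `AVAudioEngine` input tap, which typically delivers floating-point audio at the hardware sample rate, so a conversion to 16-bit PCM at a rate the VAD supports would be needed before calling `push`.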

Shewchuk answered 7/8, 2017 at 16:6 Comment(2)
Thanks! I've actually already got my hands on an iOS port of that library, but haven't yet quite worked out how to apply it to buffers coming in off the mic, rather than just pointing it at an existing audio file... Any hints? Code samples? - Pepper
The API processes audio frame by frame, so processing buffers should not be a problem. - Shewchuk
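
Since the VAD returns one decision per frame, starting and stopping a recording automatically needs some smoothing on top, or short pauses and clicks would toggle it constantly. A minimal sketch of one common approach, a hangover state machine, is below. `SpeechSegmenter`, `SpeechEvent` and the default thresholds are hypothetical names and values chosen for illustration, not part of any library.

```swift
import Foundation

/// Events emitted when the speaker starts or stops talking.
enum SpeechEvent: Equatable {
    case started
    case ended
}

/// Hypothetical helper: turns per-frame speech/non-speech decisions
/// into start/stop events by requiring several consecutive frames
/// that contradict the current state before switching.
struct SpeechSegmenter {
    /// Consecutive speech frames needed to report `.started`.
    let startFrames: Int
    /// Consecutive non-speech frames needed to report `.ended`.
    let endFrames: Int
    private(set) var isSpeaking = false
    /// Length of the current run of frames that disagree with `isSpeaking`.
    private var run = 0

    init(startFrames: Int = 3, endFrames: Int = 20) {
        self.startFrames = startFrames
        self.endFrames = endFrames
    }

    /// Feed one VAD decision per frame; returns an event when the state flips.
    mutating func process(frameIsSpeech: Bool) -> SpeechEvent? {
        if frameIsSpeech == isSpeaking {
            run = 0 // frame agrees with the current state, so reset the run
            return nil
        }
        run += 1
        let needed = isSpeaking ? endFrames : startFrames
        guard run >= needed else { return nil }
        isSpeaking.toggle()
        run = 0
        return isSpeaking ? .started : .ended
    }
}
```

With 30 ms frames, the assumed defaults mean roughly 90 ms of continuous speech starts a recording and roughly 600 ms of continuous silence ends it. Both values would need tuning for the actual microphone and environment.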