Currently, I use sox like this:
sox -d -e u-law --endian little -b 8 -c 1 -r 8000 -t ul - silence 1 0.3 1% 1 0.3 1%
For reference, this is recording audio from the default microphone and outputting little endian, ulaw formatted audio at 8 bits and a 8k rate. The effects filter trims audio until the noise hits a threshold for 0.3 seconds, then continues to record until there is 0.3 seconds of silence. All of this streams to stdout which I use to stream to a remote server.
I am using all of this to record a bit of voice and finish when I am done speaking. To trigger sox, I use specialized hardware to trigger the start of the recording. I can switch to using almost any audio format or codec as long as it supports on the fly formatting/encoding. My target platform is raspbian on the raspberry pi 2 B.
My ideal solution would be to use vad to stop the recording when the user is finished speaking. My hope is that this would work even with background chatter. However, the sox documentation on the vad effect states this:
The use of the norm effect is recommended, but remember that neither reverse nor norm is suitable for use with streamed audio.
I haven't been able to piece parameters together to get vad and streaming working. Is it possible to use the vad effect to stop the recording of audio while still maintaining the stdin->sox->stdout piping? Are there better alternatives?