I'm looking to log events corresponding to a specific sound, such as a car door slamming, or perhaps a toaster ejecting toast.
The system needs to be more sophisticated than a simple "loud noise detector"; it has to distinguish that specific sound from other loud noises.
The identification need not be zero-latency, but the processor needs to keep up with a continuous stream of incoming data from a microphone that is always on.
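To make the "keep up with the stream" constraint concrete, here is the kind of always-on, block-by-block processing I have in mind (a Python sketch using the `sounddevice` library; the sample rate, block size, and `detect()` stub are placeholder assumptions):

```python
import numpy as np
import sounddevice as sd

FS = 16000      # assumed sample rate (Hz)
BLOCK = 1024    # assumed block size; each block is ~64 ms of audio

def detect(block: np.ndarray) -> bool:
    """Placeholder for the actual single-sound detector; it must run
    faster than real time, i.e. in well under BLOCK / FS seconds."""
    return False

def callback(indata, frames, time, status):
    if status:
        print(status)
    # indata has shape (frames, channels); use the first channel
    if detect(indata[:, 0]):
        print("target sound detected -> log event")

# The microphone is always on; audio arrives in fixed-size blocks and
# each block must be handled before the next one is delivered.
with sd.InputStream(samplerate=FS, blocksize=BLOCK, channels=1, callback=callback):
    sd.sleep(10_000)   # run for 10 seconds in this sketch
```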
- Is this task significantly different from speech recognition, or could I make use of speech recognition libraries/toolkits to identify these non-speech sounds?
- Given the requirement that I only need to match one sound (as opposed to matching among a library of sounds), are there any special optimizations I can do?
This answer indicates that a matched filter would be appropriate, but I am hazy on the details. I don't believe a simple cross-correlation between the raw waveform of a recorded sample of the target sound and the microphone stream would be effective, because the target sound will vary from one occurrence to the next.
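To be clear about what I mean by that, here is a minimal sketch of the naive approach (Python with NumPy/SciPy; the synthetic signals and the threshold are made up purely for illustration), i.e. the thing I expect to break down when the target sound varies in pitch, duration, or level:

```python
import numpy as np
from scipy.signal import correlate

# Illustrative only: synthetic signals stand in for a recorded template
# of the target sound and a chunk of the live microphone stream.
fs = 16000
t = np.arange(0, 0.25, 1 / fs)
template = np.sin(2 * np.pi * 1200 * t) * np.hanning(t.size)  # stand-in "car door" clip

stream = 0.05 * np.random.randn(fs * 2)            # two seconds of background noise
onset = int(0.7 * fs)
stream[onset:onset + template.size] += template     # embed one occurrence

# Naive matched-filter-style detection: cross-correlate the raw waveform
# with the template, normalize by local stream energy, threshold the peak.
template_norm = (template - template.mean()) / (np.linalg.norm(template) + 1e-12)
corr = correlate(stream, template_norm, mode="valid")
window_energy = np.sqrt(correlate(stream ** 2, np.ones(template.size), mode="valid")) + 1e-12
score = corr / window_energy

threshold = 0.5                                     # made-up value
hits = np.flatnonzero(score > threshold)
if hits.size:
    print(f"detected at ~{hits[0] / fs:.2f} s, score {score[hits[0]]:.2f}")
```

This works when an exact copy of the template appears in the stream, but my concern is that real occurrences of the sound won't match the template sample closely enough.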
My question is also similar to this one, which didn't get much attention.