How would I use pyaudio to detect a sudden tapping noise from a live microphone?
Detect tap with pyaudio from live mic
Asked Answered
One way I've done it:
- read a block of samples at a time, say 0.05 seconds worth
- compute the RMS amplitude of the block (square root of the average of the squares of the individual samples)
- if the block's RMS amplitude is greater than a threshold, it's a "noisy block" else it's a "quiet block"
- a sudden tap would be a quiet block followed by a small number of noisy blocks followed by a quiet block
- if you never get a quiet block, your threshold is too low
- if you never get a noisy block, your threshold is too high
My application was recording "interesting" noises unattended, so it would record as long as there were noisy blocks. It would multiply the threshold by 1.1 if there was a 15-second noisy period ("covering its ears") and multiply the threshold by 0.9 if there was a 15-minute quiet period ("listening harder"). Your application will have different needs.
Also, just noticed some comments in my code regarding observed RMS values. On the built in mic on a Macbook Pro, with +/- 1.0 normalized audio data range, with input volume set to max, some data points:
- 0.003-0.006 (-50dB to -44dB) an obnoxiously loud central heating fan in my house
- 0.010-0.40 (-40dB to -8dB) typing on the same laptop
- 0.10 (-20dB) snapping fingers softly at 1' distance
- 0.60 (-4.4dB) snapping fingers loudly at 1'
Update: here's a sample to get you started.
#!/usr/bin/python
# open a microphone in pyAudio and listen for taps
import pyaudio
import struct
import math
INITIAL_TAP_THRESHOLD = 0.010
FORMAT = pyaudio.paInt16
SHORT_NORMALIZE = (1.0/32768.0)
CHANNELS = 2
RATE = 44100
INPUT_BLOCK_TIME = 0.05
INPUT_FRAMES_PER_BLOCK = int(RATE*INPUT_BLOCK_TIME)
# if we get this many noisy blocks in a row, increase the threshold
OVERSENSITIVE = 15.0/INPUT_BLOCK_TIME
# if we get this many quiet blocks in a row, decrease the threshold
UNDERSENSITIVE = 120.0/INPUT_BLOCK_TIME
# if the noise was longer than this many blocks, it's not a 'tap'
MAX_TAP_BLOCKS = 0.15/INPUT_BLOCK_TIME
def get_rms( block ):
# RMS amplitude is defined as the square root of the
# mean over time of the square of the amplitude.
# so we need to convert this string of bytes into
# a string of 16-bit samples...
# we will get one short out for each
# two chars in the string.
count = len(block)/2
format = "%dh"%(count)
shorts = struct.unpack( format, block )
# iterate over the block.
sum_squares = 0.0
for sample in shorts:
# sample is a signed short in +/- 32768.
# normalize it to 1.0
n = sample * SHORT_NORMALIZE
sum_squares += n*n
return math.sqrt( sum_squares / count )
class TapTester(object):
def __init__(self):
self.pa = pyaudio.PyAudio()
self.stream = self.open_mic_stream()
self.tap_threshold = INITIAL_TAP_THRESHOLD
self.noisycount = MAX_TAP_BLOCKS+1
self.quietcount = 0
self.errorcount = 0
def stop(self):
self.stream.close()
def find_input_device(self):
device_index = None
for i in range( self.pa.get_device_count() ):
devinfo = self.pa.get_device_info_by_index(i)
print( "Device %d: %s"%(i,devinfo["name"]) )
for keyword in ["mic","input"]:
if keyword in devinfo["name"].lower():
print( "Found an input: device %d - %s"%(i,devinfo["name"]) )
device_index = i
return device_index
if device_index == None:
print( "No preferred input found; using default input device." )
return device_index
def open_mic_stream( self ):
device_index = self.find_input_device()
stream = self.pa.open( format = FORMAT,
channels = CHANNELS,
rate = RATE,
input = True,
input_device_index = device_index,
frames_per_buffer = INPUT_FRAMES_PER_BLOCK)
return stream
def tapDetected(self):
print("Tap!")
def listen(self):
try:
block = self.stream.read(INPUT_FRAMES_PER_BLOCK)
except IOError as e:
# dammit.
self.errorcount += 1
print( "(%d) Error recording: %s"%(self.errorcount,e) )
self.noisycount = 1
return
amplitude = get_rms( block )
if amplitude > self.tap_threshold:
# noisy block
self.quietcount = 0
self.noisycount += 1
if self.noisycount > OVERSENSITIVE:
# turn down the sensitivity
self.tap_threshold *= 1.1
else:
# quiet block.
if 1 <= self.noisycount <= MAX_TAP_BLOCKS:
self.tapDetected()
self.noisycount = 0
self.quietcount += 1
if self.quietcount > UNDERSENSITIVE:
# turn up the sensitivity
self.tap_threshold *= 0.9
if __name__ == "__main__":
tt = TapTester()
for i in range(1000):
tt.listen()
Could you post a simple code sample? I have never worked with audio before. –
Theomancy
thanks a lottt!! this helps me tonnes !! very informative. Although, is it possible to take the whole automatic threshold concept out of it and manually calibrate it? Like for example, If i record taps, noise, snaps, claps in mic and see it in software, the sound clearly has a level of up to -12 dB while taps are much larger than -12 dB or more like 0 dB or even higher. So I want to set my threshold to -12 dB. How can i do that? –
Hohenlohe
@Dhruv - just remove the logic that changes self.tap_threshold. Depending on what your '-12dB' is relative to, it might or might not correspond to a threshold of 0.25, so try initializing tap_threshold to that value instead of the 0.01 in my sample. –
Felicity
Python comes with a standard way of computing the RMS amplitude, believe it or not: audioop. You can replace the
get_rms
function above with this: def get_rms(block): return audioop.rms(block, 2)
. –
Washable Wow, I didn't know about audioop. That's some serious
import antigravity
action there. (For full compatibility there you still have to rescale it, but yeah.) –
Felicity This, as far as I can find, is the only example showing how to extract sample values from
pyaudio
. Cheers! –
Sanity a simplified version of the above code...
import pyaudio
import struct
import math
INITIAL_TAP_THRESHOLD = 0.010
FORMAT = pyaudio.paInt16
SHORT_NORMALIZE = (1.0/32768.0)
CHANNELS = 2
RATE = 44100
INPUT_BLOCK_TIME = 0.05
INPUT_FRAMES_PER_BLOCK = int(RATE*INPUT_BLOCK_TIME)
OVERSENSITIVE = 15.0/INPUT_BLOCK_TIME
UNDERSENSITIVE = 120.0/INPUT_BLOCK_TIME # if we get this many quiet blocks in a row, decrease the threshold
MAX_TAP_BLOCKS = 0.15/INPUT_BLOCK_TIME # if the noise was longer than this many blocks, it's not a 'tap'
def get_rms(block):
# RMS amplitude is defined as the square root of the
# mean over time of the square of the amplitude.
# so we need to convert this string of bytes into
# a string of 16-bit samples...
# we will get one short out for each
# two chars in the string.
count = len(block)/2
format = "%dh"%(count)
shorts = struct.unpack( format, block )
# iterate over the block.
sum_squares = 0.0
for sample in shorts:
# sample is a signed short in +/- 32768.
# normalize it to 1.0
n = sample * SHORT_NORMALIZE
sum_squares += n*n
return math.sqrt( sum_squares / count )
pa = pyaudio.PyAudio() #]
#|
stream = pa.open(format = FORMAT, #|
channels = CHANNELS, #|---- You always use this in pyaudio...
rate = RATE, #|
input = True, #|
frames_per_buffer = INPUT_FRAMES_PER_BLOCK) #]
tap_threshold = INITIAL_TAP_THRESHOLD #]
noisycount = MAX_TAP_BLOCKS+1 #|---- Variables for noise detector...
quietcount = 0 #|
errorcount = 0 #]
for i in range(1000):
try: #]
block = stream.read(INPUT_FRAMES_PER_BLOCK) #|
except IOError, e: #|---- just in case there is an error!
errorcount += 1 #|
print( "(%d) Error recording: %s"%(errorcount,e) ) #|
noisycount = 1 #]
amplitude = get_rms(block)
if amplitude > tap_threshold: # if its to loud...
quietcount = 0
noisycount += 1
if noisycount > OVERSENSITIVE:
tap_threshold *= 1.1 # turn down the sensitivity
else: # if its to quiet...
if 1 <= noisycount <= MAX_TAP_BLOCKS:
print 'tap!'
noisycount = 0
quietcount += 1
if quietcount > UNDERSENSITIVE:
tap_threshold *= 0.9 # turn up the sensitivity
Without an input_device_index in
pyaudio.PyAudio().open(... )
would you get silence or would pyaudio somehow locate a working mic? –
Procrustes © 2022 - 2024 — McMap. All rights reserved.