I have a Python script using the speech_recognition package to recognize speech and return the text of what was spoken. The transcription has a few seconds delay, however. Is there another way to write this script to return each word as it is spoken? I have another script to do this, using the pysphinx package, but the results are wildly inaccurate.
Install dependencies:
pip install SpeechRecognition
pip install pocketsphinx
Script 1 - Delayed speech-to-text:
import speech_recognition as sr
# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
print("Please wait. Calibrating microphone...")
# listen for 5 seconds and create the ambient noise energy level
r.adjust_for_ambient_noise(source, duration=5)
print("Say something!")
audio = r.listen(source)
# recognize speech using Sphinx
try:
print("Sphinx thinks you said '" + r.recognize_sphinx(audio) + "'")
except sr.UnknownValueError:
print("Sphinx could not understand audio")
except sr.RequestError as e:
print("Sphinx error; {0}".format(e))
Script 2 - Immediate albeit inaccurate speech-to-text:
import os
from pocketsphinx import LiveSpeech, get_model_path
model_path = get_model_path()
speech = LiveSpeech(
verbose=False,
sampling_rate=16000,
buffer_size=2048,
no_search=False,
full_utt=False,
hmm=os.path.join(model_path, 'en-us'),
lm=os.path.join(model_path, 'en-us.lm.bin'),
dic=os.path.join(model_path, 'cmudict-en-us.dict')
)
for phrase in speech:
print(phrase)