Convert voice to text while talking in Python

I made a program that lets me speak and converts my speech to text. It only converts my voice after I have stopped talking. What I want is to convert my voice to text while I am still talking.

https://www.youtube.com/watch?v=96AO6L9qp2U&t=2s&ab_channel=StormHack at min 2:31.

Pay attention to the top right corner of Tony's monitor. It converts his voice to text while he is talking. I want to do the same thing. Can it be done?

This is my whole program:

import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:
    print("Listening...")
    # listen() blocks until the speaker pauses, so text only appears afterwards.
    audio = r.listen(source)
    try:
        text = r.recognize_google(audio)
        print("You said : {}".format(text))
    except (sr.UnknownValueError, sr.RequestError):
        print("Sorry could not recognize what you said")

Solutions, tips, hints, or anything else would be greatly appreciated. Thank you in advance.

Walrath answered 11/12, 2018 at 4:42 Comment(1)
I assume you have looked through the docs here: github.com/Uberi/speech_recognition - Mcnutt

To do this you need what's called VAD: voice activity detection. A simple way to do it is to take a set of samples from the audio and measure their intensity. If the intensity is above a certain threshold, you begin recording; once it falls below a certain threshold for a given period of time, you end the recording and send it off to the service. You can find an example of this here.
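The threshold idea described above can be sketched in a few lines. This is only an illustration, not the linked example: the function names, the single threshold of 500 (the description allows separate start and stop thresholds), and the three-frame silence window are all assumptions, and the audio is assumed to arrive as non-empty frames of integer samples.

```python
import math

def rms(frame):
    """Root-mean-square intensity of one non-empty frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def detect_utterances(frames, threshold=500.0, max_silent_frames=3):
    """Return (start, end) frame-index pairs (end exclusive), one per utterance.

    threshold and max_silent_frames are illustrative values, not tuned ones.
    """
    utterances = []
    start = None      # index of the first loud frame of the current utterance
    silent_run = 0    # consecutive quiet frames seen inside an utterance
    for index, frame in enumerate(frames):
        loud = rms(frame) >= threshold
        if start is None:
            if loud:
                start = index          # intensity crossed the threshold: start recording
                silent_run = 0
        elif loud:
            silent_run = 0             # still speaking
        else:
            silent_run += 1
            if silent_run >= max_silent_frames:
                # Quiet for long enough: close the utterance at the first quiet frame.
                utterances.append((start, index - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Audio ended mid-utterance; drop any trailing quiet frames.
        utterances.append((start, len(frames) - silent_run))
    return utterances
```

Each returned pair marks a slice of frames that could be joined and sent to the recognition service as one request.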

More complex systems use better heuristics to decide whether or not the user is speaking, such as frequency content, and apply things like noise reduction. Other systems, such as DeepSpeech 2, are also able to perform live speech-to-text as the user is speaking.

Baerman answered 16/5, 2019 at 15:35 Comment(2)
I appreciate you answering my question! Did you watch the video I've provided? If I do exactly what you said, will my program convert my voice into text while I am speaking? - Walrath
No, but if you do what I said your program will work similar to Siri or Alexa. - Baerman

To do what you want, you need to listen for just a few words at a time rather than for a complete sentence. You then process the audio data and print the result. Here is a very basic implementation of it:

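import speech_recognition as sr
import threading
from queue import Queue

listen_recognizer = sr.Recognizer()
process_recognizer = sr.Recognizer()

audios_to_process = Queue()

def callback(recognizer, audio_data):
    # Called from the background listener thread for each captured phrase.
    if audio_data:
        audios_to_process.put(audio_data)

def listen():
    source = sr.Microphone()
    # Cap each phrase at 3 seconds so text appears while you are still talking.
    stop_listening = listen_recognizer.listen_in_background(source, callback, phrase_time_limit=3)
    return stop_listening

def process_thread_func():
    while True:
        # Block until the next chunk arrives instead of polling with sleep().
        audio = audios_to_process.get()
        try:
            text = process_recognizer.recognize_google(audio)
        except sr.UnknownValueError:
            # No intelligible speech in this chunk; skip it.
            continue
        except sr.RequestError as error:
            print("Recognition request failed: {}".format(error))
            continue
        print(text)

stop_listening = listen()
# daemon=True lets the program exit once Enter has been pressed.
process_thread = threading.Thread(target=process_thread_func, daemon=True)
process_thread.start()

input()  # press Enter to stop

stop_listening()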

As you can see, I use two recognizers, so that one is always listening while the other processes the audio data. The first one listens, adds the audio data to a queue, and listens again. At the same time, the other one takes audio data from the queue, converts it to text, and prints it.
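If you want to try the queue hand-off without a microphone or network access, the same listener/worker split can be sketched with plain strings standing in for audio chunks. Everything here is hypothetical scaffolding: fake_transcribe only stands in for recognize_google, and the _STOP sentinel is one possible way to shut the worker down cleanly, not something the program above does.

```python
import queue
import threading

_STOP = object()  # sentinel object that tells the worker to exit

def fake_transcribe(chunk):
    """Hypothetical stand-in for recognize_google(); None means nothing was recognized."""
    return chunk.strip() or None

def worker(chunks, results, transcribe):
    """Consume chunks from the queue until the stop sentinel arrives."""
    while True:
        chunk = chunks.get()  # blocks until the listener side puts something
        if chunk is _STOP:
            break
        text = transcribe(chunk)
        if text is not None:
            results.append(text)

def run_pipeline(incoming, transcribe=fake_transcribe):
    """Feed chunks to a worker thread and return the recognized texts in order."""
    chunks = queue.Queue()
    results = []
    thread = threading.Thread(target=worker, args=(chunks, results, transcribe))
    thread.start()
    for chunk in incoming:  # plays the role of the listening callback
        chunks.put(chunk)
    chunks.put(_STOP)       # ask the worker to finish after the last chunk
    thread.join()
    return results
```

For example, run_pipeline(["hello ", "   ", "world"]) skips the blank chunk and returns the other two texts in the order they were queued.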

Ardyth answered 23/6, 2021 at 14:39 Comment(0)
