How to integrate Mozilla DeepSpeech into a PHP web app to convert speech to text?

2

15

I have a PHP web application and am looking for an open source, high-accuracy speech-to-text implementation that lets users open web pages with voice commands. Examples: "Make Sales" (this will open the Create Sales PHP page), "Make Purchase order", "Open END-OF-DAY reports", etc.

My question:

I want to know whether we can use Mozilla DeepSpeech to take .wav audio from a Firefox browser and return the recognized text. If yes, what is the flow from recording the voice in Firefox with the microphone to converting it to text with the DeepSpeech engine?

How can I add a wake-up phrase, similar to "OK Google", that makes the app start listening for commands?

Constipation answered 29/5, 2018 at 10:56 Comment(2)
I have tried this too and could not find a proper API for it. - Cardona
Seems like it should be possible, but I see no implementation guide at all. It looks like you're just trying to understand how things should be structured for this to work, which is definitely an interesting question. I was looking at this last week for a home automation project I'm working on with CI, and was trying to wrap my head around it. - Horseman
2

You can achieve that by running a speech-to-text server and exchanging data with it using asynchronous requests (AJAX) or WebSockets.

You can find Server installation instructions using the link below:

https://pypi.org/project/deepspeech-server/

After you have installed the server you can start making requests from any browser that supports "WebRTC API: getUserMedia()". Generate audio Blob data and send it in base64 format to the backend server. On the backend, save the blob to a temporary audio file:

// decode the base64 payload sent by the browser
$audioData = base64_decode($data);

// write the decoded audio out to the temporary file
$fp = fopen($full_file_path, 'wb');
fwrite($fp, $audioData);
fclose($fp);

Then convert the audio file to text by making a cURL request to your own Mozilla DeepSpeech server:

curl -X POST --data-binary @testfile.wav http://localhost:8080/stt
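The same request can also be made from PHP itself instead of the command line. The following is only a minimal sketch using PHP's cURL extension: the function name transcribeWav is hypothetical, the default endpoint is simply the URL from the curl command above, and the response body is returned as-is because the exact response format of your server should be checked before parsing it.

```php
<?php
declare(strict_types=1);

/**
 * Send the raw bytes of a WAV file to a speech-to-text endpoint and return
 * the response body. Hypothetical helper: the default URL is the one from
 * the curl example and may differ in your setup.
 */
function transcribeWav(string $wavPath, string $endpoint = 'http://localhost:8080/stt'): string
{
    if (!is_readable($wavPath)) {
        throw new InvalidArgumentException("Cannot read audio file: $wavPath");
    }

    $audio = file_get_contents($wavPath);
    if ($audio === false) {
        throw new RuntimeException("Failed to load audio file: $wavPath");
    }

    $ch = curl_init($endpoint);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $audio, // raw body, like curl --data-binary
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_TIMEOUT        => 30,
    ]);

    $body   = curl_exec($ch);
    $error  = curl_error($ch);
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);

    if ($body === false) {
        throw new RuntimeException("Speech-to-text request failed: $error");
    }
    if ($status !== 200) {
        throw new RuntimeException("Speech-to-text server returned HTTP $status");
    }

    return trim($body);
}

// Usage (requires a running server):
// $text = transcribeWav($full_file_path);
```

Throwing exceptions keeps the calling page free to decide whether to retry, log the failure, or show a message to the user.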

Create methods on your backend that scan the generated text for keywords/commands. When a command is recognized, send it back to the front end. If you instead want to let users dictate long messages, return the whole text every time. Even then you still want to "listen" for keywords, so that users can insert punctuation and start or stop dictation.
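As a rough illustration of the keyword step, here is a minimal sketch that maps recognized phrases to pages. The function names and the target file names are hypothetical, and it assumes the transcript arrives as plain English text; it normalizes case and punctuation, then looks for each command phrase as a whole-word sequence.

```php
<?php
declare(strict_types=1);

/** Lowercase the text and collapse everything except letters and digits into single spaces. */
function normalizeTranscript(string $text): string
{
    $text = strtolower($text);
    $text = (string) preg_replace('/[^a-z0-9]+/', ' ', $text);
    return trim($text);
}

/**
 * Return the page mapped to the first command phrase found in the transcript,
 * or null if no phrase matches. Phrases match on whole words only.
 *
 * @param array<string, string> $commands phrase => page
 */
function matchCommand(string $transcript, array $commands): ?string
{
    // Pad with spaces so a phrase cannot match inside a longer word.
    $haystack = ' ' . normalizeTranscript($transcript) . ' ';

    foreach ($commands as $phrase => $page) {
        $normalized = normalizeTranscript((string) $phrase);
        if ($normalized === '') {
            continue; // ignore empty phrases
        }
        if (strpos($haystack, ' ' . $normalized . ' ') !== false) {
            return $page;
        }
    }

    return null;
}

// Hypothetical command table based on the examples in the question.
$commands = [
    'make purchase order'     => 'purchase_order_create.php',
    'make sales'              => 'sales_create.php',
    'open end of day reports' => 'reports_end_of_day.php',
];

$page = matchCommand('Open END-OF-DAY reports', $commands);
// $page can then be returned to the front end, e.g. as JSON, to trigger the redirect.
```

Exact phrase matching is deliberately simple; recognition errors may call for fuzzier matching, for example with PHP's levenshtein() or similar_text().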

Happy coding everyone ;)

Padova answered 24/3, 2021 at 9:11 Comment(2)
I tried installing the Mozilla DeepSpeech server. When I run deepspeech-server --config config.json on Windows 10, I get "deepspeech-server is not recognized as an internal or external command, operable program or batch file". Can you please help me resolve this issue? I have posted it as question #69709993. - Laughter
@user3653474, try Windows Task Manager; see more info in my answer to your question. - Padova
-2

Please read: https://github.com/mdn/web-speech-api/tree/master/speech-color-changer

The conversion from speech to text is done in the browser, on the client side. Once the text is generated, it can be sent to the PHP server using jQuery.

Shoelace answered 3/7, 2020 at 14:35 Comment(2)
Please explain your answer here and don't rely only on sharing a link. - Ambuscade
The question is how to convert audio recorded in the browser to text using a PHP server and the Mozilla DeepSpeech API, so that the result can be sent back to the front end. - Padova
