What are the ways to implement speech recognition in Electron?

So I have an Electron app that uses the Web Speech API (SpeechRecognition) to capture the user's voice; however, it's not working. The code:

if ("webkitSpeechRecognition" in window) {
  let SpeechRecognition =
    window.SpeechRecognition || window.webkitSpeechRecognition;
  let recognition = new SpeechRecognition();

  recognition.onstart = () => {
    console.log("We are listening. Try speaking into the microphone.");
  };

  recognition.onspeechend = () => {
    recognition.stop();
  };

  recognition.onresult = (event) => {
    let transcript = event.results[0][0].transcript;
    console.log(transcript);
  };

  recognition.start();
} else {
  alert("Browser not supported.");
}

It says We are listening... in the console, but no matter what I say, nothing gets logged. On the other hand, running the exact same thing in Google Chrome works, and whatever I say gets logged by the console.log(transcript); line. I did some more research and it turns out that Google has recently stopped supporting the Web Speech API in shell-based Chromium windows (to my knowledge, everything that is not Google Chrome or MS Edge), so that seems to be the reason it is not working in my Electron app.

See: the electron-speech library's end, an Artyom.js issue, and another Stack Overflow question regarding this.

So is there any way I can get it to work in Electron?

Flail answered 18/1, 2023 at 18:8 Comment(1)
Hey, and if possible, maybe this question could gain enough traction to reach the companies managing these APIs and perhaps they could do something about native support on shell-based browsers. I understand the reasons they might've disabled it, but I think those should be solved in a way other than completely removing support.Flail

I ended up doing an implementation that uses the Media Devices API to capture the user's speech through their microphone, sends the audio over WebSockets to a Python server that transcribes the stream with the SpeechRecognition pip package, and returns the transcribed text to the client (the Electron app).

This is what I implemented. It is far too much machinery for something as simple as this, so if someone has a better suggestion, please do let me know by writing an answer.
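For reference, here is a minimal sketch of the renderer-side half of that approach: capturing microphone audio with the Media Devices API and streaming it to a WebSocket server. The endpoint URL (ws://localhost:8765), the chunk interval, and the reply format are placeholder assumptions for illustration, not the exact implementation.

const socket = new WebSocket("ws://localhost:8765"); // assumed transcription server endpoint

socket.onmessage = (event) => {
  // The server is expected to reply with the transcribed text.
  console.log("Transcript:", event.data);
};

async function startStreaming() {
  // Ask for microphone access (the user must grant the permission).
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);

  // Forward each recorded chunk to the server as it becomes available.
  recorder.ondataavailable = (event) => {
    if (event.data.size > 0 && socket.readyState === WebSocket.OPEN) {
      socket.send(event.data);
    }
  };

  recorder.start(1000); // emit a chunk roughly every second
}

socket.onopen = () => startStreaming();

Note that MediaRecorder produces WebM/Opus chunks by default, so the server usually has to transcode them into WAV/PCM before the SpeechRecognition package can consume them.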

Flail answered 19/1, 2023 at 15:44 Comment(2)
When/If I get enough reputation, I'll put a bounty on this question, because this problem for Electron devs needs to be highlighted.Flail
Nobody bothered to answer something better despite the bounty I placed of 100 rep. Hope this question gets some traction at some point in the future.Flail

I used Rust, Neon, cpal and Vosk to build a Node.js module that can start and stop independent OS threads which listen to the microphone and recognize text from it in real time. From Node you can select the input device, plug in different language recognizers, hand it trigger words to call back on, and so on. It works for what I built it for, but I can probably put up a repo and make it a little more flexible if anyone is interested.

const { app, BrowserWindow } = require('electron'); // Electron main process
const voiceModule = require('./index.node');        // the compiled Rust/Neon addon

// In this demo I stop after two rounds of recognizing the target words:
let called = 0;
function onWordsFound(words) {
  console.log('words found:', words);
  called++;
  if (called > 1) {
    console.log('stopping listener');
    voiceModule.stopListener();
    return;
  }
  // setTimeout is used here because the Rust function calling this JS callback
  // must return before the next call to lookForWords, but you can call
  // voiceModule.lookForWords anywhere else in your JS code.
  setTimeout(() => {
    console.log('calling lookForWords');
    voiceModule.lookForWords(['second', 'words'], true);
  }, 1000);
}

const f = async () => {
  voiceModule.setPathToModel('./models/large'); // the English large model, but any Vosk-compatible model works
  const devices = voiceModule.listDevices();
  // Just use the default microphone for now; listDevices and setMicName can back a device-selection UI.
  voiceModule.setMicName(devices[0]);
  // After selecting the mic, start the listener and pass your callback.
  voiceModule.startListener(onWordsFound);
  voiceModule.lookForWords(['hello', 'world'], false); // false means match ANY word, true means they must match ALL words in the list
};

f();
Hin answered 4/4 at 4:57 Comment(2)
Please do set up that repo. I would like to see what you came up with. I first need permission from my client to share the solution, but if they allow, I will make a repo as well.Flail
Here you go: github.com/orthagonal/electron-voiceHin
