Is there any way to convert user speech to text in real time using R? Just curious. It would also be great if anyone could share examples of what they have done in this area.
I'm currently working on googleLanguageR, which includes speech-to-text via the Google Cloud Speech API.
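A minimal, non-streaming sketch of what a googleLanguageR call might look like is below. It assumes the package is installed, that you have a Google Cloud service-account JSON key with the Speech API enabled, and that the input is a short audio clip. `gl_auth()` and `gl_speech()` are the package's own functions; the wrapper name `transcribe_google()` is hypothetical.

```r
# Hypothetical wrapper around googleLanguageR's speech-to-text call.
# Assumes googleLanguageR is installed and a service-account key file exists.
transcribe_google <- function(audio_file, key_file, language = "en-US") {
  # Authenticate against Google Cloud with the service-account JSON key
  googleLanguageR::gl_auth(key_file)
  # Send the audio to the Cloud Speech API. This is a single request that
  # waits for the result; it is not a live stream.
  result <- googleLanguageR::gl_speech(audio_file, languageCode = language)
  # result$transcript is a data frame with one row per recognised chunk
  paste(result$transcript$transcript, collapse = " ")
}
```

Because each call uploads a finished recording, this gives you batch transcription; anything closer to real time would have to be built on top of it.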
As of 2023, it is possible to get speech-to-text transcription (and translation) using the "Whisper" Automatic Speech Recognition model.
The R package audio.whisper wraps the whisper.cpp C++ library and makes it possible to transcribe audio from within R. Once the model has been downloaded, the whole process can run offline, without calling any external API.
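The basic workflow, following the pattern in the audio.whisper README, is to load a model by name and call `predict()` on an audio file. This is a minimal sketch, assuming the package is already installed from GitHub and the input is a 16 kHz WAV file (the format whisper.cpp expects); the wrapper name `transcribe_whisper()` is hypothetical.

```r
# Hypothetical wrapper around audio.whisper.
# Assumes audio.whisper is installed and audio_file is a 16 kHz WAV file.
transcribe_whisper <- function(audio_file, model_name = "tiny", language = "en") {
  # Given a model name, whisper() downloads the model on first use and loads it;
  # later calls reuse the downloaded file, so no network access is needed
  model <- audio.whisper::whisper(model_name)
  # predict() runs whisper.cpp on the file; segment-level text is in $data
  trans <- predict(model, newdata = audio_file, language = language)
  paste(trans$data$text, collapse = " ")
}
```

Smaller models such as `"tiny"` are fastest; larger ones such as `"small"` or `"medium"` are slower but generally more accurate.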
The transcription quality is surprisingly good, including for major languages other than English. However, it is not designed for real-time transcription as asked in the question, although it could probably be adapted for that using one of the smaller models.
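One way to approximate near-real-time use is to transcribe the incoming audio in short, slightly overlapping windows with a small model. This chunking approach is an assumption here, not a feature of audio.whisper itself. The pure-R helper below (`chunk_windows()`, hypothetical) only computes the window boundaries in seconds; cutting each window out of the recording and passing it to the transcriber is not shown.

```r
# Hypothetical helper: split a recording of duration_sec seconds into
# windows of window_sec seconds, each overlapping the previous by overlap_sec.
chunk_windows <- function(duration_sec, window_sec = 5, overlap_sec = 1) {
  stopifnot(duration_sec > 0, window_sec > overlap_sec, overlap_sec >= 0)
  step <- window_sec - overlap_sec
  starts <- 0
  # Keep adding windows until the last one reaches the end of the recording
  while (starts[length(starts)] + window_sec < duration_sec) {
    starts <- c(starts, starts[length(starts)] + step)
  }
  # Clip the final window to the end of the recording
  data.frame(from = starts, to = pmin(starts + window_sec, duration_sec))
}
```

For example, `chunk_windows(12, window_sec = 5, overlap_sec = 1)` gives windows of 0-5, 4-9 and 8-12 seconds. The overlap gives each window some context for words cut at a boundary, at the cost of duplicated text that has to be reconciled afterwards.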
At the time of writing, one issue in particular is worth mentioning for anyone who wants to try audio.whisper:
- As mentioned in the README, you should seriously consider installing (or reinstalling) the package with the suggested flags, as this dramatically improves performance.
Searching GitHub for "whisper language:R" turns up other R packages that rely on Whisper, but most of them expect you to install whisper separately.
More complete, refined, or better-documented R packages may appear, but these suggestions should put you on the right track to a workable solution.