Text-to-speech (voice generation) and speech-to-text (voice recognition) APIs?

Asked 14/6, 2011 at 19:13 Answered 8/1, 2014 at 17:37

Solved speech-recognition text-to-speech speech-to-text speech-synthesis

Is there a comprehensive list of known APIs for desktop or browser environments?

Caesura answered 14/6, 2011 at 19:13 Comment(2)

That is a really broad question. Are you interested in APIs or implementations? Are there any language or platform choices you could make to limit this? – Endure 14/6, 2011 at 22:40

I know that there are very few good solutions, so I decided to choose the platform (in a broad sense) and programming language based on the availability of good text-to-speech and speech-to-text for them. – Caesura 15/6, 2011 at 9:21

I'll rehash and update an answer from "Speech recognition in C or Java or PHP?". This is by no means comprehensive, but it might be a start for you.

From watching these questions for a few months, I've seen most developer choices break down like this:

Windows folks - use the System.Speech features of .NET, or Microsoft.Speech with the free recognizers Microsoft provides. Windows 7 includes a full speech engine. Others are downloadable for free. There is a C++ API to the same engines, known as SAPI. See http://msdn.microsoft.com/en-us/magazine/cc163663.aspx or http://msdn.microsoft.com/en-us/library/ms723627(v=vs.85).aspx. For more background on the Microsoft engines for Windows, see "What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition?"

Linux folks - Sphinx seems to have a good following. See http://cmusphinx.sourceforge.net/ and http://cmusphinx.sourceforge.net/wiki/

Commercial products - Nuance, Loquendo, AT&T, IBM, others. Each provides its own SDKs and libraries for various languages.

Online services - Nuance, Yapme, ispeech.org, vlingo, others. Nuance has improved their developer program and will now give you free access to their services for development. Yap (I believe) was recently purchased by Amazon, so we may see some changes there.

Of course this may also be helpful - http://en.wikipedia.org/wiki/List_of_speech_recognition_software

There is a Java speech API. See javax.speech.recognition in the Java Speech API http://java.sun.com/products/java-media/speech/forDevelopers/jsapi-guide/Recognition.html. I believe you still have to find a speech engine that supports this API. I don't think Sphinx fully supports it - http://cmusphinx.sourceforge.net/sphinx4/doc/Sphinx4-faq.html#support_jsapi

There are lots of other SO questions: "Need text to speech and speech recognition tools for Linux" and "pyspeech (python) - Transcribe mp3 files?", which talks about http://code.google.com/p/pyspeech/. You may also want to look at http://code.google.com/p/dragonfly/

Endure answered 14/6, 2011 at 22:46 Comment(3)

Another unofficial online service that you missed is Google's Speech API. Here is a link to some API hooks in Java: github.com/The-Shadow/java-speech-api – Projector 1/2, 2014 at 19:19

I don't believe Google ever made their speech API publicly accessible. People have reverse engineered it and used it, but I don't believe Google supports it for 3rd party use. I believe it is only intended to be used by Chrome browser or Android operating system. See https://mcmap.net/q/530393/-google-speech-api-closed or https://mcmap.net/q/541178/-does-anyone-uses-google-speech-api-in-production – Endure 2/2, 2014 at 23:30

Google's API is accessible for free on Chrome. My web-app implementation of it: speechlogger.appspot.com – Enyo 7/5, 2015 at 21:59

The leading API vendors of text to speech (voice generation) are YAKiToMe! and iSpeech. YAKiToMe! is the one I use because I like their voice quality the best and they're the least expensive (mostly free). They support male and female speakers in multiple languages. Some of the voice vendors, like Acapella, Nuance, Loquendo and iVona have decent voices but tend to be expensive to use.

Warrantable answered 22/5, 2013 at 5:54 Comment(0)

Here is how you can do it. Note: this is an API from Google, so it only works in the Chrome browser.

(See live demo and download full source code here http://purpledesign.in/blog/?p=33)

Define a speech-enabled text input:

<input id="speech" type="text" speech="speech" x-webkit-speech="x-webkit-speech" onspeechchange="processspeech();" onwebkitspeechchange="processspeech();" />

Then define what you want to do with the recognized text in a function in your JavaScript file, like this:

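function processspeech() {
  // Read the recognized text from the speech-enabled input (this line requires jQuery).
  var speechtext = $("#speech").val();
  // Copy the text into the element with id "test".
  var elem = document.getElementById("test");
  elem.value = speechtext;
  // Echo the text back so the user can confirm it.
  var notification = "\"<span style=\"color:#F00; text-transform:uppercase;\">" + speechtext + "</span>\" <br />*Is this what you said???";
  // notify() is not defined in this snippet; it is assumed to be a helper defined elsewhere on the page.
  notify(notification);
}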
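One thing to watch in the function above: the recognized text is concatenated straight into HTML, so a phrase containing characters like < or & could break the markup. A minimal sketch of escaping the text first, assuming notify() renders its argument as HTML (the original does not say). The names escapeHtml and buildNotification are hypothetical and not part of the original answer.

```javascript
// Hypothetical helper (not in the original answer): escape text so it is
// safe to place inside HTML markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper: builds the same confirmation markup as processspeech(),
// but with the recognized text escaped.
function buildNotification(speechtext) {
  return "\"<span style=\"color:#F00; text-transform:uppercase;\">" +
    escapeHtml(speechtext) +
    "</span>\" <br />*Is this what you said???";
}
```

Inside processspeech() you would then call notify(buildNotification(speechtext)) instead of building the string inline.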

Here is the textarea with id "test" that the function writes to:

<textarea id="test"></textarea>

The recognized speech is written into the textarea.
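As far as I know, the x-webkit-speech attribute was later removed from Chrome, so the input markup above may not work in current versions. A rough sketch of the same idea using the Web Speech API (SpeechRecognition, exposed in Chrome as webkitSpeechRecognition) could look like the following. The function names getRecognitionCtor, transcriptFromEvent and startDictation are made up for illustration, and the "en-US" language is an assumption.

```javascript
// Returns the recognition constructor if the given scope (normally window)
// provides one, otherwise null.
function getRecognitionCtor(scope) {
  return scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
}

// Joins the top alternative of every result in a recognition event.
function transcriptFromEvent(event) {
  var text = "";
  for (var i = 0; i < event.results.length; i++) {
    text += event.results[i][0].transcript;
  }
  return text;
}

// Starts recognition and writes the transcript into target.value.
// Returns the recognition object, or null if the API is unavailable.
function startDictation(scope, target) {
  var Recognition = getRecognitionCtor(scope);
  if (!Recognition) {
    return null;
  }
  var recognition = new Recognition();
  recognition.lang = "en-US";          // assumed language
  recognition.interimResults = false;  // only deliver final results
  recognition.onresult = function (event) {
    target.value = transcriptFromEvent(event);
  };
  recognition.start();
  return recognition;
}

// In a page you might call, from a click handler:
//   startDictation(window, document.getElementById("test"));
```

Passing the scope in as a parameter is only there so the function can return null instead of throwing in browsers without the API.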

Gaucherie answered 8/1, 2014 at 17:37 Comment(1)

The link directs to an empty hosted page. – Enyo 7/5, 2015 at 21:57
