How to create a custom text-to-speech engine

Asked 28/8, 2011 at 20:20 Answered 16/7, 2022 at 8:37

As far as I know, TTS needs a TTS engine for each language it speaks. In the Android 2.2 emulator, Pico TTS is the default engine, and it supports only a few popular languages. I can see some engines on the Market, but they must be purchased before they can be installed. My question: is there any way to create a custom engine that supports other languages (by programming or by using software)?

(I don't know whether I should post this question on StackOverflow or SuperUser. If this is the wrong place, please migrate it.)

Tautomer answered 28/8, 2011 at 20:20 Comment(2)

Please specify the language for which you want to enable TTS functionality. Is your requirement for a limited vocabulary (e.g. TTS functionality for just the digits 0 to 9) or for arbitrary text input? – Curmudgeon 1/11, 2011 at 13:7

Any language if possible, I mean I want to create a new TTS engine by coding. – Tautomer 3/11, 2011 at 8:18

I am also interested in making my own TTS engine. Here is some information I've found. On this link you can find a brief description of what you have to do to make a TTS engine for Android. Since API level 14 there has been an abstract class for TTS engine implementations (android.speech.tts.TextToSpeechService). More on link.

But converting text to speech isn't so easy. Some basic information about what a TTS engine should implement can be found on Wikipedia.

Unsure answered 19/9, 2012 at 13:23 Comment(0)

As far as my research goes, the best architecture for making a TTS engine currently is Tacotron 2 [paper here], a neural network architecture that synthesizes speech directly from text (the text itself can easily be captured via OCR). It has achieved a MOS (mean opinion score) of 4.53, comparable to the MOS of 4.58 for professionally recorded speech. The official implementation of Tacotron 2 is not public, but there is a TensorFlow implementation built with TensorFlow 1.15.0 here. There is also a PyTorch implementation by NVIDIA here, which is more actively maintained. Both implementations can be retrained on a dataset for a new language (one with no TTS implementation yet), which is a comparatively easy route to a TTS engine. You can also use the architectures above as a stepping stone to build your own architecture.
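For the retraining step, most of the manual work is getting recordings and transcripts into the layout the training scripts read. Here is a rough Python sketch of that, under assumptions: the input is LJSpeech-style metadata (one "id|raw text|normalized text" row per clip), and the output is pipe-separated "wav_path|transcript" lines, which is the filelist layout I recall the NVIDIA repository shipping for LJSpeech. The function name, directory name and sample rows are made up for illustration, so check the exact format against whichever implementation you use.

```python
import csv
import io
import random


def build_filelists(metadata_text, wav_dir, val_fraction=0.1, seed=0):
    """Convert 'id|raw text|normalized text' rows into 'wav_path|transcript'
    lines and split them into a training list and a validation list."""
    reader = csv.reader(io.StringIO(metadata_text), delimiter="|",
                        quoting=csv.QUOTE_NONE)
    lines = []
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        clip_id = row[0].strip()
        transcript = row[-1].strip()  # last column: normalized text if present
        if clip_id and transcript:
            lines.append(f"{wav_dir}/{clip_id}.wav|{transcript}")
    # Shuffle deterministically so the split is reproducible.
    random.Random(seed).shuffle(lines)
    n_val = max(1, int(len(lines) * val_fraction)) if len(lines) > 1 else 0
    return lines[n_val:], lines[:n_val]


# Placeholder rows standing in for a real corpus in the new language.
SAMPLE_METADATA = """clip0001|Text no. 1|text number one
clip0002|Second text|second text
clip0003|Third text|third text
"""

train_lines, val_lines = build_filelists(SAMPLE_METADATA, "wavs")
```

The two lists would then be written to text files and referenced from the training configuration. The character set the model accepts would likely also need to cover the new language's alphabet.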

Nidanidaros answered 19/4, 2021 at 17:46 Comment(0)

Use mic recording software to record every sound in the IPA (International Phonetic Alphabet). Then create a JSON file that maps every word (key) to its IPA pronunciation (value). Finally, have your program play each of the sounds in a word's IPA pronunciation in sequence to form the entire word. Adjust the tone depending on whether the sentence ends with a question mark or a period. If the sentence sounds happy, increase the pitch. If it sounds sad, decrease the pitch. Analyze the sentiment of each sentence to determine the pitch.
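A minimal Python sketch of the recipe above, with several stand-ins that are my own assumptions: the pronunciation dictionary is a tiny hard-coded dict instead of a JSON file, the "recordings" are generated sine tones instead of real IPA sounds, and the pitch change is naive resampling (which also shortens the sound). Only the question-mark rule is implemented, and it raises the pitch of the last word.

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # samples per second

# Hypothetical pronunciation dictionary (word -> IPA symbols). In a real
# project this would be loaded from the JSON file with json.load().
PRONUNCIATIONS = {
    "hi": ["h", "aɪ"],
    "you": ["j", "uː"],
}


def placeholder_unit(index, duration_ms=120):
    """Stand-in for one recorded sound: a sine tone unique to each symbol."""
    freq = 200.0 + 40.0 * index
    n = SAMPLE_RATE * duration_ms // 1000
    return [0.5 * math.sin(2 * math.pi * freq * t / SAMPLE_RATE) for t in range(n)]


# IPA symbol -> samples. Real recordings would be read from WAV files instead.
SYMBOLS = sorted({s for ipa in PRONUNCIATIONS.values() for s in ipa})
UNITS = {s: placeholder_unit(i) for i, s in enumerate(SYMBOLS)}


def change_pitch(samples, factor):
    """Naive pitch change by linear-interpolation resampling.
    factor > 1 raises the pitch and shortens the sound."""
    out = []
    for i in range(int(len(samples) / factor)):
        pos = i * factor
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def synthesize(sentence):
    """Concatenate the unit for each IPA symbol of each word."""
    text = sentence.strip()
    pitch = 1.15 if text.endswith("?") else 1.0  # rising tone for questions
    words = [w.strip(".,!?").lower() for w in text.split()]
    words = [w for w in words if w]
    silence = [0.0] * (SAMPLE_RATE * 50 // 1000)  # 50 ms gap between words
    samples = []
    for i, word in enumerate(words):
        is_last = i == len(words) - 1
        for symbol in PRONUNCIATIONS[word]:  # raises KeyError for unknown words
            unit = UNITS[symbol]
            if is_last and pitch != 1.0:
                unit = change_pitch(unit, pitch)
            samples.extend(unit)
        samples.extend(silence)
    return samples


def write_wav(path, samples):
    """Write mono 16-bit PCM audio."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(SAMPLE_RATE)
        f.writeframes(b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples))


if __name__ == "__main__":
    write_wav("hi_you.wav", synthesize("Hi you?"))
```

A sentiment score could feed the same pitch factor (above 1.0 for happy, below 1.0 for sad). Expect audible joins between sounds with plain concatenation like this.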

Ondrea answered 16/7, 2022 at 8:37 Comment(0)
