Add a new language to OpenEars

I've recently started studying OpenEars speech recognition and it's great! But I also need to support speech recognition and dictation in other languages such as Russian, French and German. I've found that various acoustic and language models are available here.

But I can't tell whether those models are all I need to add support for another language to an application.

The question is: what steps should I take to successfully integrate, for example, Russian into OpenEars?

As far as I understand, all the acoustic and language models for English in the OpenEars demo are located in the folder hub4wsj_sc_8k. The same files can be found in the VoxForge language archives, so I just replaced them in the demo. One thing is different: the English demo also contains a 2 MB sendump file, which is not present in the VoxForge language archives. There are two other files used in the OpenEars demo:

  • OpenEars1.languagemodel
  • OpenEars1.dic

These I replaced with:

  • msu_ru_nsh.lm.dmp
  • msu_ru_nsh.dic

because .dmp is similar to .languagemodel. However, the application crashes without reporting any error.

What am I doing wrong? Thank you.

Disposable answered 10/1, 2013 at 9:12 Comment(4)
Hi Guntis, OpenEars developer here. Glad you're finding the framework great! Step 1 for issues like this is to turn on OpenEarsLogging and verbosePocketsphinx, which will give you very fine-grained info on what is going wrong (search your console output for the words error and warning to save time). Instructions on doing this can be found in the docs. Feel free to bring questions to the OpenEars forums since in-depth troubleshooting isn't a great fit for SO: politepix.com/forums/openears You might also want to check out this thread: politepix.com/forums/topic/other-languages - Deettadeeyn
To follow up for later readers, after turning on logging we got this working by using the mixture_weights file as a substitute for sendump and by making sure that the phonetic dictionary used the phonemes that were present in the acoustic model rather than the English-language phonemes. - Deettadeeyn
@Deettadeeyn Could you post this as an answer so the question will not remain open? - Zurkow
OK, I've added it as an answer with a link to the offsite troubleshooting process. - Deettadeeyn

From my comments, reposted as an answer:

[....] Step 1 for issues like this is to turn on OpenEarsLogging and verbosePocketsphinx, which will give you very fine-grained info on what is going wrong (search your console output for the words error and warning to save time). Instructions on doing this can be found in the docs. Feel free to bring questions to the OpenEars forums [....]: http://politepix.com/forums/openears You might also want to check out this thread: http://politepix.com/forums/topic/other-languages
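If the console output is long, the search for "error" and "warning" can be scripted. The following is only a minimal sketch in Python, assuming the console output has been copied into a string or text file; the function name and the sample log lines are made up for illustration and are not real OpenEars or Pocketsphinx output.

```python
def find_problem_lines(log_text):
    """Return (line_number, line) pairs whose text mentions 'error' or 'warning'.

    Matching is case-insensitive and uses plain substring search, so it
    also catches variants such as 'ERROR:' or 'warnings'.
    """
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        lowered = line.lower()
        if "error" in lowered or "warning" in lowered:
            hits.append((number, line))
    return hits


# Hypothetical log excerpt, invented for this example.
sample_log = """\
INFO: loading acoustic model
ERROR: could not open file
INFO: done
Warning: phone not found
"""

for number, line in find_problem_lines(sample_log):
    print(f"{number}: {line}")
```

To use it on a saved log, read the file into a string first and pass that string to find_problem_lines.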

The solution:

To follow up for later readers, after turning on logging we got this working by using the mixture_weights file as a substitute for sendump and by making sure that the phonetic dictionary used the phonemes that were present in the acoustic model rather than the English-language phonemes.
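The phoneme mismatch described above can be checked before running the app. The sketch below is in Python and rests on two assumptions: the dictionary is in the usual Sphinx plain-text layout (one entry per line, the word followed by its space-separated phonemes), and you have already collected the acoustic model's phone names into a set. The function names and the sample data are hypothetical and are not part of OpenEars.

```python
def dictionary_phonemes(dic_text):
    """Collect every phoneme symbol used in a Sphinx-style dictionary.

    Each non-empty line is expected to look like: WORD PH1 PH2 PH3 ...
    """
    phonemes = set()
    for line in dic_text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and entries without a pronunciation
        phonemes.update(parts[1:])
    return phonemes


def unknown_phonemes(dic_text, model_phones):
    """Return, sorted, the dictionary phonemes that the model does not know."""
    return sorted(dictionary_phonemes(dic_text) - set(model_phones))


# Hypothetical phone set; in practice take it from the acoustic model you ship.
model_phones = {"p", "r", "i", "v", "e", "t"}

# Hypothetical dictionary: the second entry uses English-style phonemes.
sample_dic = "привет p r i v e t\nhello HH AH L OW\n"

print(unknown_phonemes(sample_dic, model_phones))
```

An empty result means every phoneme in the dictionary is covered by the phone set you supplied; anything listed points at entries that still use symbols from a different model.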

The full discussion in which we accomplished this troubleshooting can be read here: http://www.politepix.com/forums/topic/using-russian-acoustic-model/


UPDATE: Since OpenEars 1.5 was released this week, it is possible to pass the path to any acoustic model as an argument to the main listening method, and there is a much more standardized method for packaging and referencing any acoustic model so you can have many acoustic models in the same app. The info in this forum post supersedes the info in the discussion I linked to in this answer: http://www.politepix.com/forums/topic/creating-an-acoustic-model-bundle-for-openears-1-5-and-up/ I left the rest of the answer for historical reasons and because there may be details in that discussion that are still useful, but it can be skipped in favor of the new link.

Deettadeeyn answered 10/5, 2013 at 15:51 Comment(1)
You're welcome, and thanks for the quality input during the troubleshooting process that made it easy to help. - Deettadeeyn
