Building an OpenEars-compatible language model
I am doing some development on speech-to-text and text-to-speech, and I have found the OpenEars API very useful.

The principle of this CMU-SLM-based API is that it uses a language model to map the speech picked up by the iPhone device. So I decided to find a big English language model to feed the API's speech recognizer engine, but I could not figure out the format of the VoxForge English data model for use with OpenEars.

Does anyone have any idea how I can get the .languagemodel and .dic files for English to work with OpenEars?

Flock answered 7/3, 2011 at 14:08 Comment(7)
Does anyone have any idea about the CMU toolkit that generates the language model step by step, with commands like text2wfreq etc.?Flock
Aww dude, I am on a project where I need to integrate that into video editing, but sadly in C#. Mind keeping me up to date on what happens? iPhone is pretty fun, especially what you are doing.Tonsillectomy
Ya dude, actually I have tried to build the language model, but was unable to do so after lots of R&D.Flock
Just asking: I read something about uploading a text file with all the words that are to be recognized to the CMU site, in the context of language models. What's that all about?Sanitarium
@Mithun Madhav that is limited to only 4000 words; for more you have to use the toolkit I described.Flock
Which VoxForge data are you using? A pre-built language model, or some text you want to use to build a language model?Backbite
I think a pre-built English language model, as I already know how to make a custom model of up to 4000 words: speech.cs.cmu.edu/tools/lmtool.htmlFlock
Old question, but maybe the answer is still interesting: OpenEars now has built-in language model generation, so one option is to create models dynamically in your app as you need them, using the LanguageModelGenerator class, which uses the MITLM library and NSScanner to accomplish the same task as the CMU toolkit mentioned above. Processing a corpus with >5000 words on the iPhone is going to take a very long time, but you could always use the Simulator to run it once, take the output out of the documents folder, and keep it.

Another option for large vocabulary recognition is explained here:

Creating ARPA language model file with 50,000 words

Having said that, I need to point out as the OpenEars developer that the CMU tool's limit of 5000 words corresponds pretty closely to the maximum vocabulary size that is likely to have decent accuracy and processing speed on the iPhone when using Pocketsphinx. So, my last suggestion would be either to reconceptualize your task so that it doesn't absolutely require large-vocabulary recognition (for instance, since OpenEars allows you to switch models on the fly, you may find that you don't need one enormous model and can get by with multiple smaller ones that you swap in for different contexts), or to use a network-based API that can do large-vocabulary recognition on a server (or make your own API that uses Sphinx4 on your own server). Good luck!

Chastise answered 18/7, 2011 at 19:28 Comment(1)
Great news and answer, thanks. Maybe I can try and see how the new changes work in OpenEars.Flock
Regarding LM Formats:

AFAIK, most language models use the ARPA standard. Sphinx / CMU language models are compiled into a binary format, and you'd need the source (text) format to convert a Sphinx LM into another format. Most other language models are in text format.

I'd recommend using the HTK Speech Recognition Toolkit; detailed documentation is here: http://htk.eng.cam.ac.uk/ftp/software/htkbook_html.tar.gz

Here's also a description of CMU's SLM Toolkit: http://www.speech.cs.cmu.edu/SLM/toolkit_documentation.html
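
Since the comments above ask about the toolkit's command sequence (text2wfreq etc.), here is a rough sketch of driving that pipeline from Python. The binary names come from the toolkit itself, but this assumes they are installed and on your PATH, and the exact flags may differ between toolkit versions, so verify them against the documentation:

    # Sketch: run the CMU-Cambridge SLM toolkit pipeline to build an ARPA LM.
    # Assumes text2wfreq, wfreq2vocab, text2idngram and idngram2lm are on PATH.
    import subprocess

    def build_arpa_lm(corpus: str = "corpus.txt", stem: str = "corpus") -> None:
        steps = [
            # 1. Count word frequencies, then derive a vocabulary from them.
            f"text2wfreq < {corpus} | wfreq2vocab > {stem}.vocab",
            # 2. Convert the corpus into id n-grams using that vocabulary.
            f"text2idngram -vocab {stem}.vocab < {corpus} > {stem}.idngram",
            # 3. Estimate the n-gram model and write it in ARPA text format.
            f"idngram2lm -idngram {stem}.idngram -vocab {stem}.vocab -arpa {stem}.arpa",
        ]
        for cmd in steps:
            subprocess.run(cmd, shell=True, check=True)

    build_arpa_lm()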

Here's an example of a language model in ARPA format I found on the net: http://www.arborius.net/~jphekman/sphinx/full/index.html

You probably want to create an ARPA LM first, then convert it into any binary format if needed.
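
For orientation, an ARPA file is plain text: a \data\ header listing n-gram counts, one section per n-gram order with base-10 log probabilities (and optional back-off weights), and a closing \end\. The fragment below is invented for illustration:

    \data\
    ngram 1=3
    ngram 2=2

    \1-grams:
    -0.4771 <s>     -0.3010
    -0.4771 hello   -0.3010
    -0.4771 </s>

    \2-grams:
    -0.1761 <s> hello
    -0.3010 hello </s>

    \end\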

In General:

To build a language model, you need lots and lots of training data, in order to determine the probability of any word in your vocabulary given the input observed up to that point.

You can't "make" a language model just by listing the words you want to recognize -- you also need a lot of training data (i.e. typical input you would observe when running your speech recognition application).

A language model is not just a word list -- it estimates the probability of the next token (word) in the input. To estimate those probabilities, you need to run a training process, which goes over training data (e.g. historical data) and observes word frequencies there in order to estimate the above-mentioned probabilities.
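
To make that training step concrete, here is a tiny sketch in Python; the seven-word corpus is invented for the example, and real training would use far more data plus smoothing for unseen word pairs:

    # Toy maximum-likelihood estimation of next-word probabilities
    # from bigram counts in a (tiny, invented) training corpus.
    from collections import Counter

    corpus = "open the door please open the window".split()
    bigrams = Counter(zip(corpus, corpus[1:]))   # counts of adjacent word pairs
    unigrams = Counter(corpus)                   # counts of single words

    # P(next | current) = count(current, next) / count(current)
    p = {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}

    print(p[("open", "the")])   # 1.0 -- "open" was always followed by "the"
    print(p[("the", "door")])   # 0.5 -- "the" preceded "door" and "window" equally often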

For your problem, as a quick solution, you could just assume all words have the same frequency/probability:

  1. Create a dictionary with the words you want to recognize (N words in the dictionary).

  2. Create a language model which assigns each word the probability 1/N (a uniform unigram language model); see the sketch below.

You can then interpolate that unigram language model (LM) with another LM built from a bigger corpus, using the HTK Toolkit.
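
Here is a minimal sketch of step 2 in Python, writing a uniform unigram model in the ARPA text format described above. The word list and output file name are placeholders, and including the <s>/</s> sentence markers in the vocabulary is an assumption (Sphinx-based decoders generally expect them). The matching .dic file would map each word to its phonemes, e.g. taken from the CMU Pronouncing Dictionary.

    # Write a uniform unigram LM in ARPA format for a fixed word list.
    import math

    words = ["HELLO", "GOODBYE", "YES", "NO"]   # the N words you want to recognize
    vocab = ["<s>", "</s>"] + words             # sentence-boundary markers (assumed needed)
    logprob = math.log10(1.0 / len(vocab))      # uniform probability, base-10 log as ARPA requires

    with open("uniform.arpa", "w") as lm:
        lm.write("\\data\\\n")
        lm.write(f"ngram 1={len(vocab)}\n\n")
        lm.write("\\1-grams:\n")
        for w in vocab:
            lm.write(f"{logprob:.4f} {w}\n")    # a unigram-only model needs no back-off weights
        lm.write("\n\\end\\\n")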

Retene answered 19/4, 2011 at 17:41 Comment(1)
First of all, thanks for answering a long time after I asked the question. The answer looks like a puzzle to me. I know that the subject needs a lot of R&D, but do you have any closer hint that points me to an OpenEars-compatible LM?Flock