Where can I find documentation on the ARPA language model format?
I am developing a simple speech recognition app with the PocketSphinx STT engine. The ARPA format is recommended there for performance reasons. I want to understand how much I can do to adjust my language model for my custom needs.
All I have found so far are some very brief descriptions of the ARPA format:
- http://kered.org/blog/2008-08-12/arpa-language-model-file-format/
- http://www.speech.sri.com/projects/srilm/manpages/ngram-format.5.html
- http://www.speech.cs.cmu.edu/SLM/toolkit_documentation.html
I am a beginner in STT and I have trouble wrapping my head around this (n-grams, etc.). I am looking for more detailed docs, something like the documentation on JSGF grammar here: