Recognizing multiple keywords using PocketSphinx

Asked 9/9, 2014 at 15:11 Answered 24/4, 2019 at 5:29

Solved android speech-recognition cmusphinx

I've installed the PocketSphinx demo and it works fine under Ubuntu and Eclipse, but despite trying I can't work out how I would add recognition of multiple words.

All I want is for the code to recognize single words, which I can then switch() within the code, e.g. "up", "down", "left", "right". I don't want to recognize sentences, just single words.

Any help on this would be gratefully received. I have spotted other users having similar problems, but nobody seems to know the answer so far.

One thing which is baffling me is why do we need to use the "wakeup" constant at all?

private static final String KWS_SEARCH = "wakeup";
private static final String KEYPHRASE = "oh mighty computer";
.
.
.
recognizer.addKeyphraseSearch(KWS_SEARCH, KEYPHRASE);

What has wakeup got to do with anything?

I have made some progress (?) : Using addGrammarSearch I am able to use a .gram file to list my words, e.g. up,down,left,right,forwards,backwards, which seems to work well if all I say are those particular words. However, any other words will cause the system to match what is said to the "nearest" word from those stated. Ideally I don't want recognition to occur if words spoken are not in the .gram file...

Larvicide answered 9/9, 2014 at 15:11 Comment(3)

i read this question, but i can't find my answer. i do lots of searches too. i ask everyone who can help me, please see https://mcmap.net/q/341455/-define-a-new-keyword-in-pocket-sphinx/3671748 – Intend 4/6, 2016 at 11:29

i read this, but my problem is how can i define a new KEYWORD (e.g. "my phone") too. would you please check my question? https://mcmap.net/q/341455/-define-a-new-keyword-in-pocket-sphinx/3671748 – Intend 4/6, 2016 at 13:37

can u help me please ? : #39506771 – Mcpeak 15/9, 2016 at 8:38

You can use addKeywordSearch, which takes a file of keyphrases: one phrase per line, with a threshold for each phrase between slashes, for example

up /1.0/
down /1.0/
left /1.0/
right /1.0/
forwards /1e-1/

Threshold must be selected to avoid false alarms.
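
A list like the one above can also be generated from code rather than typed by hand. The following is only a minimal Java sketch and not part of PocketSphinx: the class KeywordListBuilder and its methods are hypothetical names, and the thresholds are simply the example values from this answer, which would still need tuning per word. It builds the text of the file, one phrase per line with the threshold between slashes; writing it to disk and passing the file to addKeywordSearch is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of PocketSphinx): builds the text of a
// keyword list file, one "phrase /threshold/" entry per line.
final class KeywordListBuilder {

    // Formats every entry as: phrase /threshold/
    // Thresholds are kept as strings so they are written exactly as given.
    static String build(Map<String, String> thresholdsByPhrase) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> entry : thresholdsByPhrase.entrySet()) {
            sb.append(entry.getKey())
              .append(" /")
              .append(entry.getValue())
              .append("/\n");
        }
        return sb.toString();
    }

    // The example list from this answer; LinkedHashMap preserves insertion order.
    static Map<String, String> exampleKeywords() {
        Map<String, String> keywords = new LinkedHashMap<>();
        keywords.put("up", "1.0");
        keywords.put("down", "1.0");
        keywords.put("left", "1.0");
        keywords.put("right", "1.0");
        keywords.put("forwards", "1e-1");
        return keywords;
    }
}
```

Calling KeywordListBuilder.build(KeywordListBuilder.exampleKeywords()) returns the five lines shown above, each terminated by a newline.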

Newly answered 9/9, 2014 at 16:10 Comment(14)

Can you share the entire text inside your .gram file please? I feel that something else is missing. I am new to grammar files. – Implode 20/3, 2015 at 16:30

There is nothing to update, this file is a file for keyword spotting as is, you should not add anything. And it is not grammar file, grammars are different. To learn about keyword spotting visit CMUSphinx page cmusphinx.sourceforge.net/wiki/tutoriallm – Newly 19/11, 2015 at 14:15

Assuming I use such a file with pocketsphinx_continuous, I would provide the file path using -kws. Could I then use cmudict-en-us.dict and the included 16-bit PTM en-us ARPA model? Would the accuracy improve if I created a new dictionary for just those 5 words? – Muumuu 11/1, 2016 at 0:53

en-us-ptm is an acoustic model, it is not arpa model. it is 16khz, not 16 bit. creating new dictionary would not improve the accuracy, though it might save you some memory (about 3mb). – Newly 11/1, 2016 at 0:58

Yes indeed, 16khz acoustic. What is the significance of making the threshold for forwards different from the others? Why denote it as /1e-1/ rather than /0.1/? – Muumuu 11/1, 2016 at 8:56

The threshold depends on the word, for optimal detection you need to use word-specific thresholds. Since word "forwards" has two syllables, it most likely needs a different threshold. You can use 0.1 if you like. – Newly 11/1, 2016 at 8:57

Are there any examples of such files in pocketsphinx? Do they have a file extension? – Muumuu 11/1, 2016 at 14:18

Example is provided in the answer. You do not need extension, you can choose arbitrary one according to your preferences. – Newly 11/1, 2016 at 15:1

why we need threshold,can anyone tell me – Hemostat 22/3, 2016 at 6:44

i am facing a problem it is listening words without saying anything – Hemostat 22/3, 2016 at 7:19

Threshold controls false alarms, if you have too many detections simply change threshold. – Newly 22/3, 2016 at 10:5

can we use local language words for speech recognition? – Hemostat 30/3, 2016 at 7:30

@NikolayShmyrev i read this, but my problem is how can i define a new KEYWORD (e.g. "my phone") too. would you please check my question? https://mcmap.net/q/341455/-define-a-new-keyword-in-pocket-sphinx/3671748 – Intend 4/6, 2016 at 13:38

can u help me please : #39506771 – Mcpeak 15/9, 2016 at 8:38

Thanks to Nikolay's tip (see his answer above), I have developed the following code which works fine, and does not recognize words unless they're on the list. You can copy and paste this directly over the main class in the PocketSphinxDemo code:

public class PocketSphinxActivity extends Activity implements RecognitionListener
{
private static final String DIGITS_SEARCH = "digits";
private SpeechRecognizer recognizer;

@Override
public void onCreate(Bundle state)
{
    super.onCreate(state);

    setContentView(R.layout.main);

    ((TextView) findViewById(R.id.caption_text)).setText("Preparing the recognizer");

    try
    {
        Assets assets = new Assets(PocketSphinxActivity.this);
        File assetDir = assets.syncAssets();
        setupRecognizer(assetDir);
    }
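    catch (IOException e)
    {
        // Setup failed: report it and bail out, otherwise reset() below
        // would be called on a null recognizer.
        ((TextView) findViewById(R.id.caption_text)).setText("Failed to init recognizer: " + e);
        return;
    }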

    ((TextView) findViewById(R.id.caption_text)).setText("Say up, down, left, right, forwards, backwards");

    reset();
}

@Override
public void onPartialResult(Hypothesis hypothesis)
{
}

@Override
public void onResult(Hypothesis hypothesis)
{
    ((TextView) findViewById(R.id.result_text)).setText("");

    if (hypothesis != null)
    {
        String text = hypothesis.getHypstr();
        makeText(getApplicationContext(), text, Toast.LENGTH_SHORT).show();
    }
}

@Override
public void onBeginningOfSpeech()
{
}

@Override
public void onEndOfSpeech()
{
    reset();
}

private void setupRecognizer(File assetsDir)
{
    File modelsDir = new File(assetsDir, "models");

    recognizer = defaultSetup().setAcousticModel(new File(modelsDir, "hmm/en-us-semi"))
                               .setDictionary(new File(modelsDir, "dict/cmu07a.dic"))
                               .setRawLogDir(assetsDir).setKeywordThreshold(1e-20f)
                               .getRecognizer();

    recognizer.addListener(this);

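    // Despite its name and .gram extension, this file is a keyword list
    // (one "phrase /threshold/" entry per line), not a JSGF grammar.
    File digitsGrammar = new File(modelsDir, "grammar/digits.gram");
    recognizer.addKeywordSearch(DIGITS_SEARCH, digitsGrammar);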
}

private void reset()
{
    recognizer.stop();
    recognizer.startListening(DIGITS_SEARCH);
}
}

Your digits.gram file should be something like:

up /1e-1/
down /1e-1/
left /1e-1/
right /1e-1/
forwards /1e-1/
backwards /1e-1/

You should experiment with the thresholds between the slashes to get the best performance; 1e-1 is scientific notation for 0.1. I think the maximum is 1.0.
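
To get from the recognized text to the switch() mentioned in the question, something along these lines might work. This is only a sketch: CommandDispatcher and its Command enum are hypothetical names of my own, and it assumes the hypothesis string holds a single keyword, possibly with stray whitespace or different casing.

```java
import java.util.Locale;

// Hypothetical helper: maps recognized text to one of the six commands.
final class CommandDispatcher {

    enum Command { UP, DOWN, LEFT, RIGHT, FORWARDS, BACKWARDS, UNKNOWN }

    // Assumes the text holds a single keyword; it is trimmed and
    // lower-cased before matching. Anything else maps to UNKNOWN.
    static Command parse(String hypstr) {
        if (hypstr == null) {
            return Command.UNKNOWN;
        }
        switch (hypstr.trim().toLowerCase(Locale.ROOT)) {
            case "up":        return Command.UP;
            case "down":      return Command.DOWN;
            case "left":      return Command.LEFT;
            case "right":     return Command.RIGHT;
            case "forwards":  return Command.FORWARDS;
            case "backwards": return Command.BACKWARDS;
            default:          return Command.UNKNOWN;
        }
    }
}
```

In onResult, the string from hypothesis.getHypstr() could be passed to CommandDispatcher.parse(...) and the returned value used in a switch of its own.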

And it's 5.30pm so I can stop working now. Result.

Larvicide answered 9/9, 2014 at 16:6 Comment(17)

Thanks man!! These lines made the difference. I had not seen addKeywordSearch (not addKeywordsSearch, in plural): File digitsGrammar = new File(modelsDir, "grammar/digits.gram"); recognizer.addKeywordSearch(DIGITS_SEARCH, digitsGrammar); } private void reset() { recognizer.stop(); recognizer.startListening(DIGITS_SEARCH); } } – Implode 20/3, 2015 at 17:5

@pbs: Thanks for sharing your solution, it helped me a lot! I have one question though. Does your modified digits.gram contain anything else, or just the key words with the //? Because I get an exception, when trying to open and parse the digits.gram file. – Li 9/6, 2015 at 2:3

You could try up /1/ down /1/ left /1/ right /1/, with carriage returns after the /1/'s. – Larvicide 9/6, 2015 at 6:48

Now it runs, but I still have the problem, that if I say something totally different which is not in my grammar file it still tries to fit the closest match, therefore whatever I say I get a match, which is not too user friendly. This is how my digits.gram file looks like: #JSGF V1.0; grammar digits; public <command> = /1/ start | /1/ stop | /1/ frame; – Li 9/6, 2015 at 22:17

I found my mistake... I wasn't using "addKeywordSearch", I was using addGrammarSearch... Now I changed my grammar file to exactly what you have in your post above and it runs... but unfortunately I still get false positive results... so if I say something there will always be a match, even if I say something totally different. – Li 9/6, 2015 at 22:42

As @Li stated, same happens with me as well, Hypothesis returns values from .gram file without even speaking something. – Retrusion 19/11, 2015 at 9:59

@Implode Do you mind helping me? I am trying to simply recognize the word "hello". Thanks! #35389220 – Pensive 14/2, 2016 at 23:42

@Retrusion Do you mind helping me? I am trying to simply recognize the word "hello". Thanks! #35389220 – Pensive 14/2, 2016 at 23:43

i am facing a problem it is listening words without saying anything – Hemostat 22/3, 2016 at 6:50

@chitrang in my case hypothesis returns values from .gram file without even speaking something or speaking something else .how to get rid of this issue? – Hemostat 29/3, 2016 at 9:23

can we use local language words for speech recognition? – Hemostat 30/3, 2016 at 7:40

The higher the threshold, the more accurately you must speak? Or is it vice versa? @poirot – Tacho 3/4, 2016 at 19:47

@Tacho I'm not sure to be honest, and I haven't done anything with this code for over a year so don't recall the details. Maybe there are some docs on this somewhere... sorry I can't be of more help. – Larvicide 3/4, 2016 at 20:7

you had created your own dictionary or you added your words in existing dictionary? – Hemostat 15/4, 2016 at 5:40

do i need to build acoustic model, lm files and dictionary to search words. – Hemostat 15/4, 2016 at 10:11

I didn't have to build an acoustic model. I just used the files in my answer above. That's it. Any other files required came with the package. I just set the whole thing up by downloading and adding to eclipse project. The only "technical stuff" I did is mentioned in the answer. – Larvicide 15/4, 2016 at 13:50

Try /1e-1/ in the gram file. I vaguely recall other values did not work for me. It was a long time ago. – Larvicide 16/4, 2016 at 9:18
