Recognizing multiple keywords using PocketSphinx

I've installed the PocketSphinx demo and it works fine under Ubuntu and Eclipse, but despite trying I can't work out how I would add recognition of multiple words.

All I want is for the code to recognize single words, which I can then switch() within the code, e.g. "up", "down", "left", "right". I don't want to recognize sentences, just single words.

Any help on this would be gratefully received. I have spotted other users having similar problems, but nobody seems to know the answer so far.


One thing that baffles me is why we need the "wakeup" constant at all:

private static final String KWS_SEARCH = "wakeup";
private static final String KEYPHRASE = "oh mighty computer";
.
.
.
recognizer.addKeyphraseSearch(KWS_SEARCH, KEYPHRASE);

What has wakeup got to do with anything?


I have made some progress (?): using addGrammarSearch, I am able to use a .gram file to list my words (e.g. up, down, left, right, forwards, backwards), which seems to work well if all I say are those particular words. However, any other word causes the system to match what is said to the "nearest" word in the list. Ideally I don't want recognition to occur if the words spoken are not in the .gram file...

Larvicide answered 9/9, 2014 at 15:11 Comment(3)
I read this question but couldn't find my answer, and I have searched a lot too. If anyone can help, please see https://mcmap.net/q/341455/-define-a-new-keyword-in-pocket-sphinx/3671748 - Intend
I read this, but my problem is how to define a new KEYWORD (e.g. "my phone") too. Would you please check my question? https://mcmap.net/q/341455/-define-a-new-keyword-in-pocket-sphinx/3671748 - Intend
Can you help me, please? #39506771 - Mcpeak

You can use addKeywordSearch, which takes a file of keyphrases. Put one phrase per line, with the threshold for that phrase between slashes, for example:

up /1.0/
down /1.0/
left /1.0/
right /1.0/
forwards /1e-1/

The threshold for each phrase must be tuned to avoid false alarms.
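
Since the file is plain text, it can also be generated rather than written by hand. The following is only a minimal sketch: KeywordListBuilder is a hypothetical helper, not part of PocketSphinx, and it keeps thresholds as strings so they are written exactly as typed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of PocketSphinx): builds the text of a keyword list file.
final class KeywordListBuilder {
    // LinkedHashMap keeps the phrases in the order they were added.
    private final Map<String, String> thresholds = new LinkedHashMap<>();

    KeywordListBuilder add(String phrase, String threshold) {
        thresholds.put(phrase.trim(), threshold.trim());
        return this;
    }

    // One phrase per line, threshold between slashes, newline after every entry.
    String build() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : thresholds.entrySet()) {
            out.append(e.getKey()).append(" /").append(e.getValue()).append("/\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = new KeywordListBuilder()
                .add("up", "1.0")
                .add("forwards", "1e-1")
                .build();
        System.out.print(text);
    }
}
```

The resulting string would then be saved to whatever file is passed to addKeywordSearch.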

Newly answered 9/9, 2014 at 16:10 Comment(14)
Can you share the entire text inside your .gram file please? I feel that something else is missing. I am new to grammar files. - Implode
There is nothing to add; the file is used for keyword spotting as is. It is also not a grammar file; grammars are different. To learn about keyword spotting, visit the CMUSphinx page cmusphinx.sourceforge.net/wiki/tutoriallm - Newly
Assuming I use such a file with pocketsphinx_continuous, I would provide the file path using -kws. Could I then use cmudict-en-us.dict and the included 16-bit PTM en-us ARPA model? Would the accuracy improve if I created a new dictionary for just those 5 words? - Muumuu
en-us-ptm is an acoustic model, not an ARPA model, and it is 16 kHz, not 16-bit. Creating a new dictionary would not improve the accuracy, though it might save you some memory (about 3 MB). - Newly
Yes indeed, 16 kHz acoustic. What is the significance of making the threshold for "forwards" different from the others? And why denote it as /1e-1/ rather than /0.1/? - Muumuu
The threshold depends on the word; for optimal detection you need to use word-specific thresholds. Since the word "forwards" has two syllables, it most likely needs a different threshold. You can use 0.1 if you like. - Newly
Are there any examples of such files in pocketsphinx? Do they have a file extension? - Muumuu
An example is provided in the answer. You do not need an extension; you can choose an arbitrary one according to your preferences. - Newly
Why do we need a threshold? Can anyone tell me? - Hemostat
I am facing a problem: it detects words when nothing has been said. - Hemostat
The threshold controls false alarms; if you have too many detections, simply change the threshold. - Newly
Can we use local-language words for speech recognition? - Hemostat
@NikolayShmyrev I read this, but my problem is how to define a new KEYWORD (e.g. "my phone") too. Would you please check my question? https://mcmap.net/q/341455/-define-a-new-keyword-in-pocket-sphinx/3671748 - Intend
Can you help me, please? #39506771 - Mcpeak

Thanks to Nikolay's tip (see his answer above), I have developed the following code which works fine, and does not recognize words unless they're on the list. You can copy and paste this directly over the main class in the PocketSphinxDemo code:

public class PocketSphinxActivity extends Activity implements RecognitionListener
{
private static final String DIGITS_SEARCH = "digits";
private SpeechRecognizer recognizer;

@Override
public void onCreate(Bundle state)
{
    super.onCreate(state);

    setContentView(R.layout.main);

    ((TextView) findViewById(R.id.caption_text)).setText("Preparing the recognizer");

    try
    {
        Assets assets = new Assets(PocketSphinxActivity.this);
        File assetDir = assets.syncAssets();
        setupRecognizer(assetDir);
    }
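    catch (IOException e)
    {
        // Without the assets the recognizer is never created, so report the
        // failure and stop here instead of calling reset() on a null recognizer.
        ((TextView) findViewById(R.id.caption_text)).setText("Failed to init recognizer: " + e);
        return;
    }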

    ((TextView) findViewById(R.id.caption_text)).setText("Say up, down, left, right, forwards, backwards");

    reset();
}

@Override
public void onPartialResult(Hypothesis hypothesis)
{
}

@Override
public void onResult(Hypothesis hypothesis)
{
    ((TextView) findViewById(R.id.result_text)).setText("");

    if (hypothesis != null)
    {
        String text = hypothesis.getHypstr();
        makeText(getApplicationContext(), text, Toast.LENGTH_SHORT).show();
    }
}

@Override
public void onBeginningOfSpeech()
{
}

@Override
public void onEndOfSpeech()
{
    reset();
}

private void setupRecognizer(File assetsDir)
{
    File modelsDir = new File(assetsDir, "models");

    recognizer = defaultSetup().setAcousticModel(new File(modelsDir, "hmm/en-us-semi"))
                               .setDictionary(new File(modelsDir, "dict/cmu07a.dic"))
                               .setRawLogDir(assetsDir).setKeywordThreshold(1e-20f)
                               .getRecognizer();

    recognizer.addListener(this);

    File digitsGrammar = new File(modelsDir, "grammar/digits.gram");
    recognizer.addKeywordSearch(DIGITS_SEARCH, digitsGrammar);
}

private void reset()
{
    recognizer.stop();
    recognizer.startListening(DIGITS_SEARCH);
}
}
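
To act on each word, as the question asks, the recognized text can be mapped to a value that is easy to switch() on. This is a minimal sketch rather than part of the demo: CommandDispatcher and its Command enum are hypothetical names, and the only input assumed is the string returned by hypothesis.getHypstr().

```java
import java.util.Locale;

// Hypothetical helper (not part of PocketSphinx): turns recognized text into a command.
final class CommandDispatcher {
    enum Command { UP, DOWN, LEFT, RIGHT, FORWARDS, BACKWARDS, UNKNOWN }

    static Command parse(String hypstr) {
        if (hypstr == null) {
            return Command.UNKNOWN;
        }
        // Normalise case and surrounding whitespace before matching.
        switch (hypstr.trim().toLowerCase(Locale.ROOT)) {
            case "up":        return Command.UP;
            case "down":      return Command.DOWN;
            case "left":      return Command.LEFT;
            case "right":     return Command.RIGHT;
            case "forwards":  return Command.FORWARDS;
            case "backwards": return Command.BACKWARDS;
            default:          return Command.UNKNOWN; // anything else is ignored
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(" Up "));
        System.out.println(parse("sideways"));
    }
}
```

In onResult, the Toast could then be replaced by a switch over CommandDispatcher.parse(text).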

Your digits.gram file should look something like this (despite the .gram name, it is a plain keyword list, not a JSGF grammar):

up /1e-1/
down /1e-1/
left /1e-1/
right /1e-1/
forwards /1e-1/
backwards /1e-1/

You should experiment with the thresholds between the slashes to tune detection; 1e-1 is scientific notation for 0.1. I think the maximum is 1.0.
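
If the file fails to load, it may help to check that every line has the "phrase /threshold/" shape before handing it to the recognizer. The class below is a rough sketch with a hypothetical name (KeywordLine); it only checks the shape shown above and does not claim to mirror PocketSphinx's own parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of PocketSphinx): checks one line of a keyword list.
final class KeywordLine {
    // A phrase, some whitespace, then a threshold between slashes at the end of the line.
    private static final Pattern LINE = Pattern.compile("^(.+?)\\s+/([^/]+)/\\s*$");

    final String phrase;
    final double threshold;

    private KeywordLine(String phrase, double threshold) {
        this.phrase = phrase;
        this.threshold = threshold;
    }

    // Returns the parsed line, or null if it does not look like "phrase /threshold/".
    static KeywordLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        try {
            return new KeywordLine(m.group(1).trim(), Double.parseDouble(m.group(2).trim()));
        } catch (NumberFormatException e) {
            return null; // the text between the slashes is not a number
        }
    }

    public static void main(String[] args) {
        KeywordLine ok = parse("forwards /1e-1/");
        System.out.println(ok.phrase + " " + ok.threshold);
        // A JSGF header is not a keyword line, so this is rejected.
        System.out.println(parse("#JSGF V1.0;") == null);
    }
}
```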

And it's 5.30pm so I can stop working now. Result.

Larvicide answered 9/9, 2014 at 16:6 Comment(17)
Thanks man!! These lines made the difference. I did not see that it was addKeywordSearch (not addKeywordsSearch, in plural): File digitsGrammar = new File(modelsDir, "grammar/digits.gram"); recognizer.addKeywordSearch(DIGITS_SEARCH, digitsGrammar); } private void reset() { recognizer.stop(); recognizer.startListening(DIGITS_SEARCH); } } - Implode
@pbs: Thanks for sharing your solution, it helped me a lot! I have one question though. Does your modified digits.gram contain anything else, or just the keywords with the //? I ask because I get an exception when trying to open and parse the digits.gram file. - Li
You could try up /1/ down /1/ left /1/ right /1/, with carriage returns after the /1/'s. - Larvicide
Now it runs, but I still have the problem that if I say something totally different, which is not in my grammar file, it still tries to fit the closest match. So whatever I say I get a match, which is not very user friendly. This is what my digits.gram file looks like: #JSGF V1.0; grammar digits; public <command> = /1/ start | /1/ stop | /1/ frame; - Li
I found my mistake: I wasn't using addKeywordSearch, I was using addGrammarSearch. I have now changed my grammar file to exactly what you have in your post above and it runs, but unfortunately I still get false positives: whatever I say, there is always a match, even if I say something totally different. - Li
As @Li stated, the same happens to me as well: Hypothesis returns values from the .gram file without my even saying anything. - Retrusion
@Implode Do you mind helping me? I am trying to simply recognize the word "hello". Thanks! #35389220 - Pensive
@Retrusion Do you mind helping me? I am trying to simply recognize the word "hello". Thanks! #35389220 - Pensive
I am facing a problem: it detects words when nothing has been said. - Hemostat
@chitrang In my case the hypothesis returns values from the .gram file without my saying anything, or when I say something else. How do I get rid of this issue? - Hemostat
Can we use local-language words for speech recognition? - Hemostat
Is it the higher the threshold, the more accurately you must speak? Or is it vice versa? @poirot - Tacho
@Tacho I'm not sure to be honest, and I haven't done anything with this code for over a year so don't recall the details. Maybe there are some docs on this somewhere... sorry I can't be of more help. - Larvicide
Did you create your own dictionary, or did you add your words to the existing dictionary? - Hemostat
Do I need to build an acoustic model, LM files and a dictionary to search for words? - Hemostat
I didn't have to build an acoustic model. I just used the files in my answer above. That's it. Any other files required came with the package. I just set the whole thing up by downloading and adding to an Eclipse project. The only "technical stuff" I did is mentioned in the answer. - Larvicide
Try /1e-1/ in the gram file. I vaguely recall other values did not work for me. It was a long time ago. - Larvicide

I am working on updating Antinous's amendment to the PocketSphinx demo so that it runs in Android Studio. This is what I have so far:

//Note: change MainActivity to PocketSphinxActivity for demo use...
public class MainActivity extends Activity implements RecognitionListener {
private static final String DIGITS_SEARCH = "digits";
private SpeechRecognizer recognizer;

/* Used to handle permission request */
private static final int PERMISSIONS_REQUEST_RECORD_AUDIO = 1;

@Override
public void onCreate(Bundle state) {
    super.onCreate(state);

    setContentView(R.layout.main);
    ((TextView) findViewById(R.id.caption_text))
            .setText("Preparing the recognizer");

    // Check if user has given permission to record audio
    int permissionCheck = ContextCompat.checkSelfPermission(getApplicationContext(), Manifest.permission.RECORD_AUDIO);
    if (permissionCheck != PackageManager.PERMISSION_GRANTED) {
        ActivityCompat.requestPermissions(this, new String[]{Manifest.permission.RECORD_AUDIO}, PERMISSIONS_REQUEST_RECORD_AUDIO);
        return;
    }

    new AsyncTask<Void, Void, Exception>() {
        @Override
        protected Exception doInBackground(Void... params) {
            try {
                Assets assets = new Assets(MainActivity.this);
                File assetDir = assets.syncAssets();
                setupRecognizer(assetDir);
            } catch (IOException e) {
                return e;
            }
            return null;
        }
        @Override
        protected void onPostExecute(Exception result) {
            if (result != null) {
                ((TextView) findViewById(R.id.caption_text))
                        .setText("Failed to init recognizer " + result);
            } else {
                reset();
            }
        }
    }.execute();
    ((TextView) findViewById(R.id.caption_text)).setText("Say one, two, three, four, five, six...");
}

/**
 * In partial result we get quick updates about current hypothesis. In
 * keyword spotting mode we can react here, in other modes we need to wait
 * for final result in onResult.
 */

@Override
public void onPartialResult(Hypothesis hypothesis) {
    // Nothing to do without a hypothesis or before the recognizer is ready.
    if (hypothesis == null || recognizer == null) {
        return;
    }
    String text = hypothesis.getHypstr();
    // Note: DIGITS_SEARCH is the name of the search ("digits"), not one of the
    // keywords, so this branch only runs if the recognized text is "digits".
    if (text.equals(DIGITS_SEARCH)) {
        recognizer.cancel();
        performAction();
        recognizer.startListening(DIGITS_SEARCH);
    }
}
@Override
public void onResult(Hypothesis hypothesis) {
    ((TextView) findViewById(R.id.result_text)).setText("");
    if (hypothesis != null) {
        String text = hypothesis.getHypstr();
        makeText(getApplicationContext(), "Hypothesis: " + text, Toast.LENGTH_SHORT).show();
    } else {
        makeText(getApplicationContext(), "hypothesis = null", Toast.LENGTH_SHORT).show();
    }
}
@Override
public void onDestroy() {
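    super.onDestroy();
    // The recognizer is still null if permission was refused or setup failed.
    if (recognizer != null) {
        recognizer.cancel();
        recognizer.shutdown();
    }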
}
@Override
public void onBeginningOfSpeech() {
}
@Override
public void onEndOfSpeech() {
   reset();
}
@Override
public void onTimeout() {
}
private void setupRecognizer(File assetsDir) throws IOException {
    // The recognizer can be configured to perform multiple searches
    // of different kind and switch between them
    recognizer = defaultSetup()
            .setAcousticModel(new File(assetsDir, "en-us-ptm"))
            .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
            // .setRawLogDir(assetsDir).setKeywordThreshold(1e-20f)
            .getRecognizer();
    recognizer.addListener(this);

    File digitsGrammar = new File(assetsDir, "digits.gram");
    recognizer.addKeywordSearch(DIGITS_SEARCH, digitsGrammar);
}
private void reset(){
    recognizer.stop();
    recognizer.startListening(DIGITS_SEARCH);
}
@Override
public void onError(Exception error) {
    ((TextView) findViewById(R.id.caption_text)).setText(error.getMessage());
}

public void performAction() {
    // do here whatever you want
    makeText(getApplicationContext(), "performAction done... ", Toast.LENGTH_SHORT).show();
}
}

Caveat emptor: this is a work in progress. Check back later. Suggestions would be appreciated.

Alfrediaalfredo answered 24/4, 2019 at 5:29 Comment(0)
