How to see if word exists in Pocketsphinx dictionary?
G

3

2

I simply want to see if a string exists in a dictionary file. (Dictionary file at bottom of question)

I want to check whether the voice recognizer can recognize a word or not. For example, the recognizer will not be able to recognize a string like ahdfojakdlfafiop, because it is not defined in the dictionary. So, can I check whether a word is in the Pocketsphinx dictionary?

Something like:

    if (myString.existsInDictionary) {
        startListeningBecauseExists();
    } else {
        // Doesn't exist in dictionary!!!
    }

I just want a way to be able to tell if the recognizer can listen for what I want it to listen to.

Here is the dictionary file:

https://raw.githubusercontent.com/cmusphinx/pocketsphinx-android-demo/master/app/src/main/assets/sync/cmudict-en-us.dict

Thanks,

Ruchir

Grave answered 2/3, 2016 at 2:54 Comment(6)
Have a look at #20418819Microchemistry
read all the words from dictionary file into ArrayList and always do check if(list.contains(myString)).Lockwood
@BalwinderSingh I know how to read a file, but if you look at the link in my question, it is not that straightforward. Each line has a pronunciation next to the word, which I don't care about. All I care about is the word on each line. How can I read just the words?Grave
@Lockwood I know how to read a file, but if you look at the link in my question, it is not that straightforward. Each line has a pronunciation next to the word, which I don't care about. All I care about is the word on each line. How can I read just the words?Grave
I've just reposted my answer, including reading a dictionary file sampled from your dictionary file. You can check it out below; I hope it helps.Drinkwater
@RuchirBaronia depending on the dictionary size, I would build a hash table or a TreeMap to get better lookup performance.Barley
F
3

In C, the ps_lookup_word function looks a word up in the decoder's dictionary; it returns NULL if the word is not there:

if (ps_lookup_word(ps, "abc") == NULL) {
    // do something
}

In the Java wrapper, the equivalent is the method Decoder.lookupWord:

if(decoder.lookupWord("abc") == null) {
    // do something
}

On Android, you can access the decoder through the recognizer:

if(recognizer.getDecoder().lookupWord("abc") == null) {
    // do something
}
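
If what you want to listen for is a phrase rather than a single word, every word in it has to be present. Below is a minimal sketch of that check. The class and method names are made up for illustration, and the lookup is passed in as a function so that the sketch does not depend on the Pocketsphinx library; on Android you would presumably pass something like recognizer.getDecoder()::lookupWord. It also assumes the dictionary entries are lowercase, as they are in the linked cmudict-en-us.dict.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical helper, not part of Pocketsphinx.
class PhraseVocabulary {

    /**
     * Returns the words of the phrase for which the lookup returns null.
     * The lookup parameter stands in for a call such as decoder::lookupWord.
     */
    static List<String> missingWords(String phrase, Function<String, String> lookup) {
        List<String> missing = new ArrayList<>();
        // Assumption: dictionary entries are lowercase, so normalize first.
        String normalized = phrase.trim().toLowerCase(Locale.ROOT);
        if (normalized.isEmpty()) {
            return missing;
        }
        for (String word : normalized.split("\\s+")) {
            if (lookup.apply(word) == null) {
                missing.add(word);
            }
        }
        return missing;
    }
}
```

If the returned list is empty, every word is known to the decoder and it should be safe to start listening; otherwise the list tells you which words are missing.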
Felon answered 2/3, 2016 at 9:22 Comment(4)
You do not need to make two recognizers, you need to use single recognizer and share it across activities.Felon
You request recognizer service for a word and get a result back, just add another request code.Felon
Hey Nikolay, I was wondering whether we can use Pocketsphinx for continuous voice recognition on iOS. Is that possible? Thanks!Grave
@NikolayShmyrev can you please tell me where I have to put this code in Android?Encephalogram
L
0

Read the file using a BufferedReader and store all the words in an ArrayList:

ArrayList<String> dictionary = new ArrayList<>();
String line;
BufferedReader reader = new BufferedReader(new FileReader(dictionaryFile));
while((line = reader.readLine()) != null) {
    if(line.trim().length() <= 0 ) {
        continue;
    }
    String word = line.split(" ")[0].trim();
    word = word.replaceAll("[^a-zA-Z]", "");
    dictionary.add(word);
}

Then check whether the word is present in the dictionary using:

dictionary.contains(yourString);

Hope it'll help.

Lockwood answered 2/3, 2016 at 3:9 Comment(3)
There is no need to waste memory and read the file again, it is already parsed by decoder. Also, other words might be added to the decoder beside the words in file.Felon
I mean reading the file just once, before initializing the system, not every time a word is checked. I agree that it wastes a lot of memory, as it holds all the words in an ArrayList.Lockwood
Your code is also very inefficient in terms of performance; it is much better to use a HashSet, not an ArrayList. Searching for an element in a list of a hundred thousand entries is not a good idea.Felon
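
For anyone who still wants to read the file directly, here is a rough sketch of the HashSet variant suggested in the comment above. The class and method names are invented for illustration. It assumes the format of the linked file: one entry per line, the word first, then whitespace and the phonemes, with alternate pronunciations written as word(2), word(3) and so on. Unlike the replaceAll in the answer, it keeps apostrophes and only strips that numeric suffix. It will not see words added to the decoder at runtime.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Pocketsphinx.
class DictionaryWords {

    /** Collects the first token of every non-blank line into a set. */
    static Set<String> load(Reader source) {
        Set<String> words = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue;
                }
                String word = line.split("\\s+")[0];
                // Assumption: alternates look like "hello(2)"; drop the "(n)" suffix.
                words.add(word.replaceFirst("\\(\\d+\\)$", ""));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }
}
```

Calling load(new FileReader(dictionaryFile)) once at startup and then words.contains(yourString) gives a constant-time lookup on average instead of a linear scan of the list.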
E
0

You could load the dictionary into an ArrayList by reading it line by line. To keep only the words, do:

arraylist.add(line.split("\\s+")[0]);

Then check whether the word exists with:

if(arraylist.contains(word))
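
To illustrate what the split call above does on a single dictionary line, here is a small sketch (the class and method names are made up, and the sample line used below is only illustrative). One thing to watch for: in Java, if the line starts with whitespace, split("\\s+") returns an empty string as its first element, so it is safer to trim the line first.

```java
// Hypothetical helper showing how the word is separated from its pronunciation.
class SplitDemo {

    /** Returns the first whitespace-delimited token of a dictionary line. */
    static String firstToken(String line) {
        // trim() avoids an empty first element when the line has leading whitespace.
        return line.trim().split("\\s+")[0];
    }
}
```

For a line such as "hello HH AH L OW", the array produced by split is {"hello", "HH", "AH", "L", "OW"}, and firstToken returns the first element, which is the word itself.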

Expressly answered 2/3, 2016 at 3:9 Comment(4)
@Ruchir Baronia Oh, sorry. The \s+ matches whitespace. In the dictionary, each line has the word (which you want), then whitespace, and then the stuff you don't want. So if you split the line on whitespace and take the first string in the array, you get only the word.Expressly
So it splits everything after the white space?Grave
@RuchirBaronia It splits the line into multiple tokens in a string array. The word you want is the first one, so you take the first string in the array and add it to the list.Expressly
There is no need to waste memory and read the file again, it is already parsed by decoder. Also, other words might be added to the decoder beside the words in file.Felon
