I'm using OpenEars for speech recognition in my app. My main concern is accuracy. In a quiet environment accuracy is about 50%, and in a noisy environment it gets much worse: almost nothing is recognized correctly. I'm currently using a dictionary file of about 300 words. What areas should I look at to improve accuracy? So far I haven't done any tuning.
Designing a speech recognition application requires you to understand some basic concepts behind speech recognition, such as the acoustic model, the grammar, and the phonetic dictionary. You can learn more from the CMUSphinx tutorial: http://cmusphinx.sourceforge.net/wiki/tutorial
Poor accuracy is a normal state early in speech application development. There is a process you can use to improve it and make the application useful:
Collect samples of the speech you are trying to recognize and build a speech database, so that you can measure the current accuracy and understand the issues behind it.
Experiment with the vocabulary size to improve the separation between different voice prompts. For example, a vocabulary of 10 commands is far easier to recognize than a vocabulary of 300 commands.
Design your application so that there are fewer variants to recognize and users' answers are straightforward. This activity is called VUI (voice user interface) design, and it is a large area with many excellent books and blog articles. You can find some details here: http://www.amazon.com/Voice-Interface-Design-Michael-Cohen/dp/0321185765
Try to improve the acoustic part of your application. Modify the dictionary to match your speech. Adapt the acoustic model to match the acoustic properties. See http://cmusphinx.sourceforge.net/wiki/tutorialadapt for the description of the acoustic model adaptation process.
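To make the "measure the current accuracy" step concrete, here is a minimal sketch of computing word error rate (WER) over a small test set. It assumes you already have, for each recorded sample, a hand-written reference transcript and the text the recognizer returned. The function names and the sample phrases are made up for illustration and are not part of OpenEars or CMUSphinx.

```python
def word_errors(reference, hypothesis):
    """Return (edit_distance_in_words, reference_word_count)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = cur[j - 1] + 1   # extra word in recognizer output
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1], len(ref)


def word_error_rate(pairs):
    """Aggregate WER over (reference, hypothesis) pairs."""
    total_errors = 0
    total_words = 0
    for reference, hypothesis in pairs:
        errors, words = word_errors(reference, hypothesis)
        total_errors += errors
        total_words += words
    return total_errors / total_words if total_words else 0.0


# Hypothetical test set: (what was said, what the recognizer returned).
samples = [
    ("turn on the light", "turn on the light"),
    ("open the door", "open door"),
    ("play next song", "play text song"),
]

if __name__ == "__main__":
    print(f"WER: {word_error_rate(samples):.2f}")
```

Running the same test set after each change (a smaller vocabulary, an edited dictionary, an adapted acoustic model) shows whether that change actually helped, rather than relying on impressions from a few manual tries.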