Very low accuracy while using OpenEars for speech recognition
I'm using OpenEars for speech recognition in my app. The major concern is accuracy. In a quiet environment I get about 50% accuracy, but things get worse in a noisy environment: almost nothing is recognized correctly. I'm using a dictionary file of about 300 words at present. What areas should I look at to improve accuracy? Up to now I haven't done any tuning.

Rosendorosene answered 15/9, 2011 at 11:51 Comment(0)

Designing a speech recognition application requires you to understand some basic concepts behind speech recognition, such as the acoustic model, the grammar, and the phonetic dictionary. You can learn more from the CMUSphinx tutorial: http://cmusphinx.sourceforge.net/wiki/tutorial

Bad accuracy is a normal state in speech application development; there is a process you can follow to improve it and make the application useful. The process is the following:

  1. Collect samples of the speech you are trying to recognize and build a speech database, so you can measure the current accuracy and understand the issues behind it.

  2. Experiment with the vocabulary size to improve the separation between different voice prompts. For example, a vocabulary of 10 commands is far easier to recognize than a vocabulary of 300 commands.

  3. Design your application so that the number of variants to recognize is small and users' answers are straightforward. This activity is called VUI (voice user interface) design, and it's quite a big area with many good books and blog articles. You can find some details here: http://www.amazon.com/Voice-Interface-Design-Michael-Cohen/dp/0321185765

  4. Improve the acoustic part of your application: modify the dictionary to match your speech, and adapt the acoustic model to match your acoustic conditions. See http://cmusphinx.sourceforge.net/wiki/tutorialadapt for a description of the acoustic model adaptation process.
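To make the accuracy measurement in step 1 concrete: a common metric is the word error rate (WER), the word-level edit distance between what was said and what the recognizer returned, divided by the reference length. A minimal Python sketch (the phrases below are made-up examples, not from any real test set):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Classic dynamic-programming edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("the" -> "a") and one deletion ("front")
# against 4 reference words gives WER 0.5.
print(wer("open the front door", "open a door"))  # -> 0.5
```

Running your whole recorded test set through the recognizer and averaging WER gives you a single number to track as you try each of the changes below.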
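For steps 2 and 4, one practical move is to restrict the recognizer's dictionary to the words currently in use and to flag words that have no pronunciation entry at all. A rough Python sketch; the dictionary fragment and word list are illustrative assumptions, not real OpenEars data (real entries would come from cmudict or your language model files):

```python
# Hypothetical fragment of a CMU-style phonetic dictionary (.dic) file.
FULL_DIC = """\
OPEN OW P AH N
CLOSE K L OW Z
DOOR D AO R
LIGHT L AY T
"""

def load_dic(text):
    """Parse 'WORD PHONE PHONE ...' lines into a {word: pronunciation} map."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if parts:
            entries[parts[0]] = " ".join(parts[1:])
    return entries

def restrict_vocab(dic, active_words):
    """Keep only the currently needed words; report words lacking a pronunciation."""
    active = {w.upper() for w in active_words}
    kept = {w: p for w, p in dic.items() if w in active}
    missing = sorted(active - kept.keys())
    return kept, missing

dic = load_dic(FULL_DIC)
kept, missing = restrict_vocab(dic, ["open", "door", "window"])
print(sorted(kept))  # -> ['DOOR', 'OPEN']
print(missing)       # -> ['WINDOW'] (needs a pronunciation added)
```

Shrinking the active vocabulary this way directly improves separation between prompts, and the `missing` list tells you which user-added words need pronunciations before the recognizer can match them at all.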

Pottle answered 16/9, 2011 at 10:46 Comment(6)
Thanks for the answer. I did some research on the 4 points you mentioned. There are, however, a few things I need clarification on: 1. Playing with the vocabulary size is not an option for me, since the app is required to understand a large, editable list of words. 2. I admit that I haven't done research on VUI design. The accuracy issue has grown critical; in this situation, would you advise me to plunge into that area? How time-consuming would research on VUI be?Rosendorosene
OK. My main aim is to recognize words spoken by the user in a (noisy) classroom environment. The user can add more words of his choice to the dictionary. The issue now is only with accuracy. Also, since the users and I are in different regions of the world, accent is a major problem. I am also doubtful that I will be able to get audio samples from the real users.Rosendorosene
Well, there are issues of course, but they can be solved. The accent problem is solved with adaptation. The room reverberation problem is solved with robust features. Noise is usually filtered by noise cancellation in the acoustic front end. If you intend to work on this, it's all doable.Pottle
Hmm... OK, one more thing: I went through the CMU page for acoustic model adaptation, but I find the steps are missing for the case where multiple users' voices have to be recognized.Rosendorosene
Oh, and this is an iPad app we are talking about (if it makes any difference :)).Rosendorosene
Sorry, I'm not sure which steps you find missing; they are all there.Pottle
