Help with SAPI v5.1 SpeechRecognitionEngine always gives same wrong result with C#

```csharp
using System;
using System.Speech.Recognition;
using System.Windows.Forms;

// Inside the form class:
private void button1_Click(object sender, EventArgs e)
{
    // Add choices to grammar: the digit words and the digit characters.
    Choices mychoices = new Choices();
    mychoices.Add("one", "two", "three", "four", "five",
                  "six", "seven", "eight", "nine", "zero");
    mychoices.Add("1", "2", "3", "4", "5", "6", "7", "8", "9", "0");
    Grammar myGrammar = new Grammar(new GrammarBuilder(mychoices));

    // Create the engine.
    SpeechRecognitionEngine reco = new SpeechRecognitionEngine();

    // Read audio stream from wav file.
    reco.SetInputToWaveFile("3.wav");
    reco.LoadGrammar(myGrammar);

    // Get the recognized value.
    reco.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(reco_SpeechRecognized);
    reco.RecognizeAsync(RecognizeMode.Multiple);
}

void reco_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    MessageBox.Show(e.Result.Text);
}
```

How did you create your WAV file? It looks like it has a high bit rate. The recognizer supports only certain formats. Try:

8 bits per sample
single channel mono
22,050 samples per second
PCM encoding

You have about 3 seconds of audio and the file size is 520 KB. That seems too big for the supported formats.
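The size argument is simple arithmetic: for uncompressed PCM, bytes per second = samples per second × channels × (bits per sample / 8). A small sketch of that calculation (the `WavSize` class and its method names are made up for illustration; the WAV header is ignored):

```csharp
using System;

static class WavSize
{
    // Bytes of audio data per second for uncompressed PCM:
    // samples/s * channels * bytes per sample.
    public static long BytesPerSecond(int samplesPerSecond, int channels, int bitsPerSample)
        => (long)samplesPerSecond * channels * (bitsPerSample / 8);

    // Approximate size of the data chunk (header excluded) for a clip of the given length.
    public static long DataBytes(int samplesPerSecond, int channels, int bitsPerSample, double seconds)
        => (long)Math.Round(BytesPerSecond(samplesPerSecond, channels, bitsPerSample) * seconds);
}
```

For example, 3 seconds at 22,050 Hz, mono, 8-bit is 66,150 bytes, and even at 16-bit it is only 132,300 bytes (about 129 KB). Stereo 44.1 kHz 32-bit float, by contrast, needs 352,800 bytes per second.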

You can use the RecognizerInfo.SupportedAudioFormats property to find the audio formats that your recognizer supports.
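A minimal console sketch of that query, assuming the project references System.Speech (the .NET Framework assembly, or the Windows-only System.Speech NuGet package on newer .NET). The class name and the `Describe` helper are made up for illustration; it prints the same fields as a format listing.

```csharp
using System;
using System.Speech.AudioFormat;
using System.Speech.Recognition;

static class ListRecognizerFormats
{
    // Formats one SpeechAudioFormatInfo as "Name = value" pairs.
    public static string Describe(SpeechAudioFormatInfo f) =>
        $"EncodingFormat = {f.EncodingFormat}, BitsPerSample = {f.BitsPerSample}, " +
        $"BlockAlign = {f.BlockAlign}, ChannelCount = {f.ChannelCount}, " +
        $"SamplesPerSecond = {f.SamplesPerSecond}";

    static void Main()
    {
        // Enumerate every installed recognizer and the input formats it reports.
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
        {
            Console.WriteLine($"{info.Name} ({info.Culture})");
            for (int i = 0; i < info.SupportedAudioFormats.Count; i++)
            {
                Console.WriteLine($"  {i}: {Describe(info.SupportedAudioFormats[i])}");
            }
        }
    }
}
```

If you already have an engine instance, `reco.RecognizerInfo.SupportedAudioFormats` gives the list for the recognizer it actually uses.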

Update:

Your audio file is kind of a mess: it is very noisy, and it is in an unsupported format. Audacity reports it as stereo, 44.1 kHz, 32-bit float. I silenced the noise at the beginning and end, resampled it to 22.05 kHz, converted it to mono, and then exported it as an uncompressed 8-bit unsigned WAV. After that it works fine.
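You can check what a WAV file actually contains without Audacity by reading its `fmt ` chunk. A sketch under these assumptions: the stream is seekable, and `WavFormat`/`WavHeader` are hypothetical names. Format tag 1 is PCM, 3 is IEEE float, and 0xFFFE is WAVE_FORMAT_EXTENSIBLE, whose real subformat this sketch does not decode.

```csharp
using System;
using System.IO;
using System.Text;

public sealed record WavFormat(int FormatTag, int Channels, int SampleRate, int BitsPerSample);

public static class WavHeader
{
    // Reads the "fmt " chunk of a RIFF/WAVE stream (stream must be seekable).
    public static WavFormat Read(Stream stream)
    {
        using var r = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true);
        if (Encoding.ASCII.GetString(r.ReadBytes(4)) != "RIFF")
            throw new InvalidDataException("Not a RIFF file.");
        r.ReadUInt32(); // overall RIFF size, not needed here
        if (Encoding.ASCII.GetString(r.ReadBytes(4)) != "WAVE")
            throw new InvalidDataException("Not a WAVE file.");

        // Walk the chunks until "fmt " is found.
        while (stream.Position + 8 <= stream.Length)
        {
            string id = Encoding.ASCII.GetString(r.ReadBytes(4));
            uint size = r.ReadUInt32();
            if (id == "fmt ")
            {
                int tag = r.ReadUInt16();      // 1 = PCM, 3 = IEEE float, 0xFFFE = extensible
                int channels = r.ReadUInt16();
                int rate = (int)r.ReadUInt32();
                r.ReadUInt32();                 // average bytes per second
                r.ReadUInt16();                 // block align
                int bits = r.ReadUInt16();
                return new WavFormat(tag, channels, rate, bits);
            }
            // Skip other chunks; chunk data is padded to an even number of bytes.
            stream.Seek(size + (size & 1), SeekOrigin.Current);
        }
        throw new InvalidDataException("No fmt chunk found.");
    }
}
```

Usage: `using var fs = File.OpenRead("3.wav"); var f = WavHeader.Read(fs);` A file like the one described would report tag 3 (or 0xFFFE), 2 channels, 44100 Hz and 32 bits.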

On my Windows 7 machine, my default recognizer supports only the following audio formats:

  0:
  EncodingFormat = Pcm
  BitsPerSample = 8
  BlockAlign = 1
  ChannelCount = 1
  SamplesPerSecond = 16000

  1:
  EncodingFormat = Pcm
  BitsPerSample = 16
  BlockAlign = 2
  ChannelCount = 1
  SamplesPerSecond = 16000

  2:
  EncodingFormat = Pcm
  BitsPerSample = 8
  BlockAlign = 1
  ChannelCount = 1
  SamplesPerSecond = 22050

  3:
  EncodingFormat = Pcm
  BitsPerSample = 16
  BlockAlign = 2
  ChannelCount = 1
  SamplesPerSecond = 22050

  4:
  EncodingFormat = ALaw
  BitsPerSample = 8
  BlockAlign = 1
  ChannelCount = 1
  SamplesPerSecond = 22050

  5:
  EncodingFormat = ULaw
  BitsPerSample = 8
  BlockAlign = 1
  ChannelCount = 1
  SamplesPerSecond = 22050
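If you want to reject a file up front, a list like this can be turned into a lookup. A sketch assuming these six entries (from one Windows 7 machine) are what your recognizer accepts; `SampleEncoding`, `DefaultRecognizerFormats` and `IsSupported` are hypothetical names, and querying `SupportedAudioFormats` at run time is the more robust choice because other recognizers may differ.

```csharp
using System;
using System.Linq;

public enum SampleEncoding { Pcm, ALaw, ULaw }

public static class DefaultRecognizerFormats
{
    // Snapshot of the six formats listed for one default recognizer.
    static readonly (SampleEncoding Enc, int Bits, int Channels, int Rate)[] Supported =
    {
        (SampleEncoding.Pcm, 8, 1, 16000),
        (SampleEncoding.Pcm, 16, 1, 16000),
        (SampleEncoding.Pcm, 8, 1, 22050),
        (SampleEncoding.Pcm, 16, 1, 22050),
        (SampleEncoding.ALaw, 8, 1, 22050),
        (SampleEncoding.ULaw, 8, 1, 22050),
    };

    // True if the exact combination appears in the snapshot above.
    public static bool IsSupported(SampleEncoding enc, int bits, int channels, int rate) =>
        Supported.Contains((enc, bits, channels, rate));
}
```

Note that stereo input and 44.1 kHz input both fail this check, regardless of bit depth.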

You should also remove the numeric choices from the grammar. Right now the recognizer returns two alternates: "three" and "3". This probably isn't what you want. You could use a semantic result value in your grammar to return the number 3 for the word "three".
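A minimal sketch of that idea using `SemanticResultValue` and `SemanticResultKey` from System.Speech.Recognition: each spoken word carries an integer value, and the handler reads that value instead of `e.Result.Text`. The key name "digit" and the `DigitGrammar` class are assumptions for illustration.

```csharp
using System;
using System.Speech.Recognition;

static class DigitGrammar
{
    // Builds a grammar that recognizes only the digit words and
    // attaches the matching integer as a semantic value.
    public static Grammar Build()
    {
        string[] words = { "zero", "one", "two", "three", "four",
                           "five", "six", "seven", "eight", "nine" };
        var digits = new Choices();
        for (int i = 0; i < words.Length; i++)
        {
            // Spoken word -> integer semantic value.
            digits.Add(new SemanticResultValue(words[i], i).ToGrammarBuilder());
        }
        // The value is exposed under this key in RecognitionResult.Semantics.
        var key = new SemanticResultKey("digit", digits.ToGrammarBuilder());
        return new Grammar(new GrammarBuilder(key)) { Name = "digits" };
    }

    // Reads the integer back out of a recognition result.
    public static int ReadDigit(RecognitionResult result) =>
        (int)result.Semantics["digit"].Value;
}
```

Usage: load it with `reco.LoadGrammar(DigitGrammar.Build());` and in the handler call `MessageBox.Show(DigitGrammar.ReadDigit(e.Result).ToString());`.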
