What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition?

There are two similar namespaces and assemblies for speech recognition in .NET. I’m trying to understand the differences and when it is appropriate to use one or the other.

There is System.Speech.Recognition in the assembly System.Speech (System.Speech.dll). System.Speech.dll is a core DLL in the .NET Framework class library, version 3.0 and later.

There is also Microsoft.Speech.Recognition in the assembly Microsoft.Speech (Microsoft.Speech.dll). Microsoft.Speech.dll is part of the UCMA 2.0 SDK.

I find the docs confusing and I have the following questions:

System.Speech.Recognition says it is for "the Windows Desktop Speech Technology". Does this mean it cannot be used on a server OS, or cannot be used for high-scale applications?

The UCMA 2.0 Speech SDK ( http://msdn.microsoft.com/en-us/library/dd266409%28v=office.13%29.aspx ) says that it requires Microsoft Office Communications Server 2007 R2 as a prerequisite. However, I’ve been told at conferences and meetings that if I do not require OCS features like presence and workflow I can use the UCMA 2.0 Speech API without OCS. Is this true?

If I’m building a simple recognition app for a server application (say I wanted to automatically transcribe voice mails) and I don’t need features of OCS, what are the differences between the two APIs?

Dormancy answered 4/6, 2010 at 19:54 Comment(0)

The short answer is that Microsoft.Speech.Recognition uses the Server version of SAPI, while System.Speech.Recognition uses the Desktop version of SAPI.

The APIs are mostly the same, but the underlying engines are different. Typically, the Server engine is designed to accept telephone-quality audio for command & control applications; the Desktop engine is designed to accept higher-quality audio for both command & control and dictation applications.

You can use System.Speech.Recognition on a server OS, but it's not designed to scale nearly as well as Microsoft.Speech.Recognition.

The practical difference is that the Server engine doesn't require training and will work with lower-quality audio, but its recognition quality will be lower than the Desktop engine's.

Discrown answered 6/6, 2010 at 2:39 Comment(0)

I found Eric’s answer really helpful; I just wanted to add some more details that I found.

System.Speech.Recognition can be used to program the desktop recognizers. SAPI and the Desktop recognizers have shipped in these products:

  • Windows XP: SAPI v5.1 and no recognizer
  • Windows XP Tablet Edition: SAPI v5.1 and Recognizer v6.1
  • Windows Vista: SAPI v5.3 and Recognizer v8.0
  • Windows 7: SAPI v5.4 and Recognizer v8.0?

Servers come with SAPI, but no recognizer:

  • Windows Server 2003: SAPI v5.1 and no recognizer
  • Windows Server 2008 and 2008 R2: SAPI v5.3? and no recognizer

Desktop recognizers have also shipped in products like Office:

  • Microsoft Office 2003: Recognizer v6.1

Microsoft.Speech.Recognition can be used to program the server recognizers. Server recognizers have shipped in the products:

  • Speech Server (various versions)
  • Office Communications Server (OCS) (various versions)
  • UCMA – which is a managed API for OCS that (I believe) included a redistributable recognizer
  • Microsoft Server Speech Platform – recognizer v10.2

The complete SDK for the Microsoft Server Speech Platform 10.2 version is available at http://www.microsoft.com/downloads/en/details.aspx?FamilyID=1b1604d3-4f66-4241-9a21-90a294a5c9a4. The speech engine is a free download. Version 11 is now available at http://www.microsoft.com/download/en/details.aspx?id=27226.

For Microsoft Speech Platform SDK 11 info and downloads, see:

Desktop recognizers are designed to run inproc or shared. Shared recognizers are useful on the desktop where voice commands are used to control any open applications. Server recognizers can only run inproc. Inproc recognizers are used when a single application uses the recognizer or when wav files or audio streams need to be recognized (shared recognizers can’t process audio files, just audio from input devices).
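A minimal sketch of the inproc case described above, assuming a desktop recognizer is installed (the Microsoft.Speech version is nearly identical apart from the namespace; the wav filename is hypothetical):

```csharp
using System;
using System.Speech.Recognition;

class InprocDemo
{
    static void Main()
    {
        // An inproc engine is created explicitly. A shared recognizer would be
        // obtained via the SpeechRecognizer class instead (desktop only).
        using (var engine = new SpeechRecognitionEngine())
        {
            engine.LoadGrammar(new Grammar(new GrammarBuilder("hello world")));

            // Inproc engines can read from a wav file or stream...
            engine.SetInputToWaveFile("voicemail.wav"); // hypothetical file
            // ...or from the default audio device:
            // engine.SetInputToDefaultAudioDevice();

            RecognitionResult result = engine.Recognize();
            Console.WriteLine(result?.Text ?? "(no recognition)");
        }
    }
}
```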

Only Desktop speech recognizers include a dictation grammar (a system-provided grammar used for free-text dictation). The class System.Speech.Recognition.DictationGrammar has no counterpart in the Microsoft.Speech namespace.
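A minimal sketch of loading the desktop-only dictation grammar (this compiles only against System.Speech; Microsoft.Speech requires an explicit SRGS or GrammarBuilder grammar instead):

```csharp
using System;
using System.Speech.Recognition;

class DictationDemo
{
    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine())
        {
            // DictationGrammar exists only in System.Speech.Recognition.
            engine.LoadGrammar(new DictationGrammar());
            engine.SetInputToDefaultAudioDevice();

            // Recognize with a 5-second initial silence timeout.
            RecognitionResult result = engine.Recognize(TimeSpan.FromSeconds(5));
            Console.WriteLine(result?.Text ?? "(nothing recognized)");
        }
    }
}
```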

You can use the APIs to query which recognizers are installed:

  • Desktop: System.Speech.Recognition.SpeechRecognitionEngine.InstalledRecognizers()
  • Server: Microsoft.Speech.Recognition.SpeechRecognitionEngine.InstalledRecognizers()
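A short sketch enumerating the installed desktop recognizers (swap the using directive to Microsoft.Speech.Recognition for the server side):

```csharp
using System;
using System.Speech.Recognition;

class ListRecognizers
{
    static void Main()
    {
        // Each RecognizerInfo describes one installed engine.
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
        {
            Console.WriteLine("{0} ({1}): {2}", info.Id, info.Culture, info.Description);
        }
    }
}
```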

I found that I can also see what recognizers are installed by looking at the registry keys:

  • Desktop recognizers: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Recognizers\Tokens
  • Server recognizers: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech Server\v10.0\Recognizers\Tokens
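The same registry check can be done from a Windows command prompt with reg query (paths as above):

```shell
:: Desktop recognizers
reg query "HKLM\SOFTWARE\Microsoft\Speech\Recognizers\Tokens" /s

:: Server recognizers (Speech Platform v10.x)
reg query "HKLM\SOFTWARE\Microsoft\Speech Server\v10.0\Recognizers\Tokens" /s
```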

--- Update ---

As discussed in Microsoft Speech Recognition - what reference do I have to add?, Microsoft.Speech is also the API used for the Kinect recognizer. This is documented in the MSDN article http://msdn.microsoft.com/en-us/library/hh855387.aspx

Dormancy answered 8/6, 2010 at 15:50 Comment(3)
If I read the docs correctly, only Desktop speech recognizers include the dictation grammar (a system-provided grammar used for free-text dictation). The class System.Speech.Recognition.DictationGrammar has no counterpart in the Microsoft.Speech namespace. – Dormancy
According to the MSDN API docs there is a DictationGrammar and a WildcardGrammar in C# (and I use it). But I don't know how to activate it through XML (hacking the parser?); see also: #12101620 – Dziggetai
I installed the Italian language pack downloaded from microsoft.com/en-us/download/details.aspx?id=27224, but System.Speech.Recognition.SpeechRecognitionEngine.InstalledRecognizers shows en-US as the only active language for speech recognition. Am I missing something? – Coalesce

Here is the link for the Speech Library (MS Server Speech Platform):

Microsoft Server Speech Platform 10.1 Released (SR and TTS in 26 languages)

Orgasm answered 1/10, 2010 at 19:2 Comment(5)
10.2 was released recently too. microsoft.com/downloads/en/… – Dormancy
Thanks for the info. I see that it now includes a grammar validator. I was breaking my head trying to find errors in the ones I'm creating. Where should I keep watching for news about future releases? – Orgasm
"Where should I keep watching for news about future releases?" is a great question! microsoft.com/speech/developers.aspx is out of date. The speech blogs like blogs.msdn.com/b/speak and blogs.msdn.com/b/speech don't always have the latest updates. You might try gotspeech.net or related sites like gotuc.net. But as you can see, I haven't found a great source to keep up to date either. – Dormancy
Most recent at the time of writing is version 11 - microsoft.com/en-us/download/details.aspx?id=27225 – Bethezel
By the way, since the grammar validator was mentioned above, see msdn.microsoft.com/en-us/library/hh378407%28v=office.14%29.aspx for a list of grammar tools. – Bethezel

It seems Microsoft wrote an article that clears things up regarding the differences between the Microsoft Speech Platform and the Windows SAPI: https://msdn.microsoft.com/en-us/library/jj127858.aspx. A difference I found myself while converting speech recognition code for Kinect from Microsoft.Speech to System.Speech (see http://github.com/birbilis/Hotspotizer) was that the former supports SRGS grammars with tag-format=semantics/1.0-literals, while the latter doesn't; you have to convert to semantics/1.0 by changing x to out="x"; at the tags.
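For example, a hypothetical SRGS rule written for Microsoft.Speech with literal tags:

```xml
<!-- tag-format="semantics/1.0-literals" (Microsoft.Speech / Kinect) -->
<rule id="color">
  <one-of>
    <item>red <tag>RED</tag></item>
  </one-of>
</rule>
```

would, as described above, need its tags rewritten as ECMAScript assignments for System.Speech's semantics/1.0 tag-format:

```xml
<!-- tag-format="semantics/1.0" (System.Speech) -->
<rule id="color">
  <one-of>
    <item>red <tag>out="RED";</tag></item>
  </one-of>
</rule>
```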

Bethezel answered 9/9, 2015 at 20:55 Comment(2)
By the way, you may find my SpeechLib code useful (SpeechLib.codeplex.com). You can remove the System.Speech reference from there, reference Microsoft.Speech instead, and set the appropriate conditional compilation symbol (see the sources) to use Microsoft.Speech in the code (it mostly affects the using clauses; the rest of the code is identical). – Bethezel
The SpeechLib mentioned in the above comment has moved to github.com/zoomicon/SpeechLib (since Codeplex is frozen in archive mode now). – Bethezel

© 2022 - 2024 — McMap. All rights reserved.