It seems Microsoft offers quite a few speech recognition products, and I'd like to know the differences among them.
There is the Microsoft Speech API, or SAPI. Confusingly, the Speech API in Microsoft Cognitive Services goes by almost the same name.
Then, Microsoft Cognitive Services on Azure offers both the Speech service API and the Bing Speech API. I assume that for speech-to-text, the two APIs are the same.
And then there are System.Speech.Recognition (or Desktop SAPI), Microsoft.Speech.Recognition (or Server SAPI) and Windows.Media.SpeechRecognition. Here and here are some explanations of the differences among the three. My guess is that they are older speech recognition models based on hidden Markov models (HMMs), i.e. not neural network models, and that all three can be used offline without an internet connection. Is that right?
The Azure Speech service and Bing Speech APIs use more advanced speech models, right? But I assume there is no way to use them offline on my local machine, since they both require subscription verification (even though it seems the Bing API has a C# desktop library).
Essentially, I want an offline model that does speech-to-text transcription of my conversation data (5-10 minutes per audio recording), recognises multiple speakers, and outputs timestamps (i.e. timecoded output). I am a bit confused by all the options. I would greatly appreciate it if someone could explain them to me. Many thanks!