Training SAPI: Creating transcribed WAV files and adding file paths to the registry

In this post I explain how to call AppendTranscript successfully and how to train the speech recognizer with WAV files (credit to Bill Hutchinson). All the code is in C++.

The E_NOINTERFACE error happens if the ISpStream has no contents, for example because the file was empty. In that case the call didn't actually succeed but still returned S_OK (it does this for some reason). So I would normally check first whether the stream actually has any contents. You can do this by checking its size:

Here is an example. If it reports a size of 0 or some absurdly large size, it obviously hasn't returned a correct value. Bear in mind that the size comes back as a ULARGE_INTEGER; the 64-bit value is in its QuadPart member.

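STATSTG streamInfo = {};
HRESULT hrStat = cpStream->Stat(&streamInfo, STATFLAG_NONAME); // NONAME: no pwcsName string to CoTaskMemFree
ULARGE_INTEGER streamSizeULI = streamInfo.cbSize;              // 64-bit size is in streamSizeULI.QuadPart
if (FAILED(hrStat) || streamSizeULI.QuadPart == 0)
{
    // Stat failed or the stream is empty: don't try to read or append a transcript
}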
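The size check above can be wrapped in a small reusable test. The sketch below is portable C++ (no SAPI headers) and assumes you pass in `streamSizeULI.QuadPart` and the format's bytes per second. `IsPlausibleStreamSize` is a hypothetical helper, not part of SAPI. The one-hour limit is an arbitrary assumption standing in for "absurdly large".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of SAPI): decides whether a stream size
// reported by Stat (cbSize.QuadPart) looks like real audio.
// maxSeconds is an assumed upper bound for "absurdly large".
bool IsPlausibleStreamSize(std::uint64_t sizeBytes,
                           std::uint32_t bytesPerSecond,
                           double maxSeconds = 3600.0)
{
    assert(bytesPerSecond > 0);   // precondition: a real audio format
    if (sizeBytes == 0) {
        return false;             // empty stream: nothing to transcribe or train on
    }
    // Convert the byte count to seconds of audio and compare with the bound.
    double seconds = static_cast<double>(sizeBytes) / bytesPerSecond;
    return seconds <= maxSeconds;
}

// SPSF_22kHz16BitMono: 22,050 samples/s * 2 bytes per 16-bit mono sample.
constexpr std::uint32_t kBytesPerSecond22kHz16BitMono = 22050 * 2;
```

In the SAPI code you could pass `cAudioFmt.WaveFormatExPtr()->nAvgBytesPerSec` as the second argument. Depending on whether Stat counts the WAV header, the duration estimate may be off by the size of that header, which does not matter for this check.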

SPBindToFile only works with SPFM_OPEN_READONLY and SPFM_CREATE_ALWAYS, so you will have to use one of those.

As for how to make the appended transcript save: it seems that you cannot save it directly if the WAV file already exists (or at least I don't know how). If the file doesn't exist yet, you can create a new ISpStream. While you write audio to it (for example from a voice or a microphone; there are plenty of examples on the web), you can append a transcript, and it will stick. I include an example below.
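The decision described in the last two paragraphs fits in a few lines. This portable sketch assumes the observation above holds: a transcript only persists when the file is newly created. `WavBindMode`, `WavExists`, `ChooseBindMode` and `TranscriptWillPersist` are hypothetical names. In real code you would map the enum onto SPFM_OPEN_READONLY or SPFM_CREATE_ALWAYS before calling SPBindToFile.

```cpp
#include <filesystem>
#include <system_error>

// Hypothetical stand-ins for the two SPFILEMODE values that work with
// SPBindToFile: SPFM_OPEN_READONLY and SPFM_CREATE_ALWAYS.
enum class WavBindMode { OpenReadOnly, CreateAlways };

// True if the WAV file already exists as a regular file.
bool WavExists(const std::filesystem::path& wavPath)
{
    std::error_code ec;   // non-throwing overload: errors count as "not there"
    return std::filesystem::is_regular_file(wavPath, ec);
}

// Existing files can only be opened read-only; new files are created.
WavBindMode ChooseBindMode(bool wavAlreadyExists)
{
    return wavAlreadyExists ? WavBindMode::OpenReadOnly
                            : WavBindMode::CreateAlways;
}

// Assumption from the text: only a freshly created file keeps an appended transcript.
bool TranscriptWillPersist(WavBindMode mode)
{
    return mode == WavBindMode::CreateAlways;
}
```

Typical use: `ChooseBindMode(WavExists(path))`. If the result is `OpenReadOnly`, you know up front that appending a transcript to that file will not be saved.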

Appending a transcript onto a new file:

void recordAndAppendTranscriptInOneOperation() {
    // Assumes COM is already initialized on this thread (CoInitialize)
    HRESULT                 hr = S_OK;
    CComPtr<ISpVoice>       cpVoice;
    CComPtr<ISpStream>      cpStream;
    CComPtr<ISpTranscript>  cpTranscript;
    CSpStreamFormat         cAudioFmt;

    // Create a SAPI voice
    hr = cpVoice.CoCreateInstance(CLSID_SpVoice);

    // Wide-character path, so SPBindToFile's LPCWSTR overload is used
    const WCHAR filePathOut[] = LR"(C:\SAPI\SampleOutput\SP_Sample.wav)";

    // Set the audio format
    if (SUCCEEDED(hr))
    {
        hr = cAudioFmt.AssignFormat(SPSF_22kHz16BitMono);
    }

    // Call SPBindToFile, a SAPI helper method, to bind the audio stream to a new WAV file
    if (SUCCEEDED(hr))
    {
        hr = SPBindToFile(filePathOut, SPFM_CREATE_ALWAYS, &cpStream, &cAudioFmt.FormatId(), cAudioFmt.WaveFormatExPtr());
    }

    // Set the voice's output to cpStream so that the spoken audio is written to the file
    if (SUCCEEDED(hr))
    {
        hr = cpVoice->SetOutput(cpStream, TRUE);
    }

    // Speak the text "Hello World" synchronously
    if (SUCCEEDED(hr))
    {
        hr = cpVoice->Speak(L"Hello World", SPF_DEFAULT, NULL);
    }

    // Attach the transcript while the stream is still open, then close the stream
    if (SUCCEEDED(hr))
    {
        PWCHAR  pwszTranscript = NULL;
        LPCWSTR NewTranscript = L"This is a test";

        hr = cpStream.QueryInterface(&cpTranscript);
        if (SUCCEEDED(hr)) hr = cpTranscript->AppendTranscript(NULL);          // NULL deletes any existing transcript
        if (SUCCEEDED(hr)) hr = cpTranscript->AppendTranscript(NewTranscript);
        if (SUCCEEDED(hr)) hr = cpTranscript->GetTranscript(&pwszTranscript);  // read back to verify
        CoTaskMemFree(pwszTranscript);                                         // GetTranscript allocates with CoTaskMemAlloc

        HRESULT hrClose = cpStream->Close();
        if (SUCCEEDED(hr)) hr = hrClose;
    }

    // Release the transcript, stream and voice objects
    cpTranscript.Release();
    cpStream.Release();
    cpVoice.Release();
}
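The example calls AppendTranscript(NULL) before appending the real text. ISpTranscript keeps one text string per stream: NULL deletes it, and a non-NULL string is added to whatever is already there. Clearing first keeps a rerun from stacking a second copy of the text. The class below is a hypothetical in-memory model of that behaviour, written in portable C++ only to show the order of calls. It is not SAPI's implementation, and plain concatenation is an assumption.

```cpp
#include <string>

// Hypothetical model of ISpTranscript's append/clear behaviour (not SAPI code).
// Assumption: AppendTranscript(nullptr) deletes the transcript and a non-null
// string is concatenated onto the existing text.
class TranscriptModel {
public:
    void AppendTranscript(const wchar_t* text)
    {
        if (text == nullptr) {
            transcript_.clear();   // NULL clears, like the first call in the example
        } else {
            transcript_ += text;   // otherwise append to what is already stored
        }
    }

    const std::wstring& GetTranscript() const { return transcript_; }

private:
    std::wstring transcript_;
};
```

Appending "This is a test" twice without the NULL call leaves the text duplicated. Clearing first always leaves exactly one copy.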

Bill Hutchinson (one of the linked sources below) has some code that can be used to perform recognizer training without all the registry edits and so on. I have included it at the end of this post. His function TrainOneFile trains the recognizer one file at a time, using an SpStream bound to the WAV file. You can pass it preexisting WAVs: either WAVs that already carry a transcript, or WAVs without transcripts (in which case you supply the correct text yourself). Please take a look at it, as it is very informative.
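TrainOneFile handles one WAV at a time, so training on a set of recordings is a loop around it. The sketch below shows that loop in portable C++. It assumes a caller-supplied `trainOne` callback that wraps something like TrainOneFile, modified to take a path and an optional transcript, and returns an HRESULT-style `long`. `TrainingItem`, `TrainAll` and `Succeeded` are hypothetical names. `Succeeded` mirrors the SUCCEEDED macro, which treats any non-negative HRESULT as success.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// One recording to train on. If transcript is empty, the WAV is assumed to
// carry its own transcript (read with ISpTranscript::GetTranscript).
struct TrainingItem {
    std::wstring wavPath;
    std::optional<std::wstring> transcript;
};

// Same test as the SUCCEEDED macro: non-negative HRESULTs are successes.
constexpr bool Succeeded(long hr) { return hr >= 0; }

// Trains file by file and stops at the first failure, like the sample's
// chain of if (SUCCEEDED(hr)) checks. Returns how many files succeeded.
std::size_t TrainAll(const std::vector<TrainingItem>& items,
                     const std::function<long(const TrainingItem&)>& trainOne)
{
    std::size_t trained = 0;
    for (const TrainingItem& item : items) {
        if (!Succeeded(trainOne(item))) {
            break;   // give up once a call fails
        }
        ++trained;
    }
    return trained;
}
```

Stopping at the first failure matches the style of the sample code. If you would rather skip bad files and keep going, replace `break` with `continue`.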

Here is a collection of everything I have found about SAPI that should be useful for others trying to figure this mess out. I will also post my own complete SAPI training solution soon.

Sample training code:

Since Bill Hutchinson's code is one of the few reliable examples on the web of how to use SAPI for training, I have included his post from Google below, in case it is ever deleted or lost:

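#include "stdafx.h"
#include <atlbase.h>   // CComPtr (may already come from stdafx.h)
#include <sapi.h>
#include <sphelper.h>  // CSpEvent, CSpDynamicString, CSpStreamFormat; needs atlbase.h first
#include <tchar.h>     // _tmain, _TCHAR
#include <string.h>
//MAIN() is last function below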
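inline HRESULT ReturnResult(ISpRecoContext * pRecoCtxt, ISpRecoResult ** ppResult)
{
        HRESULT hr = S_OK;
        CSpEvent spEvent;
        // Block until the context signals an event (needs SetNotifyWin32Event).
        // With INFINITE this waits forever if the audio never produces a recognition.
        while (S_OK == pRecoCtxt->WaitForNotifyEvent(INFINITE))
        {
                // Drain every queued event
                while (S_OK == spEvent.GetFrom(pRecoCtxt))
                {
                        switch (spEvent.eEventId)
                        {
                                case SPEI_RECOGNITION:
                                        *ppResult = spEvent.RecoResult();
                                        if (*ppResult)
                                        {
                                                (*ppResult)->AddRef();   // caller owns a reference
                                        }
                                        return hr;
                                default:
                                        // [OTHER EVENTS]: handle SPEI_HYPOTHESIS, SPEI_FALSE_RECOGNITION, etc. here
                                        break;
                        }
                        spEvent.Clear();
                }
        }
        return hr;
}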
inline HRESULT TrainOneFile(ISpRecoContext * cpRecoCtxt, ISpRecognizer * cpRecognizerBase, ISpRecoGrammar * cpGrammar)
{
        CComPtr<ISpStream>      cpStream;
        CComPtr<ISpRecoResult>  cpResult;
        CComPtr<ISpTranscript>  cpTranscript;
        PWCHAR                  pwszTranscript = NULL;
        HRESULT hr = S_OK;
        hr = cpStream.CoCreateInstance(CLSID_SpStream);
        // Bind a stream to an existing wavefile
        if (SUCCEEDED(hr)) {
                hr = cpStream->BindToFile(L"C:\\XX.wav", SPFM_OPEN_READONLY, NULL, NULL, SPFEI_ALL_EVENTS);
        }
        if (SUCCEEDED(hr)) {
                hr = cpStream.QueryInterface(&cpTranscript);
        }
        if (SUCCEEDED(hr)) {
                hr = cpTranscript->GetTranscript(&pwszTranscript);
        }
        // THIS IS ALTERNATE CODE FOR PREVIOUS LINE, FOR SOUND FILES THAT DON'T HAVE A TRANSCRIPT ATTACHED
        // (sCorrectText is also passed to CommitText below, so it must match the audio)
        LPCWSTR sCorrectText = L"Anyone who has spent time on a farm knows there is a rhythm to the year.";
        if (SUCCEEDED(hr)) {
                CoTaskMemFree(pwszTranscript);   // free the string from the first GetTranscript call
                pwszTranscript = NULL;
                hr = cpTranscript->AppendTranscript(sCorrectText);
        }
        if (SUCCEEDED(hr)) {
                hr = cpTranscript->GetTranscript(&pwszTranscript);
        }
        CoTaskMemFree(pwszTranscript);           // GetTranscript allocates with CoTaskMemAlloc
        if (SUCCEEDED(hr)) {
                hr = cpRecognizerBase->SetInput(cpStream, TRUE);
        }
        CSpDynamicString dstrText;
        if (SUCCEEDED(hr)) {
                hr = cpGrammar->SetDictationState(SPRS_ACTIVE);
        }
        if (SUCCEEDED(hr)) {
                hr = ReturnResult(cpRecoCtxt, &cpResult);
        }
        if (SUCCEEDED(hr)) {
                hr = cpGrammar->SetDictationState(SPRS_INACTIVE);
        }
        if (cpResult && SUCCEEDED(hr)) {
                hr = cpResult->GetText(SP_GETWHOLEPHRASE, SP_GETWHOLEPHRASE, TRUE, &dstrText, NULL);
        }
        CComPtr<ISpRecoResult2> cpResult2;
        if (cpResult && SUCCEEDED(hr)) {         // no recognition means nothing to correct
                hr = cpResult.QueryInterface<ISpRecoResult2>(&cpResult2);
        }
        if (cpResult2 && SUCCEEDED(hr)) {
                // COMMITTEXT SHOULD FORCE ADAPTATION OF MODELS TO CORRECT TEXT
                // (THO IT SHOULD BE REDUNDANT WITH SETTRAININGSTATE() ?)
                hr = cpResult2->CommitText(SP_GETWHOLEPHRASE, SP_GETWHOLEPHRASE, sCorrectText, SPCF_DEFINITE_CORRECTION);
                cpResult.Release();
                cpResult2.Release();
        }
        return hr;
}

int _tmain(int argc, _TCHAR* argv[])
{
        HRESULT hr = S_OK;
        CComPtr<ISpRecognizer2> cpRecognizer;
        CComPtr<ISpRecoContext> cpRecoCtxt;
        CComPtr<ISpRecoGrammar> cpGrammar;
        CComPtr<ISpRecognizer>  cpRecognizerBase;
        hr = ::CoInitialize(NULL);
        if (SUCCEEDED(hr)) {
                hr = cpRecognizer.CoCreateInstance(CLSID_SpInprocRecognizer);
        }
        if (SUCCEEDED(hr)) {
                hr = cpRecognizer.QueryInterface<ISpRecognizer>(&cpRecognizerBase);
        }
        if (SUCCEEDED(hr)) {
                hr = cpRecognizerBase->CreateRecoContext(&cpRecoCtxt);
        }
        if (SUCCEEDED(hr)) {
                hr = cpRecoCtxt->CreateGrammar(0, &cpGrammar);
        }
        if (SUCCEEDED(hr)) {
                hr = cpGrammar->LoadDictation(NULL, SPLO_STATIC);
        }
        if (SUCCEEDED(hr)) {
                hr = cpRecognizer->SetTrainingState(TRUE, TRUE);
        }
        if (SUCCEEDED(hr)) {
                hr = cpRecoCtxt->SetNotifyWin32Event();
        }
        if (SUCCEEDED(hr)) {
                hr = cpRecoCtxt->SetInterest(
                        SPFEI(SPEI_RECOGNITION) |
                        SPFEI(SPEI_HYPOTHESIS) |
                        SPFEI(SPEI_FALSE_RECOGNITION),
                        SPFEI(SPEI_RECOGNITION) |
                        SPFEI(SPEI_HYPOTHESIS) |
                        SPFEI(SPEI_FALSE_RECOGNITION));
        }
        if (SUCCEEDED(hr)) {
                hr = TrainOneFile(cpRecoCtxt, cpRecognizerBase, cpGrammar);
        }
        if (SUCCEEDED(hr)) { // RERUN TO CHECK FOR IMPROVEMENT
                hr = TrainOneFile(cpRecoCtxt, cpRecognizerBase, cpGrammar);
        }
        if (cpRecognizer) {
                cpRecognizer->SetTrainingState(FALSE, TRUE); // should turn off training and save changes
        }
        // Release every COM object before CoUninitialize, not when they go out of scope after it
        cpGrammar.Release();
        cpRecoCtxt.Release();
        cpRecognizerBase.Release();
        cpRecognizer.Release();
        ::CoUninitialize();
        return 0;
}
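The second TrainOneFile call is meant to check for improvement, but the sample never measures it. One option is to compare the recognized text (dstrText) with the correct text (sCorrectText) after each pass, using word error rate (WER): the word-level edit distance divided by the number of reference words. This is an addition of mine, not something SAPI provides. `WordErrorRate` and `SplitWords` are hypothetical helpers in portable C++.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a wide string into whitespace-separated words.
std::vector<std::wstring> SplitWords(const std::wstring& text)
{
    std::wistringstream in(text);
    std::vector<std::wstring> words;
    std::wstring w;
    while (in >> w) {
        words.push_back(w);
    }
    return words;
}

// WER = (substitutions + deletions + insertions) / reference words,
// computed with a Levenshtein distance over words (two rolling rows).
double WordErrorRate(const std::wstring& reference, const std::wstring& hypothesis)
{
    const std::vector<std::wstring> ref = SplitWords(reference);
    const std::vector<std::wstring> hyp = SplitWords(hypothesis);
    if (ref.empty()) {
        return hyp.empty() ? 0.0 : 1.0;   // assumption: WER is undefined here, so cap it at 1
    }
    // prev[j] = edit distance between the reference words so far and the first j hypothesis words
    std::vector<std::size_t> prev(hyp.size() + 1), curr(hyp.size() + 1);
    for (std::size_t j = 0; j <= hyp.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= ref.size(); ++i) {
        curr[0] = i;
        for (std::size_t j = 1; j <= hyp.size(); ++j) {
            std::size_t cost = (ref[i - 1] == hyp[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1,          // deletion
                                curr[j - 1] + 1,      // insertion
                                prev[j - 1] + cost}); // substitution or match
        }
        std::swap(prev, curr);
    }
    return static_cast<double>(prev[hyp.size()]) / ref.size();
}
```

To use it with the sample, TrainOneFile would have to return dstrText as well as the HRESULT. A lower WER on the second pass than on the first suggests the adaptation helped.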
