Using Microsoft's SAPI 5.3 Speech API on Windows Vista, how do you programmatically perform acoustic model training of a RecoProfile? More concretely: if you have a text file and an audio file of a user speaking that text, what sequence of SAPI calls would you make to train the user's profile with that text and audio?
Update:
More information about this problem, which I still haven't solved: you call ISpRecognizer2.SetTrainingState( TRUE, TRUE ) at "the beginning" and ISpRecognizer2.SetTrainingState( FALSE, TRUE ) at "the end." But it is still unclear exactly when those calls have to happen relative to the other calls.
For example, you have to make various calls to set up a grammar containing the text that matches your audio, other calls to hook up the audio input, and still other calls on various objects to say "you're good to go now." But what are the interdependencies: what has to happen before what? And if you're using an audio file instead of the system microphone for input, does that make the timing less forgiving, because the recognizer isn't going to keep listening until the speaker gets it right?