Update: This answer has been edited following investigation. Initially I was suggesting from memory that SupportedAudioFormats is likely just from (possibly misconfigured) registry data; investigation has shown that for me, on Windows 7, this is definitely the case, and is backed up acecdotally on Windows 8.
Issues with SupportedAudioFormats
System.Speech
wraps the venerable COM speech API (SAPI) and some voices are 32 vs 64 bit, or can be misconfigured (on a 64 bit machine's registry, HKLM/Software/Microsoft/Speech/Voices
vs HKLM/Software/Wow6432Node/Microsoft/Speech/Voices
.
I've pointed ILSpy at System.Speech
and its VoiceInfo
class, and I'm pretty convinced that SupportedAudioFormats comes solely from registry data, hence it's possible to get zero results back when enumerating SupportedAudioFormats
if either your TTS engine isn't properly registered for your application's Platform target (x86, Any or 64 bit), or if the vendor simply doesn't provide this information in the registry.
Voices may still support different, additional or fewer formats, as that's up to the speech engine (code) rather than the registry (data). So it can be a shot in the dark. Standard Windows voices are often times more consistent in this regard than third party voices, but they still don't necessarily usefully provide SupportedAudioFormats
.
Finding this Information the Hard Way
I've found it's still possible to get the current format of the current voice - but this does rely on reflection to access the internals of the System.Speech SAPI wrappers.
Consequently this is quite fragile code! And I wouldn't recommend use in production.
Note: The below code does require you to have called Speak() once for setup; more calls would be needed to force setup without Speak(). However, I can call Speak("")
to say nothing and that works just fine.
Implementation:
[StructLayout(LayoutKind.Sequential)]
struct WAVEFORMATEX
{
public ushort wFormatTag;
public ushort nChannels;
public uint nSamplesPerSec;
public uint nAvgBytesPerSec;
public ushort nBlockAlign;
public ushort wBitsPerSample;
public ushort cbSize;
}
WAVEFORMATEX GetCurrentWaveFormat(SpeechSynthesizer synthesizer)
{
var voiceSynthesis = synthesizer.GetType()
.GetProperty("VoiceSynthesizer", BindingFlags.Instance | BindingFlags.NonPublic)
.GetValue(synthesizer, null);
var ttsVoice = voiceSynthesis.GetType()
.GetMethod("CurrentVoice", BindingFlags.Instance | BindingFlags.NonPublic)
.Invoke(voiceSynthesis, new object[] { false });
var waveFormat = (byte[])ttsVoice.GetType()
.GetField("_waveFormat", BindingFlags.Instance | BindingFlags.NonPublic)
.GetValue(ttsVoice);
var pin = GCHandle.Alloc(waveFormat, GCHandleType.Pinned);
var format = (WAVEFORMATEX)Marshal.PtrToStructure(pin.AddrOfPinnedObject(), typeof(WAVEFORMATEX));
pin.Free();
return format;
}
Usage:
SpeechSynthesizer s = new SpeechSynthesizer();
s.Speak("Hello");
var format = GetCurrentWaveFormat(s);
Debug.WriteLine($"{s.Voice.SupportedAudioFormats.Count} formats are claimed as supported.");
Debug.WriteLine($"Actual format: {format.nChannels} channel {format.nSamplesPerSec} Hz {format.wBitsPerSample} audio");
To test it, I renamed Microsoft Anna's AudioFormats
registry key under HKLM/Software/Wow6432Node/Microsoft/Speech/Voices/Tokens/MS-Anna-1033-20-Dsk/Attributes
, causing SpeechSynthesizer.Voice.SupportedAudioFormats
to have no elements when queried. The below is the output in this situation:
0 formats are claimed as supported.
Actual format: 1 channel 16000 Hz 16 audio