How can I use the Google text-to-speech API in a Windows Forms application?

I want to use Google text-to-speech in my Windows Forms application to read a label aloud. I added a reference to System.Speech. How can I make it read a label in a button click event? http://translate.google.com/translate_tts?q=testing+google+speech This is the Google text-to-speech API. Alternatively, how can I use Microsoft's native text-to-speech?

Interest answered 11/2, 2012 at 19:56 Comment(4)
You do have to make up your mind about which company's text-to-speech API you want to use. The link you provided is dead; you'll have better odds with the System.Speech.Synthesis.SpeechSynthesizer class. Use its SpeakAsync() method for maximum bang for the buck in the .NET world instead of the "it will be better/different tomorrow" world of the internet.Constitute
@HansPassant - The link works for me. I wonder why it won't work for you.Poon
Hmm, I wonder why too. Having zero ways to debug it is in itself enough for me.Constitute
@HansPassant - Point taken. It ended up being a pretty good Saturday afternoon project in the end though.Poon

UPDATE: Google's TTS API is no longer publicly available. The notes at the bottom about Microsoft's TTS are still relevant and provide equivalent functionality.


You can use Google's TTS API from your WinForm application by playing the response using a variation of this question's answer (it took me a while but I have a real solution):

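using System;
using System.IO;
using System.Linq;
using System.Net;
using System.Threading;
using System.Web;
using System.Windows.Forms;
using NAudio.Wave;

public partial class Form1 : Form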
{
    public Form1()
    {
        InitializeComponent();
        this.FormClosing += (sender, e) =>
            {
                if (waiting)
                    stop.Set();
            };
    }

    private void ButtonClick(object sender, EventArgs e)
    {
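        var clicked = sender as Button;
        if (clicked == null || clicked.Tag == null)
            return;

        // The button's Tag holds the name of the label it should read aloud.
        var relatedLabel = this.Controls.Find(clicked.Tag.ToString(), true).FirstOrDefault() as Label;

        if (relatedLabel == null)
            return;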

        var playThread = new Thread(() => PlayMp3FromUrl("http://translate.google.com/translate_tts?q=" + HttpUtility.UrlEncode(relatedLabel.Text)));
        playThread.IsBackground = true;
        playThread.Start();
    }

    // Written on the playback thread and read on the UI thread, hence volatile.
    volatile bool waiting = false;
    AutoResetEvent stop = new AutoResetEvent(false);
    public void PlayMp3FromUrl(string url)
    {
        using (Stream ms = new MemoryStream())
        {
            using (Stream stream = WebRequest.Create(url)
                .GetResponse().GetResponseStream())
            {
                byte[] buffer = new byte[32768];
                int read;
                while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                {
                    ms.Write(buffer, 0, read);
                }
            }

            ms.Position = 0;
            using (WaveStream blockAlignedStream =
                new BlockAlignReductionStream(
                    WaveFormatConversionStream.CreatePcmStream(
                        new Mp3FileReader(ms))))
            {
                using (WaveOut waveOut = new WaveOut(WaveCallbackInfo.FunctionCallback()))
                {
                    waveOut.Init(blockAlignedStream);
                    waveOut.PlaybackStopped += (sender, e) =>
                    {
                        waveOut.Stop();
                    };

                    waveOut.Play();
                    waiting = true;
                    stop.WaitOne(10000);
                    waiting = false;
                }
            }
        }
    }
}

NOTE: The above code requires NAudio to work (free/open source) and using statements for System.Web, System.Threading, and NAudio.Wave.

My Form1 has 2 controls on it:

  1. A Label named label1
  2. A Button named button1 with a Tag of label1 (used to bind the button to its label)

The above code can be simplified slightly if you have a separate event handler for each button/label combination, using something like this (untested):

    private void ButtonClick(object sender, EventArgs e)
    {
        // Each button has its own handler here, so the label is referenced directly.
        var playThread = new Thread(() => PlayMp3FromUrl("http://translate.google.com/translate_tts?q=" + HttpUtility.UrlEncode(label1.Text)));
        playThread.IsBackground = true;
        playThread.Start();
    }

There are problems with this solution though (this list is probably not complete; I'm sure comments and real world usage will find others):

  1. Notice the stop.WaitOne(10000); in the first code snippet. The 10000 represents a maximum of 10 seconds of audio to be played so it will need to be tweaked if your label takes longer than that to read. This is necessary because the current version of NAudio (v1.5.4.0) seems to have a problem determining when the stream is done playing. It may be fixed in a later version or perhaps there is a workaround that I didn't take the time to find. One temporary workaround is to use a ParameterizedThreadStart that would take the timeout as a parameter to the thread. This would allow variable timeouts but would not technically fix the problem.
  2. More importantly, the Google TTS API is unofficial (meaning it is not intended to be consumed by non-Google applications) and is subject to change without notice at any time. If you need something that will work in a commercial environment, I'd suggest either the MS TTS solution (as your question suggests) or one of the many commercial alternatives, although none of them tend to be even this simple.
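
The ParameterizedThreadStart workaround from item 1 might look roughly like the sketch below. It is not part of the tested code above: PlaybackRequest, EstimateTimeoutMs, and the 100 ms per character figure are hypothetical names and guesses used only for illustration, and the actual download and NAudio playback are left out so the example runs on its own.

```csharp
using System;
using System.Threading;

// Hypothetical container for what the playback thread needs to know.
public sealed class PlaybackRequest
{
    public string Url { get; }
    public int TimeoutMs { get; }

    public PlaybackRequest(string url, int timeoutMs)
    {
        Url = url;
        TimeoutMs = timeoutMs;
    }
}

public static class TimeoutDemo
{
    private static readonly AutoResetEvent stop = new AutoResetEvent(false);

    // Assumption: a crude guess at speech length based on character count.
    // Tune msPerChar and minimumMs for your own text and voice.
    public static int EstimateTimeoutMs(string text, int msPerChar = 100, int minimumMs = 2000)
    {
        return Math.Max(minimumMs, text.Length * msPerChar);
    }

    // Matches the ParameterizedThreadStart signature: void (object).
    private static void PlayWorker(object state)
    {
        var request = (PlaybackRequest)state;

        // In the real code, download request.Url and start playback here.
        bool signalled = stop.WaitOne(request.TimeoutMs);
        Console.WriteLine(signalled ? "Stopped early." : "Timeout elapsed.");
    }

    public static void Main()
    {
        string text = "testing google speech";
        var request = new PlaybackRequest(
            "http://translate.google.com/translate_tts?q=testing+google+speech",
            EstimateTimeoutMs(text));

        var playThread = new Thread(new ParameterizedThreadStart(PlayWorker));
        playThread.IsBackground = true;
        playThread.Start(request); // the argument is handed to PlayWorker
        playThread.Join();
    }
}
```

As noted above, this only makes the timeout variable; it does not fix the underlying problem of detecting when playback has actually finished.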

To answer the other side of your question:

The System.Speech.Synthesis.SpeechSynthesizer class is much easier to use, and you can count on it being reliably available (whereas the Google API could be gone tomorrow).

It is really as easy as adding a reference to the System.Speech assembly and:

public void SaySomething(string somethingToSay)
{
    var synth = new System.Speech.Synthesis.SpeechSynthesizer();

    synth.SpeakAsync(somethingToSay);
}

This just works.
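
To tie this back to the original question (reading a label on a button click), a minimal sketch might look like the following. The form and control names (SpeakForm, label1, button1) are placeholders rather than anything from the question's project, and holding the synthesizer in a field so it outlives the click handler is my own choice here, not something the snippet above requires.

```csharp
using System;
using System.Speech.Synthesis;
using System.Windows.Forms;

public class SpeakForm : Form
{
    // Kept as a field so it stays alive while speaking asynchronously.
    private readonly SpeechSynthesizer synth = new SpeechSynthesizer();

    private readonly Label label1 = new Label
    {
        Text = "Hello from System.Speech",
        AutoSize = true,
        Left = 10,
        Top = 10
    };

    private readonly Button button1 = new Button
    {
        Text = "Read label",
        Left = 10,
        Top = 40
    };

    public SpeakForm()
    {
        Controls.Add(label1);
        Controls.Add(button1);

        button1.Click += (sender, e) =>
        {
            // Cancel anything still being spoken, then read the label text.
            synth.SpeakAsyncCancelAll();
            synth.SpeakAsync(label1.Text);
        };

        // Release the synthesizer when the form goes away.
        FormClosed += (sender, e) => synth.Dispose();
    }

    [STAThread]
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new SpeakForm());
    }
}
```

This needs references to System.Speech and System.Windows.Forms and runs only on Windows.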

Trying to use the Google TTS API was a fun experiment but I'd be hard pressed to suggest it for production use, and if you don't want to pay for a commercial alternative, Microsoft's solution is about as good as it gets.

Poon answered 11/2, 2012 at 20:16 Comment(7)
How can I use this in a button click event?Hoofbound
@user1136403 - See my update for how to work this into a button click event.Poon
@user1136403 - I've updated the code in my answer to reflect changes necessary based on testing.Poon
@user1136403 - Completely revamped my answer to reflect fairly deep testing and includes a list of issues/concerns. I've burned a fair amount of time on this because it sounded interesting so please provide feedback as time allows.Poon
System.Speech is much easier, but it doesn't support multiple languages.Arrear
This answer no longer works as of November 2015.Eddyede
As you predicted, the Google API is no longer available, or is behind multiple captchas.Brig

I know this question is a bit out of date, but Google has recently published the Google Cloud Text-to-Speech API.

A .NET client for Google.Cloud.TextToSpeech can be found here: https://github.com/jhabjan/Google.Cloud.TextToSpeech.V1

Here is a short example of how to use the client:

GoogleCredential credentials =
    GoogleCredential.FromFile(Path.Combine(Program.AppPath, "jhabjan-test-47a56894d458.json"));

TextToSpeechClient client = TextToSpeechClient.Create(credentials);

SynthesizeSpeechResponse response = client.SynthesizeSpeech(
    new SynthesisInput()
    {
        Text = "Google Cloud Text-to-Speech enables developers to synthesize natural-sounding speech with 32 voices"
    },
    new VoiceSelectionParams()
    {
        LanguageCode = "en-US",
        Name = "en-US-Wavenet-C"
    },
    new AudioConfig()
    {
        AudioEncoding = AudioEncoding.Mp3
    }
);

string speechFile = Path.Combine(Directory.GetCurrentDirectory(), "sample.mp3");

File.WriteAllBytes(speechFile, response.AudioContent);
Bathtub answered 30/4, 2018 at 12:25 Comment(0)
