Recording WAV to IBM Watson Speech-To-Text
I'm trying to record audio and immediately stream it to IBM Watson Speech-To-Text for transcription. I've tested Watson with a WAV file loaded from disk, and that worked. I've also tested recording from the microphone and saving to disk, which works fine too.

But when I try to record the audio with NAudio WaveIn, the result from Watson is empty, as if there's no audio.

Can anyone shed some light on this, or suggest what I might be missing?

private ClientWebSocket ws;

private async void StartHere()
{
    // store the socket in a field so Stop, Close and WaveIn_DataAvailable can reach it
    ws = new ClientWebSocket();
    ws.Options.Credentials = new NetworkCredential("*****", "*****");

    await ws.ConnectAsync(new Uri("wss://stream.watsonplatform.net/speech-to-text/api/v1/recognize?model=en-US_NarrowbandModel"), CancellationToken.None);

    Task.WaitAll(ws.SendAsync(openingMessage, WebSocketMessageType.Text, true, CancellationToken.None), HandleResults(ws));

    Record();
}

public void Record()
{
    var waveIn = new WaveInEvent
    {
        BufferMilliseconds = 50,
        DeviceNumber       = 0,
        WaveFormat         = format
    };

    waveIn.DataAvailable    += WaveIn_DataAvailable;
    waveIn.RecordingStopped += WaveIn_RecordingStopped;
    waveIn.StartRecording();
}

public async Task Stop()
{
    await ws.SendAsync(closingMessage, WebSocketMessageType.Text, true, CancellationToken.None);
}

public void Close()
{
    ws.CloseAsync(WebSocketCloseStatus.NormalClosure, "Close", CancellationToken.None).Wait();
}

private async void WaveIn_DataAvailable(object sender, WaveInEventArgs e)
{
    // send only the bytes actually recorded in this buffer, not the whole buffer
    await ws.SendAsync(new ArraySegment<byte>(e.Buffer, 0, e.BytesRecorded), WebSocketMessageType.Binary, true, CancellationToken.None);
}

private async Task HandleResults(ClientWebSocket ws)
{
    var buffer = new byte[1024];

    while (true)
    {
        var segment = new ArraySegment<byte>(buffer);
        var result = await ws.ReceiveAsync(segment, CancellationToken.None);

        if (result.MessageType == WebSocketMessageType.Close)
        {
            return;
        }

        int count = result.Count;
        while (!result.EndOfMessage)
        {
            if (count >= buffer.Length)
            {
                await ws.CloseAsync(WebSocketCloseStatus.InvalidPayloadData, "That's too long", CancellationToken.None);
                return;
            }

            segment = new ArraySegment<byte>(buffer, count, buffer.Length - count);
            result = await ws.ReceiveAsync(segment, CancellationToken.None);
            count += result.Count;
        }

        var message = Encoding.UTF8.GetString(buffer, 0, count);

        // you'll probably want to parse the JSON into a useful object here,
        // see ServiceState and IsDelimeter for a light-weight example of that.
        Console.WriteLine(message);

        if (IsDelimeter(message))
        {
            return;
        }
    }
}

private bool IsDelimeter(String json)
{
    using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(json)))
    {
        var ser = new DataContractJsonSerializer(typeof(ServiceState));
        var obj = (ServiceState) ser.ReadObject(stream);

        return obj.state == "listening";
    }
}

[DataContract]
internal class ServiceState
{
    [DataMember]
    public string state = "";
}


Edit: I've also tried sending the WAV "header" prior to StartRecording, like this:

    waveIn.DataAvailable    += WaveIn_DataAvailable;
    waveIn.RecordingStopped += WaveIn_RecordingStopped;

    /* Send WAV "header" first */
    using (var stream = new MemoryStream())
    {
        using (var writer = new BinaryWriter(stream, Encoding.UTF8))
        {
            writer.Write(Encoding.UTF8.GetBytes("RIFF"));
            writer.Write(0); // placeholder
            writer.Write(Encoding.UTF8.GetBytes("WAVE"));
            writer.Write(Encoding.UTF8.GetBytes("fmt "));

            format.Serialize(writer);

            if (format.Encoding != WaveFormatEncoding.Pcm && format.BitsPerSample != 0)
            {
                writer.Write(Encoding.UTF8.GetBytes("fact"));
                writer.Write(4);
                writer.Write(0);
            }

            writer.Write(Encoding.UTF8.GetBytes("data"));
            writer.Write(0);
            writer.Flush();
        }

        byte[] header = stream.ToArray();

        await ws.SendAsync(new ArraySegment<byte>(header), WebSocketMessageType.Binary, true, CancellationToken.None);
    }
    /* End WAV header */

    waveIn.StartRecording();
Loireatlantique answered 26/7, 2017 at 8:47 Comment(0)
Found the solution after ~20 hours of trial and error. I've created a GitHub Gist, because it may be handy for others: https://gist.github.com/kboek/20476c2a03b5e9188edebaace74f9a07
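In short, the trick when streaming raw NAudio buffers is to tell the service the audio is headerless PCM rather than a WAV file, so no RIFF header is needed at all. A minimal sketch of such an opening "start" message (the `audio/l16` content type is part of the Watson WebSocket API; the sample rate and channel count here are assumptions and must match your actual WaveFormat):

    // Sketch: opening "start" message for the Watson STT WebSocket.
    // WaveInEvent buffers are raw PCM, so declare audio/l16 with the
    // capture rate instead of audio/wav -- then no RIFF header is needed.
    using System;
    using System.Net.WebSockets;
    using System.Text;
    using System.Threading;
    using System.Threading.Tasks;

    static class WatsonStart
    {
        public static async Task SendStart(ClientWebSocket ws, int sampleRate)
        {
            // assumed format: 16-bit little-endian mono PCM at sampleRate Hz
            string json =
                "{ \"action\": \"start\", " +
                "\"content-type\": \"audio/l16;rate=" + sampleRate + ";channels=1\", " +
                "\"interim_results\": true }";

            var bytes = Encoding.UTF8.GetBytes(json);
            await ws.SendAsync(new ArraySegment<byte>(bytes),
                               WebSocketMessageType.Text, true, CancellationToken.None);
        }
    }

After this message the service replies with a `{"state": "listening"}` JSON frame, and the raw `e.Buffer` bytes from `DataAvailable` can be sent as binary frames as-is.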

Loireatlantique answered 30/7, 2017 at 13:0 Comment(3)
Thanks for the solution. Will it help to record audio from the microphone and send it to IBM Watson Speech-To-Text immediately, without saving it locally? – Venison

This was 3 years ago; unfortunately I don't remember the details of this project. But you should be able to use WaveInEvent to capture audio from your mic. I'm sure there are examples out there that explain how to use NAudio to record from a mic. – Loireatlantique

Can you please support here if possible? #63655446 – Venison

© 2022 - 2024 — McMap. All rights reserved.