Streaming Audio in FLAC or AMR_WB to the Google Speech API


Asked 6/10, 2018 at 3:22 Answered 6/11, 2018 at 3:58

I need to run the Google Speech API in somewhat low-bandwidth environments.

Based on reading about best practices, it seems my best bet is to use the AMR_WB format.

However, the following code throws no exceptions and nothing arrives in the onError(t: Throwable) method, yet the API returns no values at all in the onNext(value: StreamingRecognizeResponse) method.

If I change the format in .setEncoding() from FLAC or AMR_WB back to LINEAR16, everything works fine.

AudioEmitter.kt

fun start(
    encoding: Int = AudioFormat.ENCODING_PCM_16BIT,
    channel: Int = AudioFormat.CHANNEL_IN_MONO,
    sampleRate: Int = 16000,
    subscriber: (ByteString) -> Unit
)

MainActivity.kt

builder.streamingConfig = StreamingRecognitionConfig.newBuilder()
        .setConfig(RecognitionConfig.newBuilder()
                .setLanguageCode("en-US")
                .setEncoding(RecognitionConfig.AudioEncoding.AMR_WB)
                .setSampleRateHertz(16000)
                .build())
        .setInterimResults(true)
        .setSingleUtterance(false)
        .build()

Edwinaedwine answered 6/10, 2018 at 3:22 Comment(5)

I think the problem may come from the sampleRate in your AudioEmitter. Try setting it to 44100, 22050 or 11025 when the encoding type in streaming recognition is FLAC. – Ungley 1/11, 2018 at 14:12

Maybe you can follow the official troubleshooting procedure at cloud.google.com/speech-to-text/docs/support#troubleshooting to determine where the issue comes from? – Sibilant 5/11, 2018 at 13:30

@Ungley I've messed with those settings, unfortunately it didn't help. – Edwinaedwine 5/11, 2018 at 17:34

@Bsquare Looked at those many times. Have tried every possible combination of settings I can find, and still no luck. It looks like both here and on the cloud-speech-discuss forum the team is completely disengaged. – Edwinaedwine 5/11, 2018 at 17:42

Did you try converting your sound file to FLAC or some other format, just to check whether the encoding is the key to your issue? – Sibilant 7/11, 2018 at 8:47

Google won't recognize your data because you tell it the data is in FLAC or AMR_WB format, while you keep passing raw, uncompressed audio chunks that AudioRecord.read() produces.
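One rough way to see this mismatch for yourself is to inspect the first chunk you hand to the API. The sketch below is only a diagnostic, not part of the original code: sniffEncoding and startsWith are hypothetical helper names, and it relies on the fact that a native FLAC stream begins with the ASCII marker "fLaC" and an AMR-WB file begins with "#!AMR-WB\n". Whether the service expects that file header on a streamed AMR_WB payload is not stated in this thread, so treat a missing marker as a hint rather than proof.

```kotlin
// Magic bytes that open a native FLAC stream and a single-channel AMR-WB file.
val FLAC_MAGIC: ByteArray = "fLaC".toByteArray(Charsets.US_ASCII)
val AMR_WB_MAGIC: ByteArray = "#!AMR-WB\n".toByteArray(Charsets.US_ASCII)

/** Returns true if [data] begins with every byte of [magic]. */
fun startsWith(data: ByteArray, magic: ByteArray): Boolean =
    data.size >= magic.size && magic.indices.all { data[it] == magic[it] }

/**
 * Hypothetical helper: guesses the format of the first chunk about to be sent.
 * Raw PCM from AudioRecord.read() carries no header, so it falls into the last branch.
 */
fun sniffEncoding(firstChunk: ByteArray): String = when {
    startsWith(firstChunk, FLAC_MAGIC) -> "FLAC"
    startsWith(firstChunk, AMR_WB_MAGIC) -> "AMR_WB"
    else -> "UNKNOWN (possibly raw PCM)"
}
```

Calling sniffEncoding on the bytes of the first ByteString the subscriber receives, just before it is wrapped in a streaming request, should show whether the payload matches the encoding declared in .setEncoding().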

To make it work, you have two choices. The first is to convert the data to the required format yourself, possibly using a third-party library. The second is to use MediaRecorder from the Android library. Unfortunately, it only supports writing to a file-like destination, so you cannot simply replace AudioRecord with it, but there's a workaround described in this answer.

Ifni answered 6/11, 2018 at 3:58 Comment(0)
