Streaming Audio in FLAC or AMR_WB to the Google Speech API

I need to run the Google Speech API in fairly low-bandwidth environments.

Based on reading about best practices, it seems my best bet is to use the AMR_WB format.

However, the following code throws no exceptions and nothing reaches the onError(t: Throwable) method, yet the API never returns any results to the onNext(value: StreamingRecognizeResponse) method.

If I change the format in .setEncoding() from FLAC or AMR_WB back to LINEAR16, everything works fine.

AudioEmitter.kt

fun start(
    encoding: Int = AudioFormat.ENCODING_PCM_16BIT,
    channel: Int = AudioFormat.CHANNEL_IN_MONO,
    sampleRate: Int = 16000,
    subscriber: (ByteString) -> Unit
)

MainActivity.kt

builder.streamingConfig = StreamingRecognitionConfig.newBuilder()
        .setConfig(RecognitionConfig.newBuilder()
                .setLanguageCode("en-US")
                .setEncoding(RecognitionConfig.AudioEncoding.AMR_WB)
                .setSampleRateHertz(16000)
                .build())
        .setInterimResults(true)
        .setSingleUtterance(false)
        .build()
Edwinaedwine answered 6/10, 2018 at 3:22 Comment(5)
I think the problem may come from the sampleRate in your AudioEmitter. Try setting it to 44100, 22050 or 11025 when the encoding type for streaming recognition is FLAC. - Ungley
Maybe you can follow the official troubleshooting procedure at cloud.google.com/speech-to-text/docs/support#troubleshooting to pin down where the issue comes from? - Sibilant
@Ungley I've experimented with those settings; unfortunately, it didn't help. - Edwinaedwine
@Bsquare I've looked at those many times. I have tried every combination of settings I can find, and still no luck. It looks like the team is completely disengaged, both here and on the cloud-speech-discuss forum. - Edwinaedwine
Did you try converting your sound file to FLAC or another format, just to check whether the encoding is the key to your issue? - Sibilant

Google can't recognize your data because you declare it as FLAC or AMR_WB while still sending the raw, uncompressed PCM chunks that AudioRecord.read() produces.
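
One way to confirm this kind of mismatch is to look at the first bytes you are about to send. A FLAC stream starts with the marker "fLaC", and AMR-WB in its file storage format (RFC 4867) starts with "#!AMR-WB\n"; raw PCM has no marker at all. The sketch below is only a diagnostic under those assumptions. The names describeFirstChunk and startsWith are hypothetical helpers, not part of any Google or Android API, and a bare AMR-WB frame stream without the file header would also report "unknown".

```kotlin
// Stream markers defined by the container formats themselves.
val FLAC_MARKER: ByteArray = "fLaC".toByteArray(Charsets.US_ASCII)
val AMR_WB_MARKER: ByteArray = "#!AMR-WB\n".toByteArray(Charsets.US_ASCII)

/** Returns true when [data] begins with every byte of [marker]. */
fun startsWith(data: ByteArray, marker: ByteArray): Boolean =
    data.size >= marker.size && marker.indices.all { data[it] == marker[it] }

/** Hypothetical diagnostic: guesses what the first chunk of a stream contains. */
fun describeFirstChunk(chunk: ByteArray): String = when {
    startsWith(chunk, FLAC_MARKER) -> "FLAC"
    startsWith(chunk, AMR_WB_MARKER) -> "AMR_WB"
    else -> "unknown (possibly raw PCM)"
}

fun main() {
    // Four 16-bit little-endian PCM samples, similar to what AudioRecord.read() fills in.
    val rawPcm = byteArrayOf(0x10, 0x00, 0x32, 0x01, 0x7F, 0x02, 0x05, 0x00)
    println(describeFirstChunk(rawPcm))
    // The same samples prefixed with the FLAC stream marker.
    println(describeFirstChunk(FLAC_MARKER + rawPcm))
}
```

If the first chunk handed to the request observer reports "unknown" while the config says FLAC or AMR_WB, the declared encoding and the actual bytes most likely disagree.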

There are two ways to make it work. The first is to convert the data to the required format yourself, possibly with a third-party library. The second is to use MediaRecorder from the Android SDK. Unfortunately, it only supports writing to a file-like destination, so you cannot simply swap it in for AudioRecord, but there's a workaround described in this answer.
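
For the first option, the encoder has to sit between the recorder and the code that sends chunks to the API. A minimal sketch of that wiring follows, assuming a hypothetical encode function supplied by whatever codec library you pick. encodingSubscriber and fakeEncode are illustrative names only, and plain ByteArray is used here instead of ByteString to keep the example free of dependencies.

```kotlin
/**
 * Wraps [send] so that every raw PCM chunk passes through [encode] first.
 * [encode] is a placeholder for a real codec. Encoders often buffer input,
 * so an empty result is treated as "nothing to send yet" and skipped.
 */
fun encodingSubscriber(
    encode: (ByteArray) -> ByteArray,
    send: (ByteArray) -> Unit
): (ByteArray) -> Unit = { pcmChunk ->
    val encoded = encode(pcmChunk)
    if (encoded.isNotEmpty()) send(encoded)
}

fun main() {
    val sent = mutableListOf<ByteArray>()
    // Stand-in "codec" that keeps every other byte; NOT a real audio encoder.
    val fakeEncode: (ByteArray) -> ByteArray = { pcm ->
        pcm.filterIndexed { i, _ -> i % 2 == 0 }.toByteArray()
    }
    val subscriber = encodingSubscriber(fakeEncode) { sent.add(it) }
    subscriber(byteArrayOf(1, 2, 3, 4)) // encoded to 2 bytes and sent
    subscriber(byteArrayOf())           // encodes to nothing, so it is skipped
    println("chunks sent: ${sent.size}, first chunk size: ${sent[0].size}")
}
```

With a real FLAC or AMR-WB encoder in place of fakeEncode, the bytes that reach the API would then match the encoding declared in .setEncoding().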

Ifni answered 6/11, 2018 at 3:58 Comment(0)
