Connect to Microsoft's Cognitive Speaker Recognition API via Xamarin.Android
I am building a test application that authenticates users via Microsoft's Cognitive Speaker Recognition API. It seems straightforward, but as mentioned in their API docs, when creating the enrollment I need to send the byte[] of the audio file I record. Since I am using Xamarin.Android, I was able to record the audio and save it. However, the Speaker Recognition API is quite specific about the format of that audio.

According to the API docs, the audio file format must meet the following requirements.

Container -> WAV
Encoding -> PCM
Rate -> 16K
Sample Format -> 16 bit
Channels -> Mono

Following this recipe I successfully recorded the audio, and after experimenting a little with the Android docs I was able to apply these settings as well:

_recorder.SetOutputFormat(OutputFormat.ThreeGpp);

_recorder.SetAudioChannels(1);
_recorder.SetAudioSamplingRate(16000);
_recorder.SetAudioEncodingBitRate(16000);

_recorder.SetAudioEncoder((AudioEncoder) Encoding.Pcm16bit);

This meets most of the criteria for the required audio file, but I cannot seem to save the file in an actual ".wav" container, and I cannot verify whether the audio is actually PCM-encoded or not.
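For reference, the relevant fields can be read straight out of the WAV header to check a recording against the spec. A minimal sketch (my own helper, not from the API docs; it assumes a canonical 44-byte RIFF header with the "fmt " chunk immediately after "WAVE" — real files can carry extra chunks):

```csharp
using System;

static class WavInspector
{
    // Reads the format fields from a canonical 44-byte RIFF/WAVE header.
    // Offsets: 20 = audio format, 22 = channels, 24 = sample rate, 34 = bits per sample.
    public static bool MeetsSpeakerRecognitionSpec(byte[] wav)
    {
        short audioFormat   = BitConverter.ToInt16(wav, 20); // 1 = PCM
        short channels      = BitConverter.ToInt16(wav, 22); // 1 = mono
        int   sampleRate    = BitConverter.ToInt32(wav, 24); // 16000 expected
        short bitsPerSample = BitConverter.ToInt16(wav, 34); // 16 expected

        return audioFormat == 1 && channels == 1 && sampleRate == 16000 && bitsPerSample == 16;
    }
}
```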

Here's my AXML and MainActivity.cs : Github Gist

I also followed this code and incorporated it in my code : Github Gist

The file's specs look fine, but the duration is wrong: no matter how long I record, it always shows 250 ms, which results in audio that is too short for enrollment.

Is there any way to do this? Basically I just want to be able to connect to Microsoft's Cognitive Speaker Recognition API via Xamarin.Android. I couldn't find any such resource to guide myself.

Copper answered 15/3, 2018 at 8:32 Comment(2)
Have you tried the github.com/NateRickard/Xamarin.Cognitive.BingSpeech and github.com/NateRickard/Plugin.AudioRecorder plugins?Dorsiventral
Yep, tried both of them and the latter one won't work. I tried it and it won't save the audio file (running the sample as it is) and later it crashes on playing the audio file back.Copper

Audio Recording

Add the Audio Recorder Plugin NuGet Package to the Android Project (and to any PCL, netstandard, or iOS libraries if you are using them).

Android Project Configuration

  1. In AndroidManifest.xml, add the following permissions:
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.INTERNET" />
  2. In AndroidManifest.xml, add the following provider inside the <application></application> tag.
<provider android:name="android.support.v4.content.FileProvider" android:authorities="${applicationId}.fileprovider" android:exported="false" android:grantUriPermissions="true">
    <meta-data android:name="android.support.FILE_PROVIDER_PATHS" android:resource="@xml/file_paths"></meta-data>
</provider>


  3. In the Resources folder, create a new folder called xml

  4. Inside of Resources/xml, create a new file called file_paths.xml


  5. In file_paths.xml, add the following code, replacing [your package name] with the package of your Android project
<?xml version="1.0" encoding="utf-8"?>
<paths xmlns:android="http://schemas.android.com/apk/res/android">
    <external-path name="my_images" path="Android/data/[your package name]/files/Pictures"/>
    <external-path name="my_movies" path="Android/data/[your package name]/files/Movies" />
</paths>


Android Recorder Code

AudioRecorderService AudioRecorder { get; } = new AudioRecorderService
{
    StopRecordingOnSilence = true,
    PreferredSampleRate = 16000
};

public async Task StartRecording()
{
    AudioRecorder.AudioInputReceived += HandleAudioInputReceived;
    await AudioRecorder.StartRecording();
}

public async Task StopRecording()
{
    await AudioRecorder.StopRecording();
}

async void HandleAudioInputReceived(object sender, string e)
{
    AudioRecorder.AudioInputReceived -= HandleAudioInputReceived;

    PlaybackRecording();

    //replace [UserGuid] with your unique Guid
    await EnrollSpeaker(AudioRecorder.GetAudioFileStream(), [UserGuid]);
}
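(Side note: the question mentions sending a byte[] to the enrollment endpoint. The plugin hands you a Stream, which is what EnrollSpeaker below takes; if some other wrapper wants raw bytes instead, buffering the stream first is straightforward — a minimal sketch:)

```csharp
using System.IO;

// Buffers an audio stream into a byte[] for APIs that take raw bytes
// rather than a Stream.
static byte[] ToByteArray(Stream audioStream)
{
    using (var ms = new MemoryStream())
    {
        audioStream.CopyTo(ms);
        return ms.ToArray();
    }
}
```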

Cognitive Services Speaker Recognition Code

HttpClient Client { get; } = CreateHttpClient(TimeSpan.FromSeconds(10));

public static async Task<EnrollmentStatus?> EnrollSpeaker(Stream audioStream, Guid userGuid)
{
    try
    {
        var boundaryString = "Upload----" + DateTime.Now.ToString("u").Replace(" ", "");
        var content = new MultipartFormDataContent(boundaryString)
        {
            { new StreamContent(audioStream), "enrollmentData", userGuid.ToString("D") + "_" + DateTime.Now.ToString("u") }
        };

        var requestUrl = "https://westus.api.cognitive.microsoft.com/spid/v1.0/verificationProfiles/" + userGuid.ToString("D") + "/enroll";
        var result = await Client.PostAsync(requestUrl, content).ConfigureAwait(false);
        var resultStr = await result.Content.ReadAsStringAsync().ConfigureAwait(false);

        if (result.StatusCode == HttpStatusCode.OK)
            return JsonConvert.DeserializeObject<Enrollment>(resultStr)?.EnrollmentStatus;
    }
    catch (Exception)
    {
        // Network or serialization failure; fall through and report no status
    }

    return null;
}

static HttpClient CreateHttpClient(TimeSpan timeout)
{
    HttpClient client = new HttpClient();

    client.Timeout = timeout;
    client.DefaultRequestHeaders.AcceptEncoding.Add(new StringWithQualityHeaderValue("gzip"));
    client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));

    //replace [Your Speaker Recognition API Key] with your Speaker Recognition API Key from the Azure Portal
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", [Your Speaker Recognition API Key]);

    return client;
}
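One step the snippets above assume has already happened: the userGuid passed to EnrollSpeaker must be an existing verification profile id. A hedged sketch of creating one (the endpoint and the {"locale":"en-us"} payload follow the v1.0 Speaker Recognition REST API; the CreateProfileResponse type name is my own, and this reuses the Client above, so it needs a live key and network access):

```csharp
// Hypothetical helper: creates a verification profile and returns its id,
// which is then used as the userGuid for enrollment.
public static async Task<Guid?> CreateVerificationProfile()
{
    var body = new StringContent("{\"locale\":\"en-us\"}", System.Text.Encoding.UTF8, "application/json");

    var result = await Client.PostAsync(
        "https://westus.api.cognitive.microsoft.com/spid/v1.0/verificationProfiles", body).ConfigureAwait(false);

    if (result.StatusCode != HttpStatusCode.OK)
        return null;

    var json = await result.Content.ReadAsStringAsync().ConfigureAwait(false);
    return JsonConvert.DeserializeObject<CreateProfileResponse>(json)?.VerificationProfileId;
}

class CreateProfileResponse
{
    public Guid VerificationProfileId { get; set; }
}
```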

public class Enrollment : EnrollmentBase
{
    [JsonConverter(typeof(StringEnumConverter))]
    public EnrollmentStatus EnrollmentStatus { get; set; }
    public int RemainingEnrollments { get; set; }
    public int EnrollmentsCount { get; set; }
    public string Phrase { get; set; }
}

public enum EnrollmentStatus
{
    Enrolling,
    Training,
    Enrolled
}

Audio Playback

Configuration

Add the SimpleAudioPlayer Plugin NuGet Package to the Android Project (and to any PCL, netstandard, or iOS libraries if you are using them).

Code

public void PlaybackRecording()
{
    var isAudioLoaded = Plugin.SimpleAudioPlayer.CrossSimpleAudioPlayer.Current.Load(AudioRecorder.GetAudioFileStream());

    if (isAudioLoaded)
        Plugin.SimpleAudioPlayer.CrossSimpleAudioPlayer.Current.Play();
}
Psephology answered 23/3, 2018 at 2:58 Comment(4)
Thank you for the detailed guide. I followed this and got some errors, which I fixed. However, there's an error I can't seem to fix. As soon as I stop the recording, I get Java.IO.IOException: Prepare failed.: status=0x1. I looked around and inspected my phone and there was no audio file being saved that the player could read from. Hence, this error. Seems like there's no way to save the recorded audio offline.Copper
What is the make/model of your phone, or are you using an emulator? And what version of Android is the device using?Psephology
I'm on Xiaomi MI A1, stock Android 8.0.0. I'm not using an emulator.Copper
I double checked and both the permissions, to record audio and storage have been given to the application properly.Copper

© 2022 - 2024 — McMap. All rights reserved.