How to recognise when the user starts and stops speaking in Android? (Voice recognition in Android)
I have done a lot of research and gone through many resources, but I have failed to find a proper solution to my problem.

I have developed an app, and now I want to add voice-based functionality to it.

The required features are:

1) when the user starts speaking, the app should record the audio/video, and

2) when the user stops speaking, it should play back the recorded audio/video.

Note: here "video" means whatever the user performs within the app during that period of time, for example clicks on buttons, some kind of animation, etc.

I don't want to use Google's voice recognizer available by default in Android, because it requires the Internet and my app runs offline. I also came across CMU Sphinx, but it is not helpful for my requirements.

EDIT: I have already achieved this using Start and Stop buttons, but I don't want to use these buttons.

If anyone has any idea or any suggestions, please let me know.

Rhodes answered 20/3, 2012 at 14:23 Comment(3)
did you get a solution for this?Tomika
check this link solutionTomika
Have you got the solution?Hebetic

The simplest and most common method is to count the number of zero crossings in the audio (i.e. the points where the sample sign changes from positive to negative or back).

If that value is too high then the sound is unlikely to be speech. If it is too low then, again, it is unlikely to be speech.

Combine that with a simple energy level (how loud the audio is) and you have a solution which is pretty robust.

If you need a more accurate system then it gets much much more complex. One way is to extract audio features (MFCCs for example) from "training data", model them up with something like a GMM and then test the features you extract from live audio against the GMM. This way you can model the likelihood that a given frame of audio is speech over non-speech. This is not a simple process however.
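As a toy illustration of the likelihood idea only (this is not the real MFCC+GMM pipeline: it models a single feature, frame energy, with one made-up Gaussian per class instead of a trained mixture), the decision step boils down to comparing log-likelihoods:

```java
// Toy stand-in for the GMM likelihood test: one Gaussian per class over
// frame energy. The means/variances are invented, not trained values.
public final class ToyLikelihoodVad {

    // Log-density of a 1-D Gaussian.
    static double logGaussian(double x, double mean, double var) {
        return -0.5 * Math.log(2 * Math.PI * var)
                - (x - mean) * (x - mean) / (2 * var);
    }

    // Classify a frame as speech if the "speech" model explains its
    // energy better than the "non-speech" model does.
    static boolean isSpeech(double frameEnergy) {
        double speech = logGaussian(frameEnergy, 3000, 1_000_000);
        double nonSpeech = logGaussian(frameEnergy, 200, 40_000);
        return speech > nonSpeech;
    }
}
```

A real system would do this per frame over a whole MFCC vector, summing over mixture components and usually smoothing the decision across frames.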

I'd strongly recommend going down the zero-crossings route, as it is simple to implement and works fine 99% of the time :)
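A minimal sketch of that recommendation in plain Java (the frame size, the energy floor of 500, and the 0.02–0.25 zero-crossing-rate band are illustrative guesses you would tune on a real device, not values from this answer):

```java
// Zero-crossing + energy speech check for one frame of 16-bit PCM samples.
public final class SpeechFrameDetector {

    // Count sign changes between consecutive samples.
    static int zeroCrossings(short[] frame) {
        int crossings = 0;
        for (int i = 1; i < frame.length; i++) {
            if ((frame[i - 1] >= 0) != (frame[i] >= 0)) crossings++;
        }
        return crossings;
    }

    // Mean absolute amplitude as a cheap energy measure.
    static double energy(short[] frame) {
        long sum = 0;
        for (short s : frame) sum += Math.abs(s);
        return (double) sum / frame.length;
    }

    // Speech if the frame is loud enough AND the zero-crossing rate sits in
    // a plausible band: too high suggests hiss/noise, too low suggests
    // silence or low-frequency hum.
    static boolean isLikelySpeech(short[] frame) {
        double zcr = (double) zeroCrossings(frame) / frame.length;
        return energy(frame) > 500 && zcr > 0.02 && zcr < 0.25;
    }
}
```

On Android you would feed it buffers read from `AudioRecord` and start/stop your recording logic when a few consecutive frames flip state.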

Unfrequented answered 21/4, 2012 at 15:8 Comment(6)
I am really happy to know the correct answer from a professional like you.... but i dont know much about this sound recognition.. and much of the information i gave in my answer is from here..Electrodynamics
@Raju: Fair enough ... TBH doing feature modelling is INCREDIBLY complicated. I'm using it for speaker recognition (ie recognising a given person speaking) at the moment. Its not really something that can be gone into simply, alas :( The zero-crossings method I describe above really is very simple and works beautifully. While I know modelling general speech and trying to match likelihoods to that is a much better solution I would generally simply implement zero crossings to identify a speaker ... its so easy to implement and works so well as to not be worth worrying about too much :DUnfrequented
Hmm i realise i said "to identify a speaker" ... I meant "to identify someone is speaking" ...Unfrequented
Yes... i understood that part... :)Electrodynamics
@Raju: Just had to clarify ;)Unfrequented
@Goz, Thanks for your kind response. I got ur idea but can you please provide me some reference for it. So i can start actual development; It would be very great and helpful.Rhodes

You can try adding listeners to application events like navigation, clicks, animations, etc. In the listener implementations you can trigger the start/stop functionality.

http://tseng-blog.nge-web.net/blog/2009/02/14/implementing-listeners-in-your-android-java-application/

Look at these examples; they might be helpful to you.


But I'm wondering: from what you describe about your application's behaviour, it sounds like you're going to reinvent something like Talking Tom, huh? :-P

Aeneus answered 21/3, 2012 at 3:56 Comment(8)
Thanks for your reply. But for Voice Recognition is there any Listner available? And i want the voice functionality like Talking Tom....Rhodes
assume you have a method called "StartRec()". you have to call this method from the actionlister declaration like this 'mainScreen.addListener( new ClickListner(){ startRec();});' i m not sure about the listner details you can find it in android dev site.Aeneus
Yeah that all thing i know. But there should be some way for recognition of voice? How can app know that user start speaking? I want to know that......Rhodes
sorry dude i m unable to understand what you are trying to do... even in talking tom your voice will be recoreded only when tom do action like hands in its ear(like i m hearing you )Aeneus
No in talking tom whenever user start speaking then that action hands in its ear trigger. So my question is that how can you know that user start speaking something or stop speaking programmatically?Rhodes
@Rhodes I have a feeling that they are constantly recording and monitoring volume levels. When the volume reaches a certain point they perform actions. From what I can understand from you, that is not exactly what you want to accomplish. As far as I can tell always listening in the background for a user voice command would be a very costly (battery, processor) venture. That is probably why neither Google nor Apple have implemented that ability yet. They both use buttons to begin listening for voice commands.Jenny
@MikeIsrael, Thanks for your response. But then How app like Talking tom is made? Its available in both Apple and Android? There should be something which i am missing...Rhodes
And as far as i know in iPhone the COCOA framework is providing these functionality but in Android I didn't find anything.Rhodes

Below is the code I use for an iPhone application that does exactly the same thing. The code is in Objective-C++, but it has lots of comments. It runs inside the callback function of a recording queue. I am sure that a similar approach exists for the Android platform.

This approach works very nicely in almost every acoustic environment I have tried, and it is used in our app, which you can download and test if you want.

Implement it on the Android platform and you are done!

// If there are some audio samples in the audio buffer of the recording queue
if (inNumPackets > 0) {
        // The following 4 lines of code are vector functions that compute 
        // the average power of the current audio samples. 
        // See Apple's vDSP documentation for details about them.
        vDSP_vflt16((SInt16*)inBuffer->mAudioData, 1, aqr->currentFrameSamplesArray, 1, inNumPackets);
        vDSP_vabs(aqr->currentFrameSamplesArray, 1, aqr->currentFrameSamplesArray, 1, inNumPackets);
        vDSP_vsmul(aqr->currentFrameSamplesArray, 1, &aqr->divider, aqr->currentFrameSamplesArray, 1, inNumPackets);
        vDSP_sve(aqr->currentFrameSamplesArray, 1, &aqr->instantPower, inNumPackets);
        // InstantPower holds the energy for the current audio samples
        aqr->instantPower /= (CGFloat)inNumPackets;
        // S.O.S. Avoid +-infs, NaNs add a small number to InstantPower
        aqr->instantPower = log10f(aqr->instantPower + 0.001f);
        // InstantAvgPower holds the energy for a bigger window 
        // of time than InstantPower
        aqr->instantAvgPower = aqr->instantAvgPower * 0.95f + 0.05f * aqr->instantPower;
        // AvgPower holds the energy for an even bigger window 
        // of time than InstantAvgPower
        aqr->avgPower = aqr->avgPower * 0.97f + 0.03f * aqr->instantAvgPower;
        // This is the ratio that tells us when to record
        CGFloat ratio = aqr->avgPower / aqr->instantPower;
        // If we are not already writing to an audio file and 
        // the ratio is bigger than a specific hardcoded value 
        // (this value has to do with the quality of the microphone 
        // of the device. I have set it to 1.5 for an iPhone) then start writing!
        if (!aqr->writeToFile && ratio > aqr->recordingThreshold) {
            aqr->writeToFile = YES;
        } 
        if (aqr->writeToFile) {
            // write packets to file
            XThrowIfError(AudioFileWritePackets(aqr->mRecordFile, FALSE, inBuffer->mAudioDataByteSize,
                                                inPacketDesc, aqr->mRecordPacket, &inNumPackets, inBuffer->mAudioData),
                          "AudioFileWritePackets failed");
            aqr->mRecordPacket += inNumPackets;
            // Now if we are recording but the instantAvgPower is lower 
            // than avgPower then we increase the countToStopRecording counter
            if (aqr->instantAvgPower < aqr->avgPower) {
                aqr->countToStopRecording++;
            } 
            // otherwise reset it to 0.
            else {
                aqr->countToStopRecording = 0;
            }
            // If we have detected that there is not enough power in 30 consecutive
            // audio sample buffers OR we have recorded TOO much audio 
            // (the user speaks for more than a threshold of time) stop recording 
            if (aqr->countToStopRecording > 30 || aqr->mRecordPacket > kMaxAudioPacketsDuration) {
                aqr->countToStopRecording = 0;
                aqr->writeToFile = NO;
                // Notify the audio player that we finished recording 
                // and start playing the audio!!!
                dispatch_async(dispatch_get_main_queue(), ^{[[NSNotificationCenter defaultCenter] postNotificationName:@"RecordingEndedPlayNow" object:nil];});
            }
        }
    }
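For Android, a rough plain-Java analog of the same smoothed-log-energy ratio gate might look like this. This is a sketch only: `process` expects one buffer of 16-bit PCM samples per call (e.g. read from `AudioRecord`), and the smoothing constants and the 1.5 threshold copy the values above but are untested guesses for any given microphone:

```java
// Sketch of the smoothed-energy ratio gate from the callback above.
// Call process() once per audio buffer; the return value means "keep recording".
public final class EnergyGate {
    private double instantAvgPower;   // short-window smoothed log-energy
    private double avgPower;          // long-window smoothed log-energy
    private boolean writing;          // currently "recording"?
    private int quietFrames;          // consecutive low-energy frames

    private static final double THRESHOLD = 1.5; // cf. the hardcoded iPhone value
    private static final int MAX_QUIET_FRAMES = 30;

    public boolean process(short[] buffer) {
        // Mean absolute amplitude, normalised to [0, 1].
        double power = 0;
        for (short s : buffer) power += Math.abs(s) / 32768.0;
        power /= buffer.length;
        // Log-energy with a small offset to avoid log(0), as in the original.
        double instantPower = Math.log10(power + 0.001);
        instantAvgPower = instantAvgPower * 0.95 + 0.05 * instantPower;
        avgPower = avgPower * 0.97 + 0.03 * instantAvgPower;
        // Both log-energies are negative in quiet; a loud frame shrinks the
        // denominator's magnitude, pushing the ratio above the threshold.
        double ratio = avgPower / instantPower;
        if (!writing && ratio > THRESHOLD) writing = true;
        if (writing) {
            // Count consecutive frames quieter than the long-term average.
            if (instantAvgPower < avgPower) quietFrames++;
            else quietFrames = 0;
            if (quietFrames > MAX_QUIET_FRAMES) { // sustained silence: stop
                quietFrames = 0;
                writing = false;
            }
        }
        return writing;
    }
}
```

The missing pieces relative to the iPhone version (writing packets to a file, a maximum-duration cut-off, posting a "play now" notification) would be handled by whatever recording code wraps this class.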

Best!

Vaasa answered 21/4, 2012 at 14:40 Comment(3)
Am i missing something or is this, essentially, just a very complicated amplitude detector? (There are much easier ways to get the current amplitude on iPhone!)Unfrequented
@Unfrequented i would love to tell me more about it, code, samples, docsVaasa
AudioQueueGetProperty( mAqr, kAudioQueueProperty_CurrentLevelMeterDB, &aqlms, &size );Unfrequented

Here is a simple snippet that detects when the user stops speaking. I am checking the value below:

recorder.getMaxAmplitude();

sample code:

public void startRecording() throws IOException {

    // Poll the recorder's peak amplitude every 100 ms on a background thread.
    Thread thread = new Thread() {
        @Override
        public void run() {
            while (true) {
                try {
                    sleep(100);
                    if (recorder != null) {
                        checkValue(recorder.getMaxAmplitude());
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    };
    thread.start();
}

checkValue function:

public void checkValue(int amplitude) {
    try {
        if (amplitude > 1000) {
            // Loud enough: the user is (still) speaking.
            Log.d("I", "Amplitude : " + amplitude);
            amplitude = recorder.getMaxAmplitude();
            Thread.sleep(2000);
            isListened = true;
        } else if (isListened) {
            // We heard speech earlier and it is quiet now: the user stopped.
            Log.d("I", "Stop me");
            recordingDialog.dismiss();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

I know this question is very old and was answered previously, but this small code snippet might help someone else.

Awn answered 22/4, 2019 at 11:29 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.