Continuously listen to the user's voice and detect end-of-speech silence in the SpeechKit framework

I am working on an application that needs to open certain screens based on voice commands. For example, if the user says "Open Setting", it should open the settings screen. So far I have used the SpeechKit framework, but I am not able to detect the silence at the end of speech, the way Siri does. I want to detect when the user has finished a sentence or phrase.

Below is the code, where I have integrated the SpeechKit framework in two ways.

A) Via closure (recognitionTask(with request: SFSpeechRecognitionRequest, resultHandler: @escaping (SFSpeechRecognitionResult?, Error?) -> Swift.Void) -> SFSpeechRecognitionTask)

let audioEngine = AVAudioEngine()
let speechRecognizer = SFSpeechRecognizer()
let request = SFSpeechAudioBufferRecognitionRequest()
var recognitionTask: SFSpeechRecognitionTask?

func startRecording() throws {

        let node = audioEngine.inputNode
        let recordingFormat = node.outputFormat(forBus: 0)

        node.installTap(onBus: 0, bufferSize: 1024,
                        format: recordingFormat) { [unowned self]
                            (buffer, _) in
                            self.request.append(buffer)
        }

        audioEngine.prepare()
        try audioEngine.start()

        weak var weakSelf = self

        recognitionTask = speechRecognizer?.recognitionTask(with: request) {
            (result, error) in

            if result != nil {

                if let transcription = result?.bestTranscription {
                    weakSelf?.idenifyVoiceCommand(transcription)
                }
            }
        }            
}

But when I say any word or sentence, such as "Open Setting", the recognitionTask(with:) closure is called multiple times. Since I call idenifyVoiceCommand inside the closure, it is also called multiple times. How can I restrict it to being called only once?

I also looked at the Timer approach I found while googling (SFSpeechRecognizer - detect end of utterance), but it does not work in my scenario because I do not stop the audio engine; it keeps listening to the user's voice continuously, like Siri does.

B) Via delegate (SFSpeechRecognitionTaskDelegate)
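One way to keep a command from firing on every partial result is to remember how much of the transcript has already been acted on. The following is only a minimal sketch of that idea, not part of the Speech framework: `VoiceCommand`, `CommandGate`, and the phrase strings are hypothetical names introduced for illustration, and it assumes each callback delivers the full transcript so far as a plain string.

```swift
import Foundation

// Hypothetical command phrases, stored lowercased for matching.
enum VoiceCommand: String, CaseIterable {
    case openSettings = "open setting"
    case openProfile = "open profile"
}

// Hypothetical helper: fires each spoken command once, even when the same
// growing transcript is delivered many times.
final class CommandGate {
    private var consumed = 0  // characters of the transcript already acted on
    private let onCommand: (VoiceCommand) -> Void

    init(onCommand: @escaping (VoiceCommand) -> Void) {
        self.onCommand = onCommand
    }

    // Call with the full transcript from every partial result.
    func process(_ transcript: String) {
        let text = transcript.lowercased()
        while true {
            let unhandled = text.dropFirst(consumed)
            // Find the earliest command phrase in the part not yet acted on.
            var best: (command: VoiceCommand, range: Range<String.Index>)?
            for command in VoiceCommand.allCases {
                guard let range = unhandled.range(of: command.rawValue) else { continue }
                if let current = best, current.range.lowerBound <= range.lowerBound { continue }
                best = (command, range)
            }
            guard let hit = best else { return }
            // Mark everything up to the end of this phrase as handled.
            consumed = text.distance(from: text.startIndex, to: hit.range.upperBound)
            onCommand(hit.command)
        }
    }

    // Call when a new recognition task starts and the transcript restarts.
    func reset() { consumed = 0 }
}
```

Inside the recognition closure, this would be fed with something like gate.process(transcription.formattedString). Because the audio engine keeps running, the gate lets a second command spoken later in the same session still fire once.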

speechRecognizer.recognitionTask(with: self.request, delegate: self)

func speechRecognitionTaskWasCancelled(_ task: SFSpeechRecognitionTask) {

}

func speechRecognitionTask(_ task: SFSpeechRecognitionTask, didFinishSuccessfully successfully: Bool) {

}

I found that the delegate method that should handle the end of speech is not called when the speech ends; it is only called, seemingly by accident, some time later.

Caltrop answered 6/4, 2018 at 12:55 Comment(1)
Really good question. I have run into this problem many times and not solved it; if anyone knows, please share a solution. - Gegenschein

I had the same issue until now.

I checked your question, and I think the code below will help you achieve what I did:

recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest,
                                                    resultHandler: { (result, error) in

    // Show the latest transcription and note whether the recognizer is done.
    var isFinal = false
    if let result = result {
        self.inputTextView.text = result.bestTranscription.formattedString
        isFinal = result.isFinal
    }

    if let timer = self.detectionTimer, timer.isValid {
        // A timer is already running: only a final result clears the input.
        if isFinal {
            self.inputTextView.text = ""
            self.textViewDidChange(self.inputTextView)
            self.detectionTimer?.invalidate()
        }
    } else {
        // No timer is running: start one that fires once after 1.5 seconds.
        self.detectionTimer = Timer.scheduledTimer(withTimeInterval: 1.5, repeats: false, block: { (timer) in
            self.handleSend()
            timer.invalidate()
        })
    }

})

The timer fires 1.5 seconds after it is scheduled, which is treated here as the end of input. Note that later partial results do not restart it.
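For a countdown that only elapses after a real pause, each incoming result can cancel and reschedule the pending work. The following is a minimal sketch of that debounce idea using only Foundation; `SilenceDebouncer`, `noteActivity`, and `onSilence` are hypothetical names, not Speech framework API, and 1.5 seconds is just the interval used in this answer.

```swift
import Foundation

// Hypothetical helper: runs `onSilence` once no activity has been reported
// for `interval` seconds. Every call to noteActivity() restarts the countdown.
final class SilenceDebouncer {
    private let interval: TimeInterval
    private let queue: DispatchQueue
    private let onSilence: () -> Void
    private var pending: DispatchWorkItem?
    private let lock = NSLock()

    init(interval: TimeInterval,
         queue: DispatchQueue = .main,
         onSilence: @escaping () -> Void) {
        self.interval = interval
        self.queue = queue
        self.onSilence = onSilence
    }

    // Call on every partial result; cancels the old countdown and starts a new one.
    func noteActivity() {
        lock.lock()
        defer { lock.unlock() }
        pending?.cancel()
        let item = DispatchWorkItem { [weak self] in self?.onSilence() }
        pending = item
        queue.asyncAfter(deadline: .now() + interval, execute: item)
    }

    // Call when recognition stops, so a stale countdown does not fire.
    func cancel() {
        lock.lock()
        defer { lock.unlock() }
        pending?.cancel()
        pending = nil
    }
}
```

In the result handler, the assumed usage is to call debouncer.noteActivity() for every result and to handle the finished phrase inside onSilence. The audio engine does not need to be stopped for this to work, since only the gap between results is measured.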

Petrol answered 15/7, 2019 at 19:22 Comment(2)
@muhammed essa - Thanks for sharing this code. What does the self.handleSend() call do? - Septillion
@Septillion handleSend is my own function. It is there only for my own requirement; you can ignore it. - Petrol

Add this to your speech recogniser class:

private var timer: Timer?

Then modify the recognition task handler like this:

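recognitionTask = speechRecognizer.recognitionTask(with: request) { (result, error) in
    // Every callback restarts the countdown, so the timer only fires
    // after 1.5 seconds with no new results.
    self.timer?.invalidate()
    self.timer = Timer.scheduledTimer(withTimeInterval: 1.5, repeats: false) { _ in
        self.timer = nil
        // Do whatever you need here when a pause longer than 1.5 seconds is detected.
    }
    if result != nil {
        // ... your existing result handling
    }
}
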
Enwreathe answered 15/2, 2023 at 17:11 Comment(0)
