Implementing "user stopped speaking" notification for `SFSpeechRecognizer`
Asked Answered
O

1

4

I'm attempting to solve this problem: SFSpeechRecognizer - detect end of utterance

The problem is that SFSpeechRecognizer callback fires every time the detected speech string changes, but it only fires after 60 seconds of silence (whereupon it sets the isFinal flag).

The suggested technique is to start a 2 second timer each time to callback fires, first invalidating the timer if it is already set.

I have implemented this technique. However at my timer callback is never getting hit.

Can anyone tell me why?

import Foundation
import Speech

@objc
public class Dictation : NSObject, SFSpeechRecognizerDelegate
{
    @objc static let notification_finalText = Notification.Name("speech_gotFinalText")
    @objc static let notification_interimText = Notification.Name("speech_textDidChange")

    private let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-UK"))!

    var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?

    private var recognitionTask: SFSpeechRecognitionTask?

    let audioEngine = AVAudioEngine()

    @objc var text_tmp   : String? = ""
    @objc var text_final : String? = ""

    var timer : Timer?

    override init()
    {
        super.init()

        speechRecognizer.delegate = self

        SFSpeechRecognizer.requestAuthorization { authStatus in
            if authStatus != .authorized {
                exit(0)
            }
        }
    }

    // - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

    @objc
    func tryStartRecording()
    {
        try! startRecording()
    }

    // - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

    func startRecording() throws
    {
        text_final = ""

        // Cancel the previous task if it's running.
        if let recognitionTask = recognitionTask {
            recognitionTask.cancel()
            self.recognitionTask = nil
        }

        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()

        let inputNode = audioEngine.inputNode
        /*
         ^ causes:
         [plugin] AddInstanceForFactory: No factory registered for id <CFUUID 0x600000247200> F8BB1C28-BAE8-11D6-9C31-00039315CD46
         HALC_ShellDriverPlugIn::Open: Can't get a pointer to the Open routine
         HALC_ShellDriverPlugIn::Open: Can't get a pointer to the Open routine
         */

        if inputNode.inputFormat(forBus: 0).sampleRate == 0 {
            fatalError("Audio engine has no input node")
        }

        guard let recognitionRequest = recognitionRequest else {
            fatalError("Unable to created a SFSpeechAudioBufferRecognitionRequest object")
        }

        // Configure request so that results are returned before audio recording is finished
        recognitionRequest.shouldReportPartialResults = true

        // A recognition task represents a speech recognition session.
        // We keep a reference to the task so that it can be cancelled.
        recognitionTask = speechRecognizer.recognitionTask( with: recognitionRequest )
        { result, error in
            self.timer?.invalidate()
            print( "New Timer" )
            self.timer = Timer(timeInterval:2.0, repeats:false) { _ in

                print( "*** Timer Callback -- NEVER HITS! ***" )

                self.timer?.invalidate()
                self.text_final = result!.bestTranscription.formattedString

                NotificationCenter.default.post( name: Dictation.notification_finalText,  object: nil )

                self.stopRecording()
            }

            var isFinal = false

            if let result = result {
                isFinal = result.isFinal

                if isFinal {
                    self.text_final = result.bestTranscription.formattedString
                } else {
                    self.text_tmp = result.bestTranscription.formattedString
                }

                let notification = isFinal ? Dictation.notification_finalText : Dictation.notification_interimText

                NotificationCenter.default.post( name: notification,  object: nil )
            }

            if error != nil  ||  isFinal {
                self.audioEngine.stop()
                inputNode.removeTap( onBus: 0 )

                self.recognitionRequest = nil
                self.recognitionTask = nil
            }
        }

        let recordingFormat = inputNode.outputFormat(forBus: 0)

        inputNode.installTap( onBus: 0,  bufferSize: 1024,  format: recordingFormat )
        { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
            self.recognitionRequest?.append( buffer )
        }

        audioEngine.prepare()

        try audioEngine.start()

        print( self.audioEngine.description )
    }

    // - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

    @objc
    func stopRecording()
    {
        audioEngine.stop()
        recognitionRequest?.endAudio()
    }
}

LINKS:
- SFSpeechRecognizer - detect end of utterance

Onomatopoeia answered 22/7, 2019 at 14:51 Comment(0)
S
3

It's because you create the timer but you never start it:

self.timer = Timer(timeInterval:2.0, repeats:false)

Instead, say

self.timer = Timer.scheduledTimer( ...
Strophe answered 22/7, 2019 at 15:2 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.