Clear input of SFSpeechAudioBufferRecognitionRequest after every result (Swift3)
I have integrated speech-to-text by following this AppCoda tutorial. The problem I am facing: I want the user to be able to write or edit text themselves, but SFSpeechAudioBufferRecognitionRequest doesn't take into account anything the user has typed.

Is there a way to send the user's typed input into SFSpeechAudioBufferRecognitionRequest, or any way to clear the SFSpeechAudioBufferRecognitionRequest input before sending a new request?

Thanks in advance.

Gushy answered 14/8, 2017 at 19:38 Comment(6)
I'm not clear what you are asking above, but it sounds like you may want the user to be able to speak to generate text, then edit or add to that text, and then speak again. If so, I would handle that as two separate recognition requests: end the first one and append the results of the second request to the first. – Dermatitis
@DavidL: you understood the problem correctly, but the solution you propose is not clear to me. How can I create multiple requests, and when should I start the second recognition request? – Gushy
The way I implemented it is by putting a prompt on the screen when speech recognition starts and allowing the user to stop it when they are finished. You can then add the spoken text to your text field, where the user can edit it or hit the speech recognition button again to talk and add more text. – Dermatitis
To stop speech recognition, I used audioEngine.stop() and recognitionRequest?.endAudio() in the function that is called when the user stops recognition. – Dermatitis
@DavidL: can't we do that in a single go, without stopping and restarting the recording? – Gushy
See my answer below. You may be able to do it without stopping, but you would have to implement code to do that. You can look at the result before it is final, but the result from speech recognition will not know about user edits, so it will only reflect what the user has spoken. Watch the WWDC video I linked in the answer for more details; if I remember right, it is a fairly short video. – Dermatitis
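
As the comment notes, partial results only ever reflect spoken audio, never manual edits. A minimal sketch of inspecting partial results inside the recognition callback (`textView` is an assumed UITextView name, not from the answer; the surrounding properties match the answer's code):

```swift
// Sketch: observing partial results as they arrive.
recognitionTask = recognizer.recognitionTask(with: recognitionRequest) { result, error in
    if let result = result {
        // bestTranscription updates as more audio is processed; it
        // contains only spoken text, never the user's typed edits.
        textView.text = result.bestTranscription.formattedString
        if result.isFinal {
            // This session is done; any further speech requires a
            // brand-new SFSpeechAudioBufferRecognitionRequest.
        }
    }
}
```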

Here is what I use to create my recognition request:

func recordSpeech() throws {
    // Cancel the previous task if it's running.
    if let recognitionTask = recognitionTask {
        recognitionTask.cancel()
        self.recognitionTask = nil
    }

    isRecognizing = true
    self.delegate?.recognitionStarted(sender: self)

    let audioSession = AVAudioSession.sharedInstance()
    try audioSession.setCategory(AVAudioSessionCategoryRecord)
    try audioSession.setMode(AVAudioSessionModeMeasurement)
    try audioSession.setActive(true, with: .notifyOthersOnDeactivation)

    recognitionRequest = SFSpeechAudioBufferRecognitionRequest()

    guard let inputNode = audioEngine.inputNode else {
        fatalError("Audio engine has no input node")
    }

    guard let recognitionRequest = recognitionRequest else {
        fatalError("Unable to create a SFSpeechAudioBufferRecognitionRequest object")
    }

    // Configure request so that results are returned before audio recording is finished
    recognitionRequest.shouldReportPartialResults = true

    // A recognition task represents a speech recognition session.
    // We keep a reference to the task so that it can be cancelled.
    recognitionTask = recognizer.recognitionTask(with: recognitionRequest) { result, error in

        func finalizeResult() {
            self.audioEngine.stop()
            inputNode.removeTap(onBus: 0)
            self.recognitionRequest = nil
            self.recognitionTask = nil
        }

        guard error == nil else {
            finalizeResult()
            return
        }

        // Safely unwrap the result instead of force-unwrapping, which
        // would crash if result is nil.
        if let result = result, !result.isFinal {

            guard self.isRecognizing else {
                return
            }

            // process partial result
            self.processRecognition(result: result)

        } else {
            finalizeResult()
        }
    }

    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
        self.recognitionRequest?.append(buffer)
    }

    audioEngine.prepare()

    do {
        try audioEngine.start()
    } catch let error as NSError {
        print("audio engine start error=\(error)")
    }
}

To cancel or stop this at any point I use these methods:

@objc func stopRecording() {
    isRecognizing = false
    audioEngine.stop()
    recognitionRequest?.endAudio()
    self.delegate?.recognitionFinished()
}

func cancelRecording() {
    isRecognizing = false
    audioEngine.stop()
    recognitionTask?.cancel()
    self.delegate?.recognitionFinished()
}

I would set up a button to trigger speech recognition and tie it to recordSpeech(), then set up a second button and tie it to stopRecording(). When the user stops the request, result?.isFinal will be true, and you know that is the final text from the first input. The user can then use speech input again for the second set of speech.
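
Concretely, merging each session's final text with the user's edits might look like the sketch below. The names `textView` and `recognitionFinished(finalText:)` are assumptions for illustration; only `recordSpeech()` and the final-result flow come from the answer above:

```swift
// Sketch: combining speech sessions with manual edits.
// Called when a session produces its final result (result.isFinal).
func recognitionFinished(finalText: String) {
    // Append the session's transcript to whatever text the user
    // currently has, including any manual edits made in between.
    let existing = textView.text ?? ""
    textView.text = existing.isEmpty ? finalText : existing + " " + finalText
    // Tapping the record button again calls recordSpeech(), which
    // creates a fresh SFSpeechAudioBufferRecognitionRequest, so the
    // new session starts with no memory of earlier input.
}
```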

Most of my code came from the 2016 WWDC session on Speech Recognition which you can find here:

Transcript

Video

Dermatitis answered 15/8, 2017 at 19:31 Comment(0)
