How to improve speech recognition in iOS for numeric input?

I am using iOS speech recognition, and it does very well whenever there is enough context. However, I use it only for numeric input, and there I am seeing issues. Single-digit numbers lack context (2 comes back as "to" or "too", 8 as "ate"), and even some two-digit numbers fail (80 is sometimes transcribed as "idiot"). I'd like to tell the speech recognizer that the input is going to be a number. For example, if the input is "number 2", the recognizer does a fantastic job.

I have played around with some of the hints - SFSpeechRecognitionTaskHint (unspecified, dictation, search, confirmation) - but none of these modes are well suited for numeric input.

So, the questions are:

  • Is there a way to give a hint to SFSpeechRecognizer that the audio is going to be numeric? or
  • Is there another speech recognizer technology that might be better suited for my needs?

Note that I'd also like this to work in many different languages (not just English).

Thanks for your help, Eric

Oakum answered 28/8, 2017 at 7:26 Comment(0)
S
2

There is currently nothing in the Speech framework that lets you restrict it to numbers only. If prepending the word "number" improves results, you could record an audio file of the word "number" and prepend it on the fly to whatever the user says, so that you get proper recognition. Then cut the word "number" from the text you receive from the Speech framework once recognition is complete. It sounds hacky, but I'm not sure there is another solution.
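A rough sketch of that idea, in case it helps. It assumes a hypothetical cue clip at cueURL recorded in a format the recognizer accepts alongside the microphone format (I have not verified how the request handles mixed formats), and that speech and microphone authorization has already been granted. The names startRecognition and stripCue are made up for illustration; the Speech and AVFoundation calls are the standard ones.

```swift
import Foundation
#if canImport(Speech)
import AVFoundation
import Speech

/// Feeds a pre-recorded cue clip into the request, then streams the microphone.
/// Assumption: the clip and the live input are compatible enough to share one request.
func startRecognition(cueURL: URL,
                      engine: AVAudioEngine,
                      recognizer: SFSpeechRecognizer,
                      onText: @escaping (String) -> Void) throws -> SFSpeechRecognitionTask {
    let request = SFSpeechAudioBufferRecognitionRequest()

    // Load the whole cue clip into one buffer and append it first.
    let file = try AVAudioFile(forReading: cueURL)
    if let cue = AVAudioPCMBuffer(pcmFormat: file.processingFormat,
                                  frameCapacity: AVAudioFrameCount(file.length)) {
        try file.read(into: cue)
        request.append(cue)
    }

    // Then forward live microphone buffers to the same request.
    let input = engine.inputNode
    input.installTap(onBus: 0, bufferSize: 1024,
                     format: input.outputFormat(forBus: 0)) { buffer, _ in
        request.append(buffer)
    }
    engine.prepare()
    try engine.start()

    // Report only the final transcription.
    return recognizer.recognitionTask(with: request) { result, _ in
        if let result = result, result.isFinal {
            onText(result.bestTranscription.formattedString)
        }
    }
}
#endif

/// Removes a leading cue word from a transcript, ignoring case.
func stripCue(_ cue: String, from transcript: String) -> String {
    let trimmed = transcript.trimmingCharacters(in: .whitespaces)
    guard let match = trimmed.range(of: cue, options: [.caseInsensitive, .anchored]) else {
        return trimmed
    }
    return trimmed[match.upperBound...].trimmingCharacters(in: .whitespaces)
}
```

You would pass the final text through stripCue("number", from: text) before interpreting it. The cue word and clip would need to be chosen per language, and this only helps where such a cue word exists.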

UPDATE

The other option would be to wait and analyze the multiple variants you receive in SFSpeechRecognitionResult.transcriptions: https://developer.apple.com/documentation/speech/sfspeechrecognitionresult/1648282-transcriptions

Rather than accepting the first transcription available, wait until this array contains something that can be interpreted as a number.
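Something along these lines might work for the filtering step. This is only a sketch: firstNumber is a made-up helper name, and the candidates array stands in for the formattedString of each entry in transcriptions (for example result.transcriptions.map { $0.formattedString }). It relies on NumberFormatter's spell-out style to read number words for the given locale, which I'd assume covers the multi-language case reasonably, but that should be checked for each language you care about.

```swift
import Foundation

/// Returns the first candidate that can be read as an integer, or nil if none can.
/// `candidates` is expected to be ordered best guess first.
func firstNumber(in candidates: [String],
                 locale: Locale = Locale(identifier: "en_US")) -> Int? {
    // Locale-aware formatter for spelled-out numbers such as "eighty".
    let spellOut = NumberFormatter()
    spellOut.locale = locale
    spellOut.numberStyle = .spellOut

    for candidate in candidates {
        let text = candidate
            .trimmingCharacters(in: .whitespacesAndNewlines)
            .lowercased()
        // Plain digits first ("80"), then number words.
        if let value = Int(text) { return value }
        if let value = spellOut.number(from: text) { return value.intValue }
    }
    return nil
}
```

In the recognition callback you would call this on each partial result and keep listening while it returns nil, falling back to the best transcription once isFinal is true.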

Speedway answered 28/8, 2017 at 17:24 Comment(1)
Thanks sha. I like the idea but have a couple of concerns. One, I'm not sure if I can mix and match pre-recorded audio with live audio. I suspect you can, but I haven't seen that done before. Two, and more importantly, I'd like this to work with many languages, so I'm not sure how that would work, especially because in some languages (Mandarin, for example) there is no equivalent of "number ...". At least that's what my Mandarin-speaking friends tell me. - Oakum
