Compare Two Audio Recordings (a locally stored pre-recorded voice command and one recorded from the microphone in-app) in iOS

In my app, I have to compare a live recording against a previously stored local voice command; if it matches (not only the text but also the identified person's voice), then perform the necessary action. So there are two requirements:

1. Match that the voice command comes from the same person.

2. Match the command's text.

I have tried several approaches, but none work as I expect.

First: use a speech-to-text library such as OpenEars or SpeechKit, but these libraries only convert speech to text.

Result: failed to meet my expectation.

Second: audio fingerprinting.

acrcloud library: I record a command and store that MP3 file on the acrcloud server, then match it against a live recording (spoken by me). It doesn't match, but when I play back the same recording (the recorded MP3 file of my voice) that was uploaded to the acrcloud server, it does match. Result: failed to meet my expectation.

API.AI: this library is essentially speech-to-text. I stored some text commands on their server, and when anyone speaks the same command, the match succeeds. Result: failed to meet my expectation.

Please suggest how to solve this problem for an iOS application.

Alodie answered 27/7, 2016 at 19:24 Comment(7)
@gnasher729 I've been having fun with this since last week; according to your comment, it is possible.Alodie
If the acrcloud library is failing, then you should raise a ticket on their website.Marcellmarcella
@TejaNandamuri The acrcloud library isn't failing, but my expectation is different: to match the person's voice along with the content.Alodie
If you find a more appropriate answer, please share.Rich
@Rich If you find any solution, then tell me.Alodie
@amit did you try this link - github.com/lbrndnr/LBAudioDetectiveRich
This question is too broad and a borderline tool request. The only reason it hasn't been closed yet is because of the bounty attached.Eleph

This is how I would approach this, if I understand your requirements correctly:

  1. You will need to compare the audio spectrum of each recording to match the person (look at vDSP in the Accelerate framework). An FFT analysis with a 1024-sample window should be enough (if not, try doubling it for more detail). I would start the comparison with 5-10 peaks in the spectrum and experiment from there (see the sketch after this list). Check out EZAudio for an easy FFT implementation to get you started.

  2. Use a speech-to-text library to match the text. Accents usually distort the results considerably, so I would start by extracting the text from both audio files and comparing those, rather than matching against a command specified in text.
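
For point 1, here is a minimal sketch in Swift of the spectrum comparison, assuming you already have mono Float samples for one analysis window of each recording; `spectrum` and `peakOverlap` are names I made up, and the default peak count is just a starting value to tune, not a tested recipe:

```swift
import Foundation
import Accelerate

// Magnitude spectrum of one 1024-sample window, via vDSP's real FFT.
func spectrum(of samples: [Float], windowSize: Int = 1024) -> [Float] {
    precondition(samples.count >= windowSize, "need at least one full window")
    let log2n = vDSP_Length(log2(Float(windowSize)).rounded())
    guard let setup = vDSP_create_fftsetup(log2n, FFTRadix(kFFTRadix2)) else { return [] }
    defer { vDSP_destroy_fftsetup(setup) }

    // Hann window to reduce spectral leakage before the FFT.
    var hann = [Float](repeating: 0, count: windowSize)
    vDSP_hann_window(&hann, vDSP_Length(windowSize), Int32(vDSP_HANN_NORM))
    var windowed = [Float](repeating: 0, count: windowSize)
    vDSP_vmul(samples, 1, hann, 1, &windowed, 1, vDSP_Length(windowSize))

    // Pack the real signal into split-complex form, FFT in place,
    // then take the squared magnitude of each frequency bin.
    var real = [Float](repeating: 0, count: windowSize / 2)
    var imag = [Float](repeating: 0, count: windowSize / 2)
    var magnitudes = [Float](repeating: 0, count: windowSize / 2)
    real.withUnsafeMutableBufferPointer { realPtr in
        imag.withUnsafeMutableBufferPointer { imagPtr in
            var split = DSPSplitComplex(realp: realPtr.baseAddress!, imagp: imagPtr.baseAddress!)
            windowed.withUnsafeBytes { raw in
                vDSP_ctoz(raw.bindMemory(to: DSPComplex.self).baseAddress!, 2,
                          &split, 1, vDSP_Length(windowSize / 2))
            }
            vDSP_fft_zrip(setup, &split, 1, log2n, FFTDirection(kFFTDirection_Forward))
            vDSP_zvmags(&split, 1, &magnitudes, 1, vDSP_Length(windowSize / 2))
        }
    }
    return magnitudes
}

// Crude similarity: how many of the N strongest bins do two spectra share?
func peakOverlap(_ a: [Float], _ b: [Float], peaks: Int = 10) -> Int {
    let topA = Set(a.indices.sorted { a[$0] > a[$1] }.prefix(peaks))
    let topB = Set(b.indices.sorted { b[$0] > b[$1] }.prefix(peaks))
    return topA.intersection(topB).count
}
```

Comparing two recordings then comes down to calling `peakOverlap(spectrum(of: samplesA), spectrum(of: samplesB))` and judging the overlap against a threshold you calibrate on your own test recordings.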

Good luck!

Bidden answered 5/8, 2016 at 8:6 Comment(2)
I have already calculated the frequency of the audio through FFT (vDSP in the Accelerate framework). Please explain the process of calculating the audio spectrum and then how to compare those spectra of two audios for matching a human voice. That is the main challenge: how do we write code for that?Alodie
A. Levy has a very nice answer here to get you started: #604953Bidden

http://www.politepix.com/openears/ can be used from Objective-C, or if you want Swift, try http://blog.tryolabs.com/2015/06/15/tlsphinx-automatic-speech-recognition-asr-in-swift/. I have never used them, but they seem to have everything you need. If not, try looking for C++ libraries; there should be more options, though you'll most probably have to deal with typical porting issues. I really don't recommend writing one yourself, as you'll spend time learning signal-processing techniques, then import some signal-processing library, and only then start writing your own algorithm. Unless, of course, you have the time and interest to do it.

I'd recommend you build your app the same way voice recognition software is usually developed: record a bunch of examples, build tests, and verify often whether things are on or off track.

One of the more important things I learned when doing voice recognition work (both word recognition and speaker recognition) was that the quality of the recording has a big impact on what you're able to do with it. Make a small batch of recordings in the quietest place you can find; then you'll always have a benchmark to compare against more real-life recordings.

Also try to cover, at a later stage, all the microphones you'll encounter in real applications, as there's no intrinsic guarantee that all iPhone microphones are created equal. I'd expect them not to vary at all across different iPhone models, but who knows?

Humiliation answered 5/8, 2016 at 12:39 Comment(3)
This library is only speech-to-text; it doesn't compare human voices the way voice authentication does.Alodie
My mistake there. BTW, you don't mention whether you can offload the heavy work to a server, which would greatly open up your possibilities for CPU-intensive algorithms and/or other languages. I'm surprised by the number of paid solutions and the very few open-source ones. Here's one example of a paid (freemium) technology built server-side; apparently you only need to make REST calls, so it's easy to integrate with iOS: microsoft.com/cognitive-services/en-us/speaker-recognition-api. Microsoft also offers a speech-to-text SDK that apparently can work offline.Humiliation
Any luck with the Microsoft API?Humiliation
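
To illustrate the server-offload idea from the comments above, here is a minimal sketch in Swift of the kind of REST call involved. The endpoint URL, response keys, and the `verifySpeaker` helper are placeholders I invented rather than any provider's actual API; only the general URLSession pattern carries over:

```swift
import Foundation

// Placeholder endpoint and response schema; substitute your provider's
// real URL, auth header, and JSON keys from its documentation.
func verifySpeaker(audioData: Data, apiKey: String,
                   completion: @escaping (Bool) -> Void) {
    // Hypothetical endpoint; not a real Microsoft URL.
    var request = URLRequest(url: URL(string: "https://api.example.com/speaker/verify")!)
    request.httpMethod = "POST"
    request.setValue("application/octet-stream", forHTTPHeaderField: "Content-Type")
    request.setValue(apiKey, forHTTPHeaderField: "Ocp-Apim-Subscription-Key")
    request.httpBody = audioData

    URLSession.shared.dataTask(with: request) { data, _, error in
        guard error == nil, let data = data,
              let json = (try? JSONSerialization.jsonObject(with: data)) as? [String: Any],
              let result = json["result"] as? String else {
            completion(false)
            return
        }
        completion(result == "Accept")   // "result"/"Accept" are assumed keys
    }.resume()
}
```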

In general, I think you should use your first method with some tweaks. For the local audio, store a text script version alongside it (one audio file, one source script). For the recorded audio, use OpenEars or SpeechKit to convert the audio to text.

Then compare the source script with the recognized text to get your result. You should mark which words must be stressed in the source script for the best comparison result. Sometimes there are confusable words like wine, wife, white... (try to handle that too). A minimal comparison sketch follows below.
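
Here is a minimal sketch in Swift of that script-vs-transcript comparison, using plain Levenshtein edit distance so small recognition slips (wine vs. wife) don't break the match; `matchesScript` is my own helper name and the `tolerance` default is an assumption to tune:

```swift
// Classic single-row Levenshtein edit distance between two strings.
func editDistance(_ a: String, _ b: String) -> Int {
    let a = Array(a.lowercased()), b = Array(b.lowercased())
    guard !a.isEmpty else { return b.count }
    guard !b.isEmpty else { return a.count }
    var row = Array(0...b.count)
    for i in 1...a.count {
        var diagonal = row[0]          // D(i-1, j-1)
        row[0] = i
        for j in 1...b.count {
            let above = row[j]         // D(i-1, j)
            row[j] = min(above + 1,                                  // deletion
                         row[j - 1] + 1,                             // insertion
                         diagonal + (a[i - 1] == b[j - 1] ? 0 : 1))  // substitution
            diagonal = above
        }
    }
    return row[b.count]
}

// Accept the command if the recognized text is within a few edits of the script.
func matchesScript(_ recognized: String, _ script: String, tolerance: Int = 2) -> Bool {
    return editDistance(recognized, script) <= tolerance
}
```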

GLHF

Collette answered 4/8, 2016 at 9:13 Comment(0)
