Compare voice WAV files in Android, or a voice tag (voice commands) API


Asked 8/2, 2011 at 16:39 Answered 10/11, 2011 at 11:33

android wav speech-recognition voice-recognition wave

I'm developing an app and I need some way to compare two voice recordings to see whether they match. I know the speech recognizer is one way to do that, but since (I think) it has to transcribe the voice into a string first, it won't work well for languages the speech recognizer doesn't support. Any ideas? I want something like the voice tags on older phones, which simply compared the voice input with a sample recorded earlier during setup.

Untie asked 8/2, 2011 at 16:39

Are you trying to recognize that both recordings came from the same speaker (voice authentication or speaker identification), or are you trying to determine that the same words were spoken? What are you matching, the identity of the speaker or the words he spoke? – Parturient 8/2, 2011 at 19:12

Hi. How did you solve this? – Kathlyn 2/5, 2014 at 6:59

A relatively simple way to do this is to use FFT (Fast Fourier Transform) to convert the time-domain data of the original WAV file into frequency-domain data (in which each value in your transformed array represents the relative magnitude/intensity of a particular frequency band).
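To make the FFT step concrete, here is a minimal Java sketch (Java because the question targets Android) of an iterative radix-2 Cooley-Tukey FFT that turns one frame of samples into a magnitude spectrum. It assumes the frame length is a power of two and that the samples have already been decoded from the WAV file into a `double[]`; the class name `FftSketch` is hypothetical, and a vetted library would be a better choice in production.

```java
/** Hypothetical helper: radix-2 FFT and magnitude spectrum for power-of-two frames. */
final class FftSketch {

    /** In-place iterative Cooley-Tukey FFT on separate real and imaginary arrays. */
    static void fft(double[] re, double[] im) {
        int n = re.length;
        if (n != im.length || Integer.bitCount(n) != 1) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        // Reorder the input into bit-reversed index order.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) {
                j ^= bit;
            }
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Combine sub-transforms of size len/2 into transforms of size len.
        for (int len = 2; len <= n; len <<= 1) {
            double ang = -2 * Math.PI / len;
            for (int start = 0; start < n; start += len) {
                for (int k = 0; k < len / 2; k++) {
                    double wr = Math.cos(ang * k), wi = Math.sin(ang * k);
                    int a = start + k, b = a + len / 2;
                    double xr = re[b] * wr - im[b] * wi;
                    double xi = re[b] * wi + im[b] * wr;
                    re[b] = re[a] - xr;
                    im[b] = im[a] - xi;
                    re[a] += xr;
                    im[a] += xi;
                }
            }
        }
    }

    /** Magnitudes of frequency bins 0..n/2 for a real-valued frame of n samples. */
    static double[] magnitudeSpectrum(double[] frame) {
        int n = frame.length;
        double[] re = frame.clone();
        double[] im = new double[n];
        fft(re, im);
        double[] mag = new double[n / 2 + 1];
        for (int k = 0; k <= n / 2; k++) {
            mag[k] = Math.hypot(re[k], im[k]);
        }
        return mag;
    }
}
```

For a 64-sample cosine that completes 4 cycles in the frame, `magnitudeSpectrum` returns 33 bins with a single peak of about 32 at index 4 and near-zero values elsewhere.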

If the same person speaks the same word twice, the resulting time-domain data will nevertheless still be very different numerically in the two WAV files. Converting both WAV files to the frequency domain (using the same size of FFT window for both, even if the two files are of slightly different lengths) will produce frequency arrays that are much more similar to each other than were the original WAV files.
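The comparison described above can be sketched as follows, under several assumptions: both recordings are mono `double[]` sample arrays at the same sample rate, each is cut into non-overlapping frames of the same hypothetical `windowSize`, the per-frame magnitude spectra are averaged, and the two averages are compared with cosine similarity (one simple choice of similarity measure, not the only one). For brevity it uses a direct DFT, which gives the same magnitudes as an FFT but costs O(n²) per frame. `SpectrumCompare` is an illustrative name, not a library class.

```java
/** Hypothetical helper: compare two recordings by their average magnitude spectrum. */
final class SpectrumCompare {

    /** Magnitudes of bins 0..n/2 via a direct DFT (same values as an FFT, just slower). */
    static double[] magnitudes(double[] frame) {
        int n = frame.length;
        double[] mag = new double[n / 2 + 1];
        for (int k = 0; k <= n / 2; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double ang = -2 * Math.PI * k * t / n;
                re += frame[t] * Math.cos(ang);
                im += frame[t] * Math.sin(ang);
            }
            mag[k] = Math.hypot(re, im);
        }
        return mag;
    }

    /**
     * Average spectrum over all complete, non-overlapping frames. Using the same
     * windowSize for both recordings gives vectors of equal length even when the
     * recordings themselves differ in length; leftover samples are ignored.
     */
    static double[] averageSpectrum(double[] samples, int windowSize) {
        int frames = samples.length / windowSize;
        if (frames == 0) {
            throw new IllegalArgumentException("signal shorter than one window");
        }
        double[] avg = new double[windowSize / 2 + 1];
        double[] frame = new double[windowSize];
        for (int f = 0; f < frames; f++) {
            System.arraycopy(samples, f * windowSize, frame, 0, windowSize);
            double[] mag = magnitudes(frame);
            for (int k = 0; k < avg.length; k++) {
                avg[k] += mag[k] / frames;
            }
        }
        return avg;
    }

    /** Cosine similarity; for non-negative spectra it lies in [0, 1], 1 meaning the same shape. */
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }
}
```

Because cosine similarity ignores overall scale, a louder or softer repetition of the same sound still scores close to 1. Averaging over frames discards timing, so this only measures whether two recordings have similar overall frequency content.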

Unfortunately, I haven't been able to find any FFT libraries specifically for Android. Here's a question that references some Java-based libraries:

Signal processing library in Java?

Launcher answered 8/2, 2011 at 16:54

Thank you, but I'm really surprised. Why can my old Moto phone do it when this new stupid Android can't? OMG, I just want to record my voice, then say it again and have the phone tell me whether it's close to the original! I don't want to do this complicated stuff. If the two had to be exactly the same there would be no problem, but there are many algorithms for deciding whether a sample is close to the original, and I don't want to reinvent the wheel! – Untie 8/2, 2011 at 17:20

Yeah, it looks like the speech recognition built into Android does not work the way you want. I think manufacturers have actually been moving away from how your Moto did speech recognition, since that older approach depended on "training" your phone to recognize your particular voice. That happens to be what you want, of course, but it was considered a serious weakness, which is presumably why nobody does it that way anymore. – Launcher 8/2, 2011 at 19:38

MusiGenesis is right. The old way of doing voice recognition was that the phone had all the libraries and processing software on the device itself. Android phones open a stream to Google, so you are actually 'talking' to Google's servers, not your phone. The servers then send the interpretation back to the phone. This way is more accurate, supports more features, and saves space on the phone. But it works against what you're trying to do, which is unfortunate. :( – Busey 8/2, 2011 at 19:49

What's worse is that when Android can't contact the server, home screen voice search makes you record a new attempt instead of letting you retry with the existing recording (or even letting you set it to keep retrying automatically). – Monda 8/2, 2011 at 20:6

Do you have a reference (preferably a published survey paper) describing useful features for voice recognition/classification? – Matriarchate 29/5, 2013 at 18:24

@Abhishek: I'm not sure what you're asking for, exactly. Voice recognition and speech-to-text are enormously varied topics. – Launcher 29/5, 2013 at 19:2

@Launcher I like your idea of turning voice into discrete FFT values. In general, to do machine learning (recognition/classification), we convert an input (e.g. voice) into a bunch of numbers (called features in machine learning). Then we try to come up with a mathematical function that maps those features to the desired output (e.g. text). Your answer suggests that the FFT gives good features for voice data. I was curious what other good features for machine learning on voice data are. – Matriarchate 29/5, 2013 at 23:4
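As a rough illustration of a "mathematical function that maps features to the desired output", the simplest option is nearest-template matching, which is close in spirit to the voice tags described in the question: store one feature vector per label at setup, then return the label whose template is closest. The sketch assumes fixed-length feature vectors (for example, averaged FFT magnitudes) and uses Euclidean distance with a hypothetical `maxDistance` threshold for rejecting non-matches; `NearestTemplate`, `enroll`, and `classify` are illustrative names, not part of any library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical nearest-template classifier over fixed-length feature vectors. */
final class NearestTemplate {
    private final Map<String, double[]> templates = new LinkedHashMap<>();
    private final double maxDistance;

    NearestTemplate(double maxDistance) {
        this.maxDistance = maxDistance;
    }

    /** Store the feature vector recorded during setup under a label. */
    void enroll(String label, double[] features) {
        templates.put(label, features.clone());
    }

    /** Label of the closest template, or null if none lies within maxDistance. */
    String classify(double[] features) {
        String best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> e : templates.entrySet()) {
            double d = euclidean(e.getValue(), features);
            if (d < bestDist) {
                bestDist = d;
                best = e.getKey();
            }
        }
        return bestDist <= maxDistance ? best : null;
    }

    /** Euclidean distance between two feature vectors of equal length. */
    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("feature length mismatch");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double t = a[i] - b[i];
            sum += t * t;
        }
        return Math.sqrt(sum);
    }
}
```

The threshold has to be tuned on real recordings: too small and genuine repeats are rejected, too large and almost anything is accepted.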

One idea is to compare the similarity of the voices using their spectrograms. Spectrogram features are robust and resistant to noise, which makes them a good basis for analysing two voices. If you take this approach, you first need to extract the features of the voices and then work out how to compare the features of the two spectrograms; that is a pattern recognition problem.
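One concrete way to compare two spectrograms, offered here as an assumption rather than something this answer prescribes, is dynamic time warping (DTW): it aligns the two frame sequences so that a word spoken faster or slower can still match. The sketch computes a Hann-windowed magnitude spectrogram with a direct DFT (slow but short) and then a DTW distance using Euclidean distance between frames. `win` and `hop` are hypothetical parameters, `SpectrogramDtw` is an illustrative name, and the total cost is divided by `n + m` so that recordings of different lengths land on a comparable scale. Smaller means more similar.

```java
import java.util.Arrays;

/** Hypothetical helper: spectrogram plus DTW distance between two recordings. */
final class SpectrogramDtw {

    /** Rows are frames, columns are magnitude bins 0..win/2 of a Hann-windowed frame. */
    static double[][] spectrogram(double[] x, int win, int hop) {
        if (win < 2 || hop < 1 || x.length < win) {
            throw new IllegalArgumentException("need win >= 2, hop >= 1 and one full window");
        }
        // Hann window to reduce leakage between neighbouring bins.
        double[] hann = new double[win];
        for (int t = 0; t < win; t++) {
            hann[t] = 0.5 - 0.5 * Math.cos(2 * Math.PI * t / (win - 1));
        }
        int frames = 1 + (x.length - win) / hop;
        double[][] s = new double[frames][win / 2 + 1];
        for (int f = 0; f < frames; f++) {
            for (int k = 0; k <= win / 2; k++) {
                double re = 0, im = 0;
                for (int t = 0; t < win; t++) {
                    double v = hann[t] * x[f * hop + t];
                    double ang = -2 * Math.PI * k * t / win;
                    re += v * Math.cos(ang);
                    im += v * Math.sin(ang);
                }
                s[f][k] = Math.hypot(re, im);
            }
        }
        return s;
    }

    /** Euclidean distance between two spectrogram frames of equal length. */
    static double frameDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** DTW: cheapest monotonic alignment of the two frame sequences, divided by n + m. */
    static double dtw(double[][] a, double[][] b) {
        int n = a.length, m = b.length;
        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) {
            Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        cost[0][0] = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double d = frameDistance(a[i - 1], b[j - 1]);
                // Extend the cheapest of a diagonal match, a step in a, or a step in b.
                cost[i][j] = d + Math.min(cost[i - 1][j - 1], Math.min(cost[i - 1][j], cost[i][j - 1]));
            }
        }
        return cost[n][m] / (n + m);
    }
}
```

Raw magnitudes make loudness differences count toward the distance; in practice log magnitudes are often used instead so that a quieter repetition is penalised less.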

This API (http://code.google.com/p/musicg-sound-api/) is written in Java and can be used on Android. It can extract the spectrogram of a wave.

Stormystorting answered 10/11, 2011 at 11:33
