- Basically, the best features to be used for speech/voice comparison are the MFCC.
There are some softwares that can be used to extract these coefficients: Praat website
You can also try to find a lib to extract these coefficients.
[Edit: I've found in tuneR documentation that it has a function to extract MFCC - search for the function melfcc()
]
- After you've extracted these features, you can use Machine Learning (SVM, RandomForests or something like that) to develop a classifier.
I have a seminar that I've presented about Speaker Recognition Systems, take a look at it, it may be helpful. (Seminar)
If you have time and interest, you could algo read:
Authors: Kinnunen, T., & Li, H. (2010)
Paper: an overview of text-independent speaker recognition: From features to supervectors
After you get a feature vector for each audio sample (with MFCC and/or other features), then you'll need to compare pairs of feature vectors (Features from A versus Features from B):
You could try to use the Absolute Difference between these feature vectors:
- abs(feature vector from A - feature vector from B)
The result of the operation above is a feature vector where every element is >=0 and it has the same size of the A (or B) feature vector.
You could also test the element-wise multiplication between A and B features:
- (A1*B1, A2*B2, ... , An*Bn)
Then you need to label each feature vector
(1 if person A == person B and 0 if person A != person B).
Usually the absolute difference performs better than the multiplication feature vector, but you can append both vectors and test the performance of the classifier using both the abs diff and the multiplication features at the same time.
seewave
that could be helpful for you. For your specific problem, spectrograms and amplitude normalization come to my mind - both of which can be easily done inseewave
. – Icebox