I'm looking for a way to match voice samples against a known data set, let's say a list of MP3 or WAV files, each of which is a sample of someone speaking. For every file in that set I already know the speaker, e.g. file ABC is Person X speaking.
I would then like to take another sample and do some voice matching to work out which speaker from the known data set the new voice most likely belongs to.
Also, I don't necessarily care what the person has said, as long as I can find a match, i.e. I don't need transcription or anything like that.
I'm aware CMU Sphinx doesn't do speaker recognition, and that it's primarily used for speech-to-text, but I have seen other systems, e.g. the LIUM Speaker Diarization toolkit (http://cmusphinx.sourceforge.net/wiki/speakerdiarization) and the VoiceID project (https://code.google.com/p/voiceid/), which use CMU Sphinx as a base for this type of work.
If I were to use CMU Sphinx, how would I do this kind of voice matching?
Also, if CMU Sphinx isn't the best framework for this, is there an alternative that's open source?
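
To clarify what I mean by "matching": my rough understanding is that this is usually done by extracting features such as MFCCs from each file, training a model per known speaker (for example a GMM), and then scoring the unknown sample against every model. Below is a small sketch of that idea in Python, just to make the goal concrete; python_speech_features and scikit-learn are my own guesses at suitable libraries, not anything provided by CMU Sphinx or VoiceID:

    # Sketch only: per-speaker GMMs trained on MFCC features, with an unknown
    # sample scored against each model. WAV input assumed for simplicity.
    import numpy
    from scipy.io import wavfile
    from python_speech_features import mfcc
    from sklearn.mixture import GaussianMixture

    def extract_features(path):
        """Return a (frames x 13) array of MFCCs for one WAV file."""
        rate, signal = wavfile.read(path)
        if signal.ndim > 1:          # stereo: keep the first channel only
            signal = signal[:, 0]
        return mfcc(signal, samplerate=rate, numcep=13)

    def enroll(known):
        """known: dict mapping a speaker name to a list of WAV paths of that speaker."""
        models = {}
        for speaker, paths in known.items():
            feats = numpy.vstack([extract_features(p) for p in paths])
            models[speaker] = GaussianMixture(n_components=16,
                                              covariance_type='diag').fit(feats)
        return models

    def identify(models, path):
        """Return the enrolled speaker whose model best scores the new sample."""
        feats = extract_features(path)
        return max(models, key=lambda speaker: models[speaker].score(feats))

So given models = enroll({'Person X': ['abc.wav'], 'Person Y': ['def.wav']}), identify(models, 'unknown.wav') would return the name of the most likely known speaker. If one of the CMU-based tools already does the equivalent of enroll/identify out of the box, that's exactly what I'm after.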