Is it even possible in C? (I know it is possible in general, since GOM Player does it.) I just need something to get started. What do you say?
How exactly do you distinguish the human voice from other sounds?
Filters in MP3 players usually rely on the fact that the voice source (the performer) in a stereo studio recording is positioned at the center, so the voice is nearly identical in both channels. They simply compute the difference between the channels, which cancels whatever the two channels have in common. If you give them a recording where the performer is not positioned like that, they fail: the voice is not isolated.
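To make the centre-position assumption concrete, here is a minimal sketch in C. It models one stereo frame as a voice sample plus one instrument sample per channel. The function names and the per-channel gains are hypothetical, chosen only for illustration, and are not taken from any player's source.

```c
#include <assert.h>

/* Hypothetical helper: the "side" signal of one stereo frame.
   Anything present equally in both channels cancels here. */
static float side_sample(float left, float right)
{
    return left - right;
}

/* Builds one stereo frame from a voice sample panned with the given
   per-channel gains plus one instrument sample per channel, then
   returns the side signal of that frame. */
static float side_of_mix(float voice, float gain_l, float gain_r,
                         float instr_l, float instr_r)
{
    float left, right;

    assert(gain_l >= 0.0f && gain_r >= 0.0f);   /* pan gains are non-negative */

    left  = gain_l * voice + instr_l;
    right = gain_r * voice + instr_r;
    return side_sample(left, right);
}
```

With equal gains the voice term drops out and only `instr_l - instr_r` remains. With unequal gains a residue of `(gain_l - gain_r) * voice` stays in the result, which is the failure case described above.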
A more reliable way is to employ a voice detector. This is a very complex problem that involves hardcore math and thorough tuning of the algorithms for your specific task. If you go this way, start by reading up on voice coding (vocoders).
This exact topic was discussed here. It started out as a discussion of audio coding technologies, but on the linked page someone asked:
"That means no way to extract voice from stereo signal?"
But it was pointed out that extracting the voice should be no more difficult than eliminating the voice.
I'll let you read further, but I suspect successful extraction may rely on the relatively narrow spectral distribution of the voice compared to instruments.
Note that it is not possible in principle to perfectly separate different sounds which are mixed together in one track. It's like when you mix cream into your coffee - after it has been mixed in, it isn't possible to perfectly separate the cream and the coffee afterwards.
There might be smart signal processing tricks to get an acceptable result, but in general it's impossible to perfectly separate out the voice from the music.
Separating the human voice from other sounds is no mean feat. If you have a recording of the other sounds on their own, you can cancel the background by subtracting that reference, which leaves you with the human voice.
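A rough sketch of that reference cancellation in C follows. It rests on assumptions that are mine, not the original poster's: both buffers are mono floats, they are already time-aligned, and the background appears in the mix as a scaled copy of the reference. The name `reference_cancel` is hypothetical. Real recordings usually also need alignment and an adaptive filter, which this does not attempt.

```c
#include <assert.h>
#include <stddef.h>

/* Subtracts a known background track from a mix, sample by sample.
   Assumes the background is present in the mix as gain * ref.
   The gain is estimated by least squares, which is only accurate when
   the voice is roughly uncorrelated with the background.
   Writes the voice estimate to out and returns the estimated gain. */
static float reference_cancel(const float *mix, const float *ref,
                              float *out, size_t n)
{
    double dot = 0.0, energy = 0.0;
    float gain;
    size_t i;

    assert(mix != NULL && ref != NULL && out != NULL);

    for (i = 0; i < n; i++) {
        dot    += (double) mix[i] * ref[i];
        energy += (double) ref[i] * ref[i];
    }
    gain = energy > 0.0 ? (float) (dot / energy) : 0.0f;

    for (i = 0; i < n; i++)
        out[i] = mix[i] - gain * ref[i];   /* remainder is the voice estimate */

    return gain;
}
```

If the reference is even slightly misaligned or processed differently from the copy in the mix, the subtraction leaves audible residue, so treat this as a starting point only.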
If the background noise is random noise of some sort, you will get a win by using some form of spectral filtering. But it's not simple and would need a fair bit of experimenting to get good results. Adobe Audition has an adaptive spectral filter, I believe.
Assume you have white noise with a fairly even frequency distribution across the entire recorded band (on a 44.1 kHz uncompressed recording that is 0 to about 22 kHz). Then add a voice on top of it. The voice occupies some of the same frequencies as the noise. The human voice ranges from ~300 Hz to ~3400 Hz, so bandpassing the audio will cut you down to only the voice range of 300 to 3400 Hz. Now what? You have a voice AND you have the, now bandpassed, white noise. Somehow you need to remove that noise and leave the voice intact. There are various filtering schemes, but all of them will damage the voice in the process.
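For reference, the bandpass step on its own could look something like the sketch below: a first-order high-pass followed by a first-order low-pass, both simple RC-style filters. The function name `voice_bandpass` and its parameters are hypothetical, and first-order filters roll off very gently, so a real implementation would likely use steeper filters.

```c
#include <assert.h>
#include <stddef.h>

/* Filters a mono float buffer in place with a first-order high-pass at
   low_hz followed by a first-order low-pass at high_hz. */
static void voice_bandpass(float *x, size_t n, float sample_rate,
                           float low_hz, float high_hz)
{
    const float pi = 3.14159265f;
    float dt, rc_hp, rc_lp, a_hp, a_lp;
    float prev_in = 0.0f, hp = 0.0f, lp = 0.0f;
    size_t i;

    assert(x != NULL && sample_rate > 0.0f);
    assert(low_hz > 0.0f && low_hz < high_hz);

    dt    = 1.0f / sample_rate;
    rc_hp = 1.0f / (2.0f * pi * low_hz);
    rc_lp = 1.0f / (2.0f * pi * high_hz);
    a_hp  = rc_hp / (rc_hp + dt);          /* high-pass coefficient */
    a_lp  = dt / (rc_lp + dt);             /* low-pass coefficient */

    for (i = 0; i < n; i++) {
        hp = a_hp * (hp + x[i] - prev_in); /* attenuates content below low_hz */
        prev_in = x[i];
        lp += a_lp * (hp - lp);            /* attenuates content above high_hz */
        x[i] = lp;
    }
}
```

Calling `voice_bandpass(samples, count, 44100.0f, 300.0f, 3400.0f)` narrows the signal toward the voice band, but any noise that falls inside the band passes through along with the voice, which is exactly the problem described above.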
Good luck, it's really not going to be simple!
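Here buf holds the input: 16-bit PCM WAV data at a 44100 Hz sample rate. bytes is the buffer length in bytes, bps the bits per sample and nch the number of channels.

    /* In-place removal of centre-panned sound (usually the lead vocal)
       from 16-bit interleaved stereo PCM: both channels become L - R. */
    int voiceremoval (char *buf, int bytes, int bps, int nch)
    {
        short *a = (short *) buf;
        int frames = bytes / 4;     /* one frame = 2 channels x 2 bytes */

        if (bps != 16 || nch != 2)
            return 0;               /* other formats are left untouched */

        while (frames-- > 0) {
            int d = a[0] - a[1];    /* content common to both channels cancels */
            if (d < -32768)
                d = -32768;         /* clamp to the 16-bit range */
            if (d > 32767)
                d = 32767;
            a[0] = (short) d;
            a[1] = (short) d;
            a += 2;
        }
        return 0;
    }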