I have a .mp3 file. How can I separate the human voice from the rest of the sound in C?

Asked 15/10, 2009 at 8:34 Answered 20/10, 2009 at 15:15

Is it even possible in C? (I know it is possible in general, since GOM Player does it.) I just need a starting point. What do you suggest?

How exactly do you distinguish a human voice from other sounds?

Blair answered 15/10, 2009 at 8:34 Comment(5)

Artificial intelligence is complex... – Wakerobin 15/10, 2009 at 8:36

If it's possible in any language, it's possible in C. This is basic computer science. – Hitlerism 15/10, 2009 at 8:40

you can start by filtering out the frequencies that are impossible to hear in a human voice – Titograd 15/10, 2009 at 8:48

@fortran: Those frequencies are also impossible to hear in music. For that reason, MP3 compression algorithms have already removed them, and you can safely ignore this when your input is in MP3 format. – Beaty 15/10, 2009 at 11:58

@Beaty - It's still a pretty valid idea. We can filter out pitches that are impossible for the human voice to make. – Noninterference 20/10, 2009 at 22:32

Filters in MP3 players usually rely on the fact that the voice source (the performer) in a stereo studio recording is positioned at the center, so the voice is nearly identical in both channels. They just compute the difference between the channels, which cancels whatever the two channels have in common. If you give them a recording where the performer is not positioned like that, they fail: the voice is not separated from the rest.
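A minimal sketch of the channel arithmetic described above, assuming interleaved 16-bit stereo PCM that has already been decoded from the MP3. The function name mid_side_split and its parameters are made up for illustration. The side signal (half the difference) drops anything identical in both channels, while the mid signal (half the sum) keeps center-panned content but also most of the accompaniment, so it is not a clean vocal.

```c
#include <stddef.h>

/* Hypothetical helper: splits interleaved 16-bit stereo (L, R, L, R, ...)
   into a mid signal (half the sum) and a side signal (half the difference).
   Both results always fit in a 16-bit sample, so no clamping is needed. */
void mid_side_split(const short *stereo, size_t frames,
                    short *mid, short *side)
{
    size_t i;

    for (i = 0; i < frames; i++) {
        int l = stereo[2 * i];
        int r = stereo[2 * i + 1];

        mid[i]  = (short)((l + r) / 2);  /* center-panned content survives */
        side[i] = (short)((l - r) / 2);  /* center-panned content cancels  */
    }
}
```

A sample that is identical in both channels ends up entirely in mid and contributes nothing to side, which is why this trick only works for center-panned vocals.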

The reliable way is to employ a voice detector. This is a very complex problem that involves hardcore math and thorough tuning of the algorithms for your specific task. If you go this way, start by reading up on voice coding (vocoders).

Valentino answered 15/10, 2009 at 8:37 Comment(0)

This exact topic was discussed here. It started out as a discussion of audio coding technologies, but on the linked page above someone said

That means no way to extract voice form steoro signal?

But it was pointed out that extracting the voice should be no more difficult than eliminating the voice.

I'll let you read further, but I suspect successful extraction may rely on the relatively narrow spectral distribution of the voice compared to instruments.

Moira answered 15/10, 2009 at 8:53 Comment(0)

Note that it is not possible in principle to perfectly separate different sounds that are mixed together in one track. It's like mixing cream into your coffee: once it has been mixed in, you cannot perfectly separate the cream and the coffee again.

There might be smart signal processing tricks to get an acceptable result, but in general it's impossible to perfectly separate out the voice from the music.

Lewellen answered 15/10, 2009 at 8:54 Comment(0)

Separating the human voice from other sounds is no mean feat. If you have a separate recording of the other sounds, you can use reference cancellation: subtract the known background from the mix, which leaves you with the human voice.
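In its simplest form, reference cancellation is a sample-by-sample subtraction. The sketch below assumes 16-bit PCM, a background recording that is perfectly time-aligned with the mix, and identical gain in both; the function names cancel_reference and clamp16 are hypothetical. Real recordings rarely line up this neatly, and lossy MP3 encoding alone is enough to leave an audible residue.

```c
#include <stddef.h>

/* Clamp an intermediate result to the 16-bit sample range. */
static short clamp16(int v)
{
    if (v < -32768)
        return -32768;
    if (v > 32767)
        return 32767;
    return (short) v;
}

/* Hypothetical helper: subtracts a known, sample-aligned background
   recording from the mix in place. Whatever the background does not
   account for (ideally just the voice) is what remains in mix. */
void cancel_reference(short *mix, const short *background, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        mix[i] = clamp16((int) mix[i] - (int) background[i]);
}
```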

If the background noise is random noise of some sort, you will get a win by using some form of spectral filtering. But it's not simple and would need a fair bit of experimenting to get good results. Adobe Audition has an adaptive spectral filter, I believe.

Assume you have white noise with a fairly even frequency distribution across the entire recorded band (on a 44.1 kHz uncompressed recording, that is roughly 0 to 22 kHz). Then add a voice on top of it. Obviously the voice is using the same frequencies as the noise. The core of intelligible speech lies roughly between ~300 Hz and ~3400 Hz (the classic telephone band). Bandpassing the audio will cut you down to that 300 to 3400 Hz range. Now what? You have a voice AND you have the, now bandpassed, white noise. Somehow you need to remove that noise and leave the voice intact. There are various filtering schemes, but all of them will damage the voice in the process.
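As a rough illustration of the band-limiting step described above, here is a sketch that chains a first-order high-pass at the low edge with a first-order low-pass at the high edge. The type VoiceBand and the functions voiceband_init and voiceband_step are hypothetical names, and first-order sections roll off very gently (about 6 dB per octave), so a real design would likely use steeper filters. As the answer notes, whatever noise falls inside the band passes straight through.

```c
/* State for a crude band-pass: one-pole high-pass followed by
   one-pole low-pass (both are discretized RC filters). */
typedef struct {
    double hp_alpha;  /* high-pass coefficient */
    double lp_alpha;  /* low-pass coefficient  */
    double prev_in;   /* previous input sample */
    double hp_out;    /* previous high-pass output */
    double lp_out;    /* previous low-pass output  */
} VoiceBand;

/* Set the band edges, e.g. 300 Hz and 3400 Hz at a 44100 Hz sample rate. */
void voiceband_init(VoiceBand *f, double sample_rate,
                    double low_hz, double high_hz)
{
    const double pi = 3.14159265358979323846;
    double dt = 1.0 / sample_rate;
    double rc_hp = 1.0 / (2.0 * pi * low_hz);
    double rc_lp = 1.0 / (2.0 * pi * high_hz);

    f->hp_alpha = rc_hp / (rc_hp + dt);
    f->lp_alpha = dt / (rc_lp + dt);
    f->prev_in = 0.0;
    f->hp_out = 0.0;
    f->lp_out = 0.0;
}

/* Process one sample and return the band-limited result. */
double voiceband_step(VoiceBand *f, double x)
{
    /* High-pass: attenuates content below low_hz (rumble, DC offset). */
    f->hp_out = f->hp_alpha * (f->hp_out + x - f->prev_in);
    f->prev_in = x;

    /* Low-pass: attenuates content above high_hz. */
    f->lp_out += f->lp_alpha * (f->hp_out - f->lp_out);
    return f->lp_out;
}
```

To filter a buffer, call voiceband_init once and then feed every sample through voiceband_step in order; for stereo input, keep one VoiceBand per channel so the filter states do not mix.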

Good luck, it's really not going to be simple!

Tijuana answered 15/10, 2009 at 10:33 Comment(1)

Actually, after bandpass filtering white noise it's no longer white, by definition ("white" refers to the equal contribution of all frequencies, which in the case of visible light gives white light) – Beaty 15/10, 2009 at 12:0

In the function below, buf holds the PCM input data from a WAV file with a 44100 Hz sample rate.

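/* Removes center-panned content (usually the lead vocal) from interleaved
   16-bit stereo PCM by replacing both channels with left minus right. */
int
voiceremoval (char *buf, int bytes, int bps, int nch)
{
    short *a = (short *) buf;
    int frames;

    /* Only 16-bit stereo is handled; other formats are left untouched. */
    if (bps != 16 || nch != 2)
        return 0;

    /* One frame is a left/right pair of 2-byte samples. */
    frames = bytes / (2 * nch);

    while (frames--)
      {
          /* Content identical in both channels cancels in the difference. */
          int d = a[0] - a[1];

          /* Clamp to the 16-bit range before storing. */
          if (d < -32768)
              d = -32768;
          if (d > 32767)
              d = 32767;

          a[0] = (short) d;
          a[1] = (short) d;
          a += 2;
      }
    return 0;
}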

Gambit answered 20/10, 2009 at 15:4 Comment(0)

Look up Independent Component Analysis (ICA).

Faltboat answered 20/10, 2009 at 15:15 Comment(0)
