Algorithm to remove vocal from sound track [closed]

Asked 9/9, 2010 at 0:55 Answered 10/3, 2017 at 22:23

Solved algorithm audio mp3 signal-processing voice

I want to remove vocals from MP3 sound tracks. I searched Google and tried a few programs, but none of them were convincing. I am planning to read the MP3 file, get the waveform, and remove the part of the waveform that is above a specified limit.

Do you have any suggestions on how to proceed?

-- Update

I just want code that can read the MP3 file format. Is there any software for that?

Mesozoic answered 9/9, 2010 at 0:55 Comment(3)

This would be pretty cool... what software have you already tried? – Temptress 9/9, 2010 at 0:58

audacity, wavosaur and extra boy pro – Mesozoic 9/9, 2010 at 1:5

librosa does vocal separation. – Manyplies 17/9, 2018 at 1:55

This isn't so much an "algorithm" as a "trick", but it could be automated in code. It works mostly for stereo tracks where the vocals are centered. If the vocals are centered, they appear equally in both channels. If you invert one of the channels and then merge the two back together, the waveforms of the centered vocals cancel out and are virtually removed. You can do this manually with most good audio editors, such as Audacity. It doesn't give you perfect results, and the rest of the audio suffers a bit too, but it makes for great karaoke tracks :)
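As a rough illustration of the invert-and-merge idea, here is a minimal sketch on toy sample values. The function name `invert_and_mix` and the numbers are made up for the example; real audio would come from a decoded file.

```python
def invert_and_mix(left, right):
    """Invert the right channel and sum it with the left, sample by sample."""
    return [l + (-r) for l, r in zip(left, right)]


# Toy signals: the vocal is identical in both channels (centered),
# while the guitar is only present in the left channel.
vocal = [0.5, -0.25, 0.75]
guitar = [0.25, 0.5, -0.5]

left = [v + g for v, g in zip(vocal, guitar)]
right = vocal[:]

mixed = invert_and_mix(left, right)  # the vocal cancels, only the guitar is left
```

Anything that differs between the two channels survives the merge, and anything identical in both disappears, which is why centered instruments vanish along with centered vocals.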

Costa answered 9/9, 2010 at 1:5 Comment(2)

It's called phase cancellation. The major drawback is that the produced track is mono. – Burleson 9/9, 2010 at 1:11

"the rest of the audio suffers a bit too": this lucky scenario is rare. Most likely there is little sound left, and what remains sounds very wrong. Usually something better can be done with a more-than-stereo source (5.1, etc.), but that is not as simple either. – Access 13/3, 2017 at 15:5

Source: http://www.cdf.utoronto.ca/~csc209h/summer/a2/a2.html, written by Daniel Zingaro.

Sounds are waves of air pressure. When a sound is generated, a sound wave consisting of compressions (increases in pressure) and rarefactions (decreases in pressure) moves through the air. This is similar to what happens if you throw a stone into a pond: the water rises and falls in a repeating wave.

When a microphone records sound, it takes a measure of the air pressure and returns it as a value. These values are called samples and can be positive or negative corresponding to increases or decreases in air pressure. Each time the air pressure is recorded, we are sampling the sound. Each sample records the sound at an instant in time; the faster we sample, the more accurate is our representation of the sound. The sampling rate refers to how many times per second we sample the sound. For example, CD-quality sound uses a sampling rate of 44100 samples per second; sampling someone's voice for use in a VOIP conversation uses far less than this. Sampling rates of 11025 (voice quality), 22050, and 44100 (CD quality) are common...

For mono sounds (those with one sound channel), a sample is simply a positive or negative integer that represents the amount of compression in the air at the point the sample was taken. For stereo sounds (which we use in this assignment), a sample is actually made up of two integer values: one for the left speaker and one for the right...
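To make the sampling numbers concrete, the sketch below works out how much raw data one second of audio takes. It assumes uncompressed PCM with 16-bit (2-byte) samples, which the text above does not state at this point; the helper name `pcm_bytes_per_second` is hypothetical.

```python
def pcm_bytes_per_second(sample_rate: int, channels: int, bytes_per_sample: int = 2) -> int:
    """Raw PCM data rate: one value per channel per sampling instant."""
    return sample_rate * channels * bytes_per_sample


cd_stereo = pcm_bytes_per_second(44100, 2)   # 44100 * 2 * 2 = 176400 bytes per second
voice_mono = pcm_bytes_per_second(11025, 1)  # 11025 * 1 * 2 = 22050 bytes per second
```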

Here's how the algorithm [to remove vocals] works.

Copy the first 44 bytes verbatim from the input file to the output file. Those 44 bytes contain important header information that should not be modified.

Next, treat the rest of the input file as a sequence of shorts. Take each pair of shorts left and right, and compute combined = (left - right) / 2. Write two copies of combined to the output file.
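The two steps above can be sketched in Python roughly as follows. This is not the assignment's reference solution. It assumes a 16-bit little-endian stereo PCM file whose header is exactly 44 bytes, and it truncates the division toward zero, since the text does not say how to round. The name `remove_center` is made up for the example.

```python
import struct

HEADER_SIZE = 44  # header length assumed by the text; not true of every WAV file


def remove_center(wav_bytes: bytes) -> bytes:
    """Return the file contents with (left - right) / 2 written to both channels."""
    header, body = wav_bytes[:HEADER_SIZE], wav_bytes[HEADER_SIZE:]

    # One stereo frame is two 2-byte shorts; ignore any incomplete trailing frame.
    usable = len(body) - (len(body) % 4)
    count = usable // 2
    samples = struct.unpack(f"<{count}h", body[:usable])

    out = []
    for left, right in zip(samples[0::2], samples[1::2]):
        combined = int((left - right) / 2)  # truncates toward zero, stays in 16-bit range
        out.extend((combined, combined))    # two copies: one per output channel

    # Header and any leftover bytes are copied through unchanged.
    return header + struct.pack(f"<{count}h", *out) + body[usable:]
```

A typical use would be to read the input with `open(path, "rb").read()`, pass the bytes to `remove_center`, and write the result to a new file.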

Why Does This Work?

For the curious, a brief explanation of the vocal-removal algorithm is in order. As you noticed from the algorithm, we are simply subtracting one channel from the other (and then dividing by 2 to keep the volume from getting too loud). So why does subtracting the right channel from the left channel magically remove vocals?

When music is recorded, it is sometimes the case that vocals are recorded by a single microphone, and that single vocal track is used for the vocals in both channels. The other instruments in the song are recorded by multiple microphones, so that they sound different in both channels. Subtracting one channel from the other takes away everything that is "in common" between those two channels which, if we're lucky, means removing the vocals.

Of course, things rarely work so well. Try your vocal remover on this badly-behaved wav file. Sure, the vocals are gone, but so is the body of the music! Apparently, some of the instruments were also recorded "centred", so that they are removed along with the vocals when channels are subtracted.
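The fixed 44-byte header is an assumption that does not hold for every WAV file. One possible alternative, sketched here rather than taken from the assignment, is to let Python's standard `wave` module locate the audio data. The function name `remove_center_wave` is hypothetical, and the sketch only accepts 16-bit stereo PCM. Note that `wave` writes only the format and data chunks, so extra metadata chunks from the input are not carried over.

```python
import array
import sys
import wave


def remove_center_wave(src_path: str, dst_path: str) -> None:
    """Write (left - right) / 2 to both channels of a 16-bit stereo WAV file."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo PCM")
        params = src.getparams()
        samples = array.array("h", src.readframes(src.getnframes()))

    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Samples are interleaved: left, right, left, right, ...
    for i in range(0, len(samples) - 1, 2):
        combined = int((samples[i] - samples[i + 1]) / 2)
        samples[i] = samples[i + 1] = combined

    if sys.byteorder == "big":
        samples.byteswap()  # back to little-endian before writing

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(samples.tobytes())
```

The output keeps the stereo layout of the input, but both channels carry the same signal, so it is effectively mono.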

Stambaugh answered 19/6, 2011 at 7:25 Comment(4)

Nah, I only audited the class so I didn't have to. Looks like the link doesn't work anymore... – Stambaugh 27/10, 2014 at 3:56

WAV files are RIFF files with one or more WAVE sections. Modifying the file in this way may break files with multiple WAVE sections and will also clobber other parts such as INFO and ID3 tags. – Clavier 16/9, 2016 at 16:33

I tried this with WAV files. The output WAV file seems to be corrupted. When I tried opening it with VLC, the following errors showed up: wav demux error: cannot peek wav demux error: cannot find 'data' chunk wav demux error: An error occurred during wav demuxing ps demux error: cannot peek mpgv demux error: cannot peek mjpeg demux error: cannot peek ps demux error: cannot peek core input error: no suitable demux module for file/any:///home/srinivas/workspace/Extract%20Vocals/output.wav Any suggestions? – Ilse 31/3, 2017 at 21:11

link is dead!... – Abloom 10/4, 2017 at 9:53

You can use the pydub library (see here for details, and here for a related question). It depends on FFmpeg, so it can read any file format that FFmpeg supports.

Then you can do the following:

from pydub import AudioSegment

# Placeholder paths (not in the original answer); replace with your own files
myAudioFile = "input.mp3"
myAudioFile_CentersOut = "output_centers_out.mp3"

# Read in the audio file and get the two mono tracks
sound_stereo = AudioSegment.from_file(myAudioFile, format="mp3")
sound_monoL, sound_monoR = sound_stereo.split_to_mono()

# Invert the phase of the right channel
sound_monoR_inv = sound_monoR.invert_phase()

# Merge L and inverted R; content common to both channels cancels out
sound_CentersOut = sound_monoL.overlay(sound_monoR_inv)

# Export the merged (mono) audio file
sound_CentersOut.export(myAudioFile_CentersOut, format="mp3")

Kaden answered 10/3, 2017 at 22:23 Comment(1)

How do I remove the resulting centersOut from the original? – Tribromoethanol 31/5, 2017 at 15:59

Above a specified limit? Sounds like a high-pass filter... You could use phase cancellation if you had the a cappella track along with the original. Otherwise, unless it's an old 60s-era track that has vocals directly in the middle and everything else hard-panned, I don't think there's a super clean way of removing vocals.

Struck answered 9/9, 2010 at 0:58 Comment(3)

Is there any way you know of to separate the different sounds in the input? I mean, for example, the algorithm gives us, say, 100 different detected sounds and leaves it to us to find the specific ones to remove. – Vulnerable 10/11, 2014 at 7:14

@ConductedClever: en.wikipedia.org/wiki/Independent_component_analysis – Imperceptible 6/3, 2016 at 7:56

Or, more generally, en.wikipedia.org/wiki/Blind_signal_separation – Imperceptible 6/3, 2016 at 8:21
