Extract human sound from a WAV file using Java

I am working on a project where I have to extract the human sound from an audio .wav file using Java.

The audio .wav file may contain 3 to 4 sounds, such as a dog, a cat, music, and a human voice. I have to identify the human sound and then extract that part from the audio .wav file.

I am using FFT.java and Complex.java.

I have now written an AudioFileReader class which reads the audio .wav file from the hard drive and converts it into a byte array. I then used the above-mentioned FFT.java and Complex.java to apply FFT.fft(bytesArray), which returns a Complex array.
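
For illustration, here is a minimal sketch of that step, assuming a 16-bit, mono, little-endian PCM file and the Princeton (Sedgewick/Wayne) FFT.java/Complex.java. Note that the FFT runs over decoded samples rather than the raw bytes, and the input is zero-padded to a power of two, which FFT.fft requires:

    import javax.sound.sampled.AudioFormat;
    import javax.sound.sampled.AudioInputStream;
    import javax.sound.sampled.AudioSystem;
    import java.io.File;

    public class WavSpectrum {
        public static void main(String[] args) throws Exception {
            AudioInputStream ais = AudioSystem.getAudioInputStream(new File("audio.wav"));
            AudioFormat fmt = ais.getFormat();   // expecting 16-bit, mono, little-endian PCM
            byte[] raw = ais.readAllBytes();     // Java 9+; otherwise read in a loop

            // Decode 16-bit little-endian samples: the raw bytes themselves are not valid FFT input.
            int n = raw.length / 2;
            double[] samples = new double[n];
            for (int i = 0; i < n; i++) {
                int lo = raw[2 * i] & 0xFF;
                int hi = raw[2 * i + 1];         // keep the sign of the high byte
                samples[i] = ((hi << 8) | lo) / 32768.0;
            }

            // FFT.fft needs a power-of-two input length, so zero-pad the signal.
            int size = Integer.highestOneBit(n);
            if (size < n) size <<= 1;
            Complex[] x = new Complex[size];
            for (int i = 0; i < size; i++) {
                x[i] = new Complex(i < n ? samples[i] : 0.0, 0.0);
            }

            Complex[] spectrum = FFT.fft(x);
            System.out.println("bins: " + spectrum.length
                    + ", bin width: " + fmt.getSampleRate() / size + " Hz");
        }
    }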

Now the problem is how to extract the human sound's byte pattern from the returned Complex array. Does anyone know how I might be able to achieve this?


Edit: We are assuming a very simple audio .wav file: for example, a cat sound then silence, a human sound then silence, a dog sound then silence, and so on. There is no mixture of voices.
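
Since the sounds are separated by silence, one idea is to first split the decoded samples wherever the short-time energy drops below a threshold, and then look at each segment on its own. A minimal sketch of that idea (the frame length and threshold are placeholder values that would need tuning, e.g. segments(samples, 1024, 1e-4)):

    import java.util.ArrayList;
    import java.util.List;

    public class SilenceSegmenter {
        /** Returns [start, end) sample indices of regions whose frame energy exceeds the threshold. */
        public static List<int[]> segments(double[] samples, int frameLen, double threshold) {
            List<int[]> result = new ArrayList<>();
            int segStart = -1;
            for (int start = 0; start < samples.length; start += frameLen) {
                int end = Math.min(start + frameLen, samples.length);
                double energy = 0.0;
                for (int i = start; i < end; i++) energy += samples[i] * samples[i];
                energy /= (end - start);                    // mean squared amplitude of the frame

                if (energy > threshold && segStart < 0) {
                    segStart = start;                       // a sound begins
                } else if (energy <= threshold && segStart >= 0) {
                    result.add(new int[]{segStart, start}); // the sound ends, silence begins
                    segStart = -1;
                }
            }
            if (segStart >= 0) result.add(new int[]{segStart, samples.length});
            return result;
        }
    }

Each returned range would then be one isolated sound to classify.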
Caniff answered 24/3, 2011 at 7:46 Comment(4)
Did you have a question? – Novelia
I am struggling to finish the project, but still with no success. I need some help extracting the human sound from the .wav file using Java, but so far I am stuck. – Caniff
I'm curious whether you will also be able to extract the sound of a human imitating a cat. And what about a parrot imitating a human imitating a cat? – Dongdonga
I am not considering the sound of a human imitating a cat or a dog, etc. As I have already explained, I will only consider a very simple file containing distinct sounds. – Caniff

I think the standard way to handle problems like this is to convert the input signals into a cepstrum or Mel-cepstrum representation and then use the coefficients as the feature vector for input to a classifier. There are many research papers that discuss solutions to these sorts of problems based on this basic approach, for example:

http://www.ics.forth.gr/netlab/data/J17.pdf
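
To make the idea concrete, here is a minimal sketch of a real cepstrum (not a full Mel-cepstrum, which would add a Mel filterbank and a DCT), assuming the FFT.java/Complex.java mentioned in the question are the Princeton (Sedgewick/Wayne) versions, which provide fft, ifft, abs and re. The first dozen or so coefficients per frame could then serve as the feature vector for a classifier:

    public class Cepstrum {
        /** c[n] = real( IFFT( log |FFT(x)| ) ); the frame length must be a power of two. */
        public static double[] realCepstrum(double[] frame) {
            int n = frame.length;
            Complex[] x = new Complex[n];
            for (int i = 0; i < n; i++) x[i] = new Complex(frame[i], 0.0);

            Complex[] spectrum = FFT.fft(x);
            Complex[] logMag = new Complex[n];
            for (int i = 0; i < n; i++) {
                double mag = Math.max(spectrum[i].abs(), 1e-10); // guard against log(0)
                logMag[i] = new Complex(Math.log(mag), 0.0);
            }

            Complex[] ceps = FFT.ifft(logMag);
            double[] c = new double[n];
            for (int i = 0; i < n; i++) c[i] = ceps[i].re();
            return c;
        }
    }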

One possible shortcut you might try would be to put the input signals through a low bit-rate vocoder such as AMBE, then decode, and compare the quality of the original signal to the encoded/decoded signal. These vocoders are designed to highly compress human speech with fair to good quality at the expense of not being able to adequately represent non-speech sounds.

Misleading answered 24/3, 2011 at 14:52 Comment(1)
I agree. I have seen such an implementation that was even able to distinguish between spoken voice and music, so I assume that's the way to go. – Romito

This can be achieved by AI, and by little short of it. You might investigate speech-recognition APIs, but I doubt their ability to handle signals with noise in the background.

For example:

  • Is that a cat, or someone saying 'meow'?
  • Is that music, or someone singing 'do, re, mi...'?
  • Who said 'Polly wanna cracker', the human or the parrot?
Novelia answered 24/3, 2011 at 8:38 Comment(1)
We are assuming a very simple audio .wav file: for example, a cat sound then silence, a human sound then silence, a dog sound then silence, and so on. There is no mixture of voices. – Caniff

Well, that's a classic AI problem (machine learning / pattern recognition). Have a look at the Wikipedia article.

But basically, you'll need already-classified data that you feed into your algorithm so that it can learn how to classify new data. Be warned, though: 100% correctness is elusive for almost anything in this field, although for your simple problem it could be achievable (it depends on your exact definition of the problem).
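
As a very small illustration of what "feeding already classified data into your algorithm" could look like, here is a 1-nearest-neighbour sketch over whatever feature vectors you extract for each segment (e.g. cepstral coefficients, as suggested in another answer). You would train it on labelled example segments ("human", "dog", "cat", "music") and then ask it to label each new segment:

    import java.util.ArrayList;
    import java.util.List;

    public class NearestNeighbour {
        private final List<double[]> features = new ArrayList<>();
        private final List<String> labels = new ArrayList<>();

        /** Stores one labelled training example, e.g. train(cepstrumOfDogClip, "dog"). */
        public void train(double[] feature, String label) {
            features.add(feature);
            labels.add(label);
        }

        /** Returns the label of the stored example closest to the query in Euclidean distance. */
        public String classify(double[] query) {
            String best = null;
            double bestDist = Double.MAX_VALUE;
            for (int i = 0; i < features.size(); i++) {
                double[] f = features.get(i);
                double d = 0.0;
                for (int j = 0; j < f.length; j++) {
                    double diff = f[j] - query[j];
                    d += diff * diff;
                }
                if (d < bestDist) {
                    bestDist = d;
                    best = labels.get(i);
                }
            }
            return best;
        }
    }

A segment would then be kept only if classify returns "human". In practice you would want more training examples and a stronger classifier, but the shape of the pipeline is the same.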

Cephalization answered 24/3, 2011 at 13:49 Comment(0)
