Which algorithm should I use for signal (sound) one class classification?
Update I summarized the question and its answers here

  1. My objective is to detect the presence of a given pattern in a noisy signal. I want to detect the presence of a species of insect by recording sounds with a microphone. I have previously recorded the sound of the insect in a digital format.
  2. I am not trying to do voice recognition.
  3. I am already using convolution between the input signal and the pattern to determine their similarity level. But I think that this technique is more suited to discrete time (i.e. digital communications, where signals occur at fixed intervals) and to distinguishing an input signal between two given patterns (I have only one pattern).
  4. I am afraid to use neural networks, because I have never used them, and I don't know if I could embed that code.

Could you please point me to some other approaches, or try to convince me that my current approach is still a good idea or that neural networks may be a feasible way?
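For reference, the convolution-based similarity check from point 3 can be sketched as normalized cross-correlation. This is a minimal NumPy illustration on synthetic signals, not the asker's actual code; the pattern and noise levels are placeholders:

```python
import numpy as np

def similarity_by_correlation(signal, pattern):
    """Slide the pattern over the signal and return the peak
    normalized cross-correlation score."""
    pattern = (pattern - pattern.mean()) / (pattern.std() * len(pattern))
    signal = (signal - signal.mean()) / signal.std()
    return np.correlate(signal, pattern, mode="valid").max()

# Toy check: a sinusoidal "call" buried in noise scores higher than pure noise.
rng = np.random.default_rng(0)
pattern = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 100))
noise = rng.normal(0, 0.3, 1000)
signal_with_call = noise.copy()
signal_with_call[400:500] += pattern
score_with = similarity_by_correlation(signal_with_call, pattern)
score_without = similarity_by_correlation(noise, pattern)
```

The peak score is high wherever the pattern lines up with a copy of itself, which is exactly the single-pattern case the asker describes.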

Update I already have 2 good answers, but another one would be welcome, and even rewarded.

Equinoctial answered 14/1, 2009 at 0:19 Comment(0)
A step up from convolution is dynamic time warping which can be thought of as a convolution operator that stretches and shrinks one signal to optimally match another.

Perhaps a simpler approach would be to do an FFT of the sample and determine whether your insect produces any particular frequencies that can be filtered on.

On the more complex side, but not quite a neural network, are SVM toolkits like libsvm and svmlight that you can throw your data at.

Regardless of the path you attempt, I would spend time exploring the nature of the sound your insect makes using tools like FFT. After all, it will be easier teaching a computer to classify the sound if you can do it yourself.
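Exploring the sound with an FFT, as suggested, might look like this sketch. The sample rate and the synthetic "recording" (a tone in noise) are placeholders for real data:

```python
import numpy as np

fs = 8000                                   # assumed sample rate in Hz
t = np.arange(0, 1.0, 1 / fs)
rng = np.random.default_rng(1)
# Stand-in for an insect recording: a 440 Hz tone plus broadband noise.
recording = np.sin(2 * np.pi * 440 * t) + 0.5 * rng.normal(size=t.size)

# Windowed FFT -> amplitude spectrum; look for frequencies the insect dominates.
spectrum = np.abs(np.fft.rfft(recording * np.hanning(recording.size)))
freqs = np.fft.rfftfreq(recording.size, 1 / fs)
dominant_hz = freqs[np.argmax(spectrum)]
```

If a clear peak like this appears in the real spectra, that frequency band is a candidate for the filtering approaches discussed below.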

Sacchariferous answered 14/1, 2009 at 0:52 Comment(4)
FFT being Fast Fourier Transform.Blackball
I already do a lot of FFT, but the signal has a wide spectrum and many components; I don't know what to look for when viewing the spectrum. I already split it into a sequence of frequency plots, to see how the magnitudes vary in time. I still haven't learned much from it.Equinoctial
Despite the age of this question... -1 for either convolution, correlation, or DTW. None of those are robust enough to solve the asker's problem, period. +1 for SVM. Try MFCCs first.Ku
SVM works pretty well to detect things like a buzz or water boiling, but given some samples of nothing/noise, and some samples of the sound to be identified, it seems to recognize the sound but not when other sounds are going on.Byrom
Some more information is needed.

When you say "noisy signal", what is the background noise? Is it, to a first approximation, stationary (in a statistical sense, i.e. statistically constant over time) or is it non-stationary (i.e. likely to contain other sounds, such as other animal calls, etc.)?

If the background noise is non-stationary, then your best bet might be to use something called Independent Components Analysis (ICA), which attempts to separate a given sound mixture into its component sources; you wouldn't even need the original recording of the insect itself. Lots of ICA software is linked from the Wikipedia page.

(Edit: ICA is a case of Blind Source Separation (BSS); there are many other ways of doing BSS, and it might help to search for those as well.)

If however, the background noise is stationary then the problem is much easier (though still very hard):

In this case the approach I would use is as follows. Analyse the amplitude spectrum of a bit of the noise and the amplitude spectrum of your insect call. If you're lucky the insect call may, in general, be in a different frequency band to the noise. If so filter the incoming signal with suitable high-, low-, or band-pass filter.
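A band-pass pre-filter as described could be sketched with SciPy. The 300-900 Hz band and the test tones below are hypothetical; the actual band should come from comparing the noise and call spectra as the answer suggests:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 8000                                    # assumed sample rate in Hz

def bandpass(x, low_hz, high_hz, fs, order=4):
    """Zero-phase band-pass filter keeping only the insect's band."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

t = np.arange(0, 1.0, 1 / fs)
call = np.sin(2 * np.pi * 600 * t)           # in-band "insect call"
hum = np.sin(2 * np.pi * 50 * t)             # out-of-band interference
filtered = bandpass(call + hum, 300, 900, fs)
```

`sosfiltfilt` filters forwards and backwards, so the call is not phase-shifted relative to the original, which matters if you correlate the filtered signal against a filtered template afterwards.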

You can then try comparing sections of your filtered signal that contain "more energy" than average with your (filtered) insect call. Possibly by using the image similarity algorithms suggested by A. Rex.

Edit: Since your background noise is non-stationary, I can only suggest that searching for blind source separation of non-Gaussian sources may lead you to some more algorithms. I'm afraid the answer is that there is no simple algorithm that will do what you want.

Rinse answered 27/1, 2009 at 13:2 Comment(1)
The background noise is non-stationary. It is the background noise you can find on a farm: wind, rain, machinery and other animals.Equinoctial
Sounds like a typical one-class classification problem, i.e. you want to search for one thing in a large pool of other things you don't care about.

What you want to do is find a set of features or descriptors that you can calculate for every short piece of your raw recording and then match against the features your clean recording produces. I don't think convolution is necessarily bad, though it is rather sensitive to noise, so it might not be optimal for your case.

What might actually work in your case is pattern matching on a binned Fourier transform. You take the Fourier transform of your signal, giving you a power vs. frequency graph (rather than a power vs. time graph), then you divide the frequency axis into bands and take the average power of each band as a feature. If your data contains mostly white noise, the pattern you get from a raw insect sound of similar length will very closely match the pattern of your reference sound. This last trick has been used successfully (with some windowing) to crack the audio captchas used by Google et al. to make their sites accessible to the blind.
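The binned-Fourier-transform features described above might be sketched like this. The band count and test signals are arbitrary toy choices:

```python
import numpy as np

def spectral_features(x, n_bands=16):
    """Average FFT power in n_bands frequency bands, normalized so that
    overall loudness does not affect the feature vector."""
    power = np.abs(np.fft.rfft(x)) ** 2
    bands = np.array_split(power, n_bands)
    feats = np.array([band.mean() for band in bands])
    return feats / feats.sum()

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 8000, endpoint=False)
reference = np.sin(2 * np.pi * 1200 * t)               # clean insect sample
noisy = reference + 0.5 * rng.normal(size=t.size)      # same call in white noise
other = rng.normal(size=t.size)                        # background only

d_match = np.abs(spectral_features(reference) - spectral_features(noisy)).sum()
d_mismatch = np.abs(spectral_features(reference) - spectral_features(other)).sum()
```

Because white noise spreads its power roughly evenly over the bands, the noisy call's feature vector stays close to the clean reference while background-only audio does not.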

By the way, because your raw audio signal is digital (otherwise processing it with a computer would not work ;-)), convolution is appropriate. You should perform the convolution between your reference signal and a sample of equal length from the raw input, starting at each sample. So, if your reference signal has length N, and your raw sample has length M where M >= N, then you should perform M - N + 1 = P convolutions between your reference signal and the P samples from your raw input starting at 1..P. The best candidate for the location of the reference sound in the raw sample is the position with the highest convolution score. Note that this becomes insanely time-consuming very quickly.
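The sliding scheme over all P = M - N + 1 start positions can be sketched as follows (toy synthetic signals; a real implementation would use FFT-based correlation for speed, exactly because of the cost noted above):

```python
import numpy as np

rng = np.random.default_rng(3)
reference = np.sin(2 * np.pi * 10 * np.linspace(0, 1, 200))   # length N = 200
raw = rng.normal(0, 0.2, 2000)                                # length M = 2000
raw[700:900] += reference                                     # hide the call at offset 700

N, M = len(reference), len(raw)
# One dot product per candidate start position, P = M - N + 1 in total.
scores = np.array([np.dot(raw[i:i + N], reference) for i in range(M - N + 1)])
best_start = int(np.argmax(scores))
```

The score peaks at (or within one period of) the true offset of the hidden call.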

Fourier-transform-based matching as I explained above, using 50% overlapping samples from your raw data of twice the length of your reference sample, would at least be faster (though not necessarily better).

Kulp answered 27/1, 2009 at 15:33 Comment(3)
"one class classification problem", with this you changed my way researching solutions for this problem. Seriously.Equinoctial
IMO this is the right idea. Window your signal, FFT it, bin the frequencies. Use all the bins from the current window, as well as the last couple of windows, and use them as your features for an SVM. libsvm is good, they've got an "SVM for beginners" document that'll take you 90% of the way there.Veracity
SVMs are very appropriate, though optimally you'd use a one-class classifier. I'm sure there are one-class classification adaptations of SVMs, but I'm not sure there is one in libsvm. Jay, do you?Kulp
If I were you, I would start by reading a little bit about window functions like the Hamming window; this is a good starting point for sound recognition. (It is, of course, combined with the Fourier transform.)
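As a minimal illustration of why windowing helps (assumed parameters; the off-bin tone stands in for a real sound), here is a comparison of spectral leakage with and without a Hamming window:

```python
import numpy as np

fs = 8000
# A 1003 Hz tone that does not fall exactly on an FFT bin (the usual case).
frame = np.sin(2 * np.pi * 1003 * np.arange(1024) / fs)

window = np.hamming(len(frame))              # tapers the frame edges toward zero
spectrum_rect = np.abs(np.fft.rfft(frame))
spectrum_ham = np.abs(np.fft.rfft(frame * window))

# Spectral leakage: energy far away from the tone's peak bin.
peak_bin = int(np.argmax(spectrum_ham))
leakage_rect = spectrum_rect[peak_bin + 50:].max()
leakage_ham = spectrum_ham[peak_bin + 50:].max()
```

Without the window, the tone's energy smears across the whole spectrum, which would mask a quiet insect call in nearby bins.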

Luciano answered 14/1, 2009 at 0:51 Comment(0)
You can try a Matched Filter. Although I've never actually used one, I've heard good things.

Also, although not simple, I think a Hidden Markov Model (HMM - I know you said no speech recognition, but hear me out!) would provide the best results for you. Again, I've never actually used one, but there are open-source implementations available all over the place. You would just need to train it using your existing "clean" insect recording. Here is one open source implementation: General Hidden Markov Model Library.
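On the matched-filter suggestion: for a known template in additive white noise, the matched filter is simply convolution with the time-reversed template. A toy sketch (synthetic signals and offsets are made up for illustration):

```python
import numpy as np

def matched_filter(signal, template):
    """A matched filter is convolution with the time-reversed template;
    the output peaks where the template best aligns with the signal."""
    return np.convolve(signal, template[::-1], mode="valid")

rng = np.random.default_rng(4)
template = np.sin(2 * np.pi * 8 * np.linspace(0, 1, 160))
stream = rng.normal(0, 0.3, 1600)
stream[500:660] += template                  # insect call hidden at offset 500

output = matched_filter(stream, template)
detected_at = int(np.argmax(output))
```

This is the same operation as the sliding convolution discussed in another answer, which is why the two suggestions coincide under the conditions mentioned in the comment below.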

Pointing answered 27/1, 2009 at 13:51 Comment(1)
Convolution is equivalent to a matched filter under certain conditions, including discrete-time signals. But as I am working with continuous-time signals, your answer is good.Equinoctial
Admittedly this is not my area of expertise, but my first thought is a recursive least squares filter - it performs autocorrelation. It's similar to the convolution filter you're using now, but a bit more advanced. Kalman filtering is an extension of this - it's used to regenerate a signal from multiple noisy measurements, so it's probably not useful in this case. I would not reject neural networks offhand - they're very useful for this sort of thing (provided you train them properly).

Thinking about this more in depth, I would probably recommend using an FFT. Chances are the signal you're looking for is very band-limited, and you'd probably have more luck applying a bandpass filter to the data, then an FFT, and finally running your simple convolution filter on that data instead of on the time-domain data points. Or do both and have twice the data. I'm not heavy into math, so I can't tell you whether you'll get significant (not linearly-dependent) results using this method, but the only thing you're losing is time.

Bezel answered 27/1, 2009 at 15:57 Comment(0)
You may be interested in the MA Toolbox, a Matlab implementation of similarity measure(s).

I personally found this paper, General sound classification and similarity in MPEG-7, interesting. However, it might be behind a paywall (I don't know) and it might not be that useful in practice.

The GPL-ed framework Marsyas has a tool for machine learning classification, called kea. My guess is that this probably does not do what you want or is too much effort to hook up to.

My only idea otherwise is to take Fourier transforms, effectively transforming your sounds into grayscale images. Then use one of the many image similarity algorithms.

Callboard answered 27/1, 2009 at 12:38 Comment(0)
A Naive Bayes Classifier may be worthwhile here, classifying sound samples into ones which contain your species of interest and ones which do not. It works quite well for complex phenomena; I once used it to decide if a given millimeter-wave RADAR data set contained an obstacle such as brush, a tank trap, etc. As for how to break up your continuous data into discrete chunks for the Bayesian classifier, you might just slide along the continuous data set and break off chunks equal in length to your insect sample. For example, if the sample you're comparing against is 2 seconds long, you might feed the discriminator 0-2s, 0.5-2.5s, 1-3s, etc. You'll need to train the discriminator, but that is a common requirement of any machine learning-based solution.
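The sliding-chunk scheme described above (2-second windows advanced by 0.5 seconds) can be sketched as follows. The sample rate is assumed and the zeroed buffer stands in for real field audio:

```python
import numpy as np

fs = 8000                                    # assumed sample rate in Hz
win_s, hop_s = 2.0, 0.5                      # 2 s chunks advanced 0.5 s at a time
win, hop = int(win_s * fs), int(hop_s * fs)

recording = np.zeros(10 * fs)                # stand-in for 10 s of field audio

# Chunks cover 0-2 s, 0.5-2.5 s, 1-3 s, ... as described above; each chunk
# would be featurized and fed to the Bayesian classifier as one discrete example.
chunks = [recording[start:start + win]
          for start in range(0, len(recording) - win + 1, hop)]
```

Each chunk then becomes one discrete observation for the classifier to label as "insect present" or "insect absent".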

These sorts of approaches are about the only way to go if your insect species doesn't have a single, relatively distinct sound that you're looking for. Cross-correlation/convolution are of limited utility if you're looking for something more complex than a single sound which may be at higher or lower volume.

There are naive Bayes classifier implementations for several languages, such as nbc.

Wargo answered 27/1, 2009 at 15:45 Comment(0)
You may want a Wiener filter approach.
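As a pointer, SciPy ships a local-statistics Wiener filter (`scipy.signal.wiener`). A minimal denoising sketch on synthetic data (the tone, noise level, and window size are arbitrary):

```python
import numpy as np
from scipy.signal import wiener

rng = np.random.default_rng(5)
t = np.linspace(0, 1, 8000, endpoint=False)
clean = np.sin(2 * np.pi * 20 * t)           # slowly varying "signal"
noisy = clean + 0.5 * rng.normal(size=t.size)

# Local-statistics Wiener filter: attenuates regions that look like pure noise.
denoised = wiener(noisy, mysize=29)
```

The filter estimates the local mean and variance in each window and suppresses components whose variance looks like the noise floor, which reduces the error relative to the clean signal.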

Pyrography answered 29/1, 2009 at 20:55 Comment(0)
Google the FastICA algorithm. Some use ICA and blind source separation interchangeably. The author of the algorithm wrote a fantastic book on ICA that is around $40-$60 used on Amazon.

Misdeed answered 28/2, 2014 at 12:49 Comment(0)
Over the last few years, the technology for this has matured. In 2009, when the question was asked, neural networks already appeared to be reasonable candidates. Since then, it has become clear that recurrent variants (LSTM/GRU) of neural networks are among the most capable. In addition, these can be paired with one or two convolutional layers (CNN), with the CNN layer operating on a few frames of FFT-transformed audio. A common choice is 10 frames of 1024-point FFTs, each yielding 513 amplitude bins, giving you 5130 amplitudes.
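The input shaping described above can be sketched as follows (a minimal NumPy illustration; the sample rate and the zeroed audio buffer are placeholders):

```python
import numpy as np

fs = 16000                                   # assumed sample rate in Hz
n_fft, n_frames = 1024, 10

audio = np.zeros(n_fft * n_frames)           # stand-in for a clip of field audio

frames = audio.reshape(n_frames, n_fft)
# rfft of a 1024-point frame yields 513 amplitude bins (DC up to Nyquist).
amplitudes = np.abs(np.fft.rfft(frames * np.hamming(n_fft), axis=1))
network_input = amplitudes.reshape(-1)       # 10 * 513 = 5130 amplitudes
```

The flattened 5130-value vector (or the 10x513 matrix, for a 2-D convolutional layer) is what would be fed into the CNN front end.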

Vonvona answered 8/7, 2022 at 11:48 Comment(0)