Optimal Feature-to-Instance Ratio in Back Propagation Neural Network
I'm trying to perform leave-one-out cross-validation for modelling a particular problem using a back-propagation neural network. I have 8 features and 20 instances in my training data, and I'm trying to make the NN learn a function to build a prediction model. The problem is that the prediction error rate is quite high. My guess is that the number of training instances is small compared to the number of features under consideration. Is this conclusion correct? Is there an optimal feature-to-instance ratio?
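
For concreteness, a minimal sketch of the setup described, assuming scikit-learn's MLPRegressor as a stand-in for the back-propagation network (the library choice and the synthetic data are assumptions, not part of the question):

    # Leave-one-out cross-validation with a small multilayer perceptron.
    import numpy as np
    from sklearn.model_selection import LeaveOneOut, cross_val_score
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 8))              # 20 instances, 8 features, as in the question
    y = X[:, 0] + 0.1 * rng.normal(size=20)   # placeholder target (assumption)

    model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    scores = cross_val_score(model, X, y, cv=LeaveOneOut(),
                             scoring="neg_mean_squared_error")
    print("LOOCV mean squared error:", -scores.mean())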

Institutional answered 11/11, 2011 at 6:10
(This topic is often phrased in the ML literature as the acceptable size or shape of the data set, given that a data set is often described as an m x n matrix in which m is the number of rows (data points) and n is the number of columns (features); obviously m >> n is preferred.)

In any event, I am not aware of a general rule for an acceptable range of features-to-observations; there are probably a couple of reasons for this:

  • such a ratio would depend strongly on the quality of the data (signal-to-noise ratio); and

  • the number of features is just one element of model complexity (e.g., interaction among the features); and model complexity is the strongest determinant of the number of data points required.


So there are two sets of approaches to this problem--which, because they attack it from opposite directions, can both be applied to the same model:

  • reduce the number of features; or

  • use a statistical technique to leverage the data that you do have.

A couple of suggestions, one for each of the two paths above:

  1. Eliminate "non-important" features--i.e., those features that don't contribute to the variability in your response variable. Principal Component Analysis (PCA) is a fast and reliable way to do this (see the first sketch after this list), though there are a number of other techniques which are generally subsumed under the rubric "dimension reduction."

  2. Use Bootstrap methods instead of cross-validation. The difference in methodology seems slight, but the (often substantial) improvement in reducing prediction error is well documented for multi-layer perceptrons (neural networks) (see, e.g., Efron, B. and Tibshirani, R., Improvements on Cross-Validation: The .632+ Bootstrap Method, Journal of the American Statistical Association, 92, 548-560, 1997). If you are not familiar with Bootstrap methods for splitting training and testing data, the general technique is similar to cross-validation except that instead of taking subsets of the entire data set you take subsamples (see the second sketch after this list). Section 7.11 of Elements is a good introduction to Bootstrap methods.
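
A minimal sketch of suggestion 1, assuming scikit-learn's PCA and synthetic data shaped like the question's (20 instances, 8 features); the data here is an assumption, not part of the answer:

    # Keep only the principal components that explain most of the variance.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 8))          # 20 instances, 8 features

    pca = PCA(n_components=0.95)          # keep components explaining 95% of the variance
    X_reduced = pca.fit_transform(X)
    print(X_reduced.shape, pca.explained_variance_ratio_)

And a sketch of suggestion 2. Note this shows the plain out-of-bag bootstrap error estimate (train on a resample drawn with replacement, test on the instances left out of that resample), not the refined .632+ correction from the cited paper; scikit-learn and the synthetic target are again assumptions:

    # Out-of-bag bootstrap estimate of prediction error.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 8))
    y = X[:, 0] + 0.1 * rng.normal(size=20)   # placeholder target (assumption)

    n, B, errors = len(X), 200, []
    for _ in range(B):
        train = rng.integers(0, n, size=n)         # sample n indices with replacement
        test = np.setdiff1d(np.arange(n), train)   # instances left out of the resample
        if len(test) == 0:
            continue
        model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000,
                             random_state=0).fit(X[train], y[train])
        errors.append(np.mean((model.predict(X[test]) - y[test]) ** 2))

    print("bootstrap (out-of-bag) MSE estimate:", np.mean(errors))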

The best single source on this general topic that I have found is Chapter 7, Model Assessment and Selection, from the excellent treatise Elements of Statistical Learning by Hastie, Tibshirani, and Friedman. This book is available as a free download from the book's homepage.

Idioglossia answered 11/11, 2011 at 10:4
Thank you so much for your insights and valuable suggestions. I have read that in some machine learning models, if the number of instances in the training sample is small, the same instances are replicated a number of times. Is this methodology of constructing the training set valid? — Institutional
