If you use SURF features, each feature is a float vector of size 128 or 64, depending on your SURF configuration. You can set up the neural net as follows:
-Create a database with models of:
-bikes
-cars
-buses
-trucks
-Take several photos of each type of object, e.g. 10 photos of different car models, 10 photos of different bike models, 10 photos of different truck models, etc. From each photo of each object class, extract its SURF feature vectors.
-Each type of object will represent one object class in the neural net, like this:
-car: object class 1 = one-hot representation in 4 bits = 0 0 0 1
-bikes: object class 2 = one-hot representation in 4 bits = 0 0 1 0
-truck: object class 3 = one-hot representation in 4 bits = 0 1 0 0
-ball: object class 4 = one-hot representation in 4 bits = 1 0 0 0
-Each bit in this representation corresponds to one neuron in the output layer of the network and represents one class of object to be recognized.
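Since each output neuron stands for exactly one class, the training target for a class is a vector with a 1 at that class's neuron and 0 everywhere else. A minimal NumPy sketch, assuming 0-based neuron indices in the order car, bikes, truck, ball (the names `CLASS_NAMES` and `one_hot` are illustrative, not from the answer):

```python
import numpy as np

# Output neuron i stands for CLASS_NAMES[i] (0-based indices in this sketch).
CLASS_NAMES = ["car", "bikes", "truck", "ball"]

def one_hot(class_index: int, num_classes: int = len(CLASS_NAMES)) -> np.ndarray:
    """Target vector for one class: 1 at that class's output neuron, 0 elsewhere."""
    target = np.zeros(num_classes, dtype=np.float32)
    target[class_index] = 1.0
    return target

for i, name in enumerate(CLASS_NAMES):
    # Elements are listed from the first output neuron to the last.
    print(name, one_hot(i))
```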
The configuration of the neural network is based on the size of the feature vector and on the number of object types you want to recognize, as follows:
The number of neurons in the input layer: 64 or 128, depending on the size of the SURF feature vector you configured and used.
The number of neurons in the output layer: the number of object classes you want to recognize, in this example 4.
The activation function for each neuron should be the sigmoid or tanh function (http://www.learnartificialneuralnetworks.com/), because SURF features are floating-point numbers. If you use FREAK features or another binary local feature descriptor (BRISK, ORB, BRIEF), then you would use a binary activation function for each neuron, such as the step function or the sign function.
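To make the layer sizes concrete, here is a sketch of a forward pass through a sigmoid network with 64 inputs and 4 outputs. The single hidden layer and its size of 32 are assumptions for illustration only; the answer leaves the hidden size to be chosen on the validation split. The names `init_weights` and `forward` are hypothetical.

```python
import numpy as np

INPUT_SIZE = 64    # 64 or 128, matching the SURF descriptor length
HIDDEN_SIZE = 32   # hypothetical value; tune it on the validation split
OUTPUT_SIZE = 4    # one neuron per object class

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_weights(rng, sizes=(INPUT_SIZE, HIDDEN_SIZE, OUTPUT_SIZE)):
    """Small random weights and zero biases for each pair of consecutive layers."""
    return [(rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Propagate a batch of descriptors (one per row of x) through sigmoid layers."""
    a = x
    for W, b in params:
        a = sigmoid(a @ W + b)
    return a

rng = np.random.default_rng(0)
params = init_weights(rng)
x = rng.random((5, INPUT_SIZE))   # 5 fake descriptors with values in [0, 1)
out = forward(params, x)          # one row of 4 output values per descriptor
```

Swapping `sigmoid` for `np.tanh` gives the tanh variant; note that tanh outputs lie in (-1, 1) rather than (0, 1).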
The algorithm used to train the network is backpropagation.
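A sketch of one backpropagation step for a network with one hidden sigmoid layer. It assumes a mean-squared-error loss and plain batch gradient descent with a fixed learning rate; the answer does not specify the loss, the learning rate, or the update variant, so treat these as assumptions. `backprop_step` is a hypothetical helper name.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(W1, b1, W2, b2, X, T, lr=0.5):
    """One batch gradient-descent step for a 1-hidden-layer sigmoid MLP (MSE loss).

    X: (n, n_in) inputs, T: (n, n_out) one-hot targets.
    The weight and bias arrays are updated in place; the pre-update loss is returned.
    """
    n = X.shape[0]
    # Forward pass
    H = sigmoid(X @ W1 + b1)   # hidden activations
    Y = sigmoid(H @ W2 + b2)   # output activations
    loss = 0.5 * np.sum((Y - T) ** 2) / n
    # Backward pass: error terms (deltas), using sigmoid'(z) = s * (1 - s)
    d_out = (Y - T) * Y * (1.0 - Y)          # output layer
    d_hid = (d_out @ W2.T) * H * (1.0 - H)   # propagated back with the old W2
    # Gradient-descent updates, averaged over the batch
    W2 -= lr * (H.T @ d_out) / n
    b2 -= lr * d_out.sum(axis=0) / n
    W1 -= lr * (X.T @ d_hid) / n
    b1 -= lr * d_hid.sum(axis=0) / n
    return loss

# Toy usage with random descriptors and labels (not real SURF data).
rng = np.random.default_rng(0)
X = rng.random((40, 64))
T = np.eye(4)[rng.integers(0, 4, size=40)]
W1, b1 = rng.normal(0, 0.1, (64, 16)), np.zeros(16)
W2, b2 = rng.normal(0, 0.1, (16, 4)), np.zeros(4)
losses = [backprop_step(W1, b1, W2, b2, X, T) for _ in range(200)]
```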
Before continuing, you need to set up and prepare the data set used to train the neural network. For example:
-all feature vectors extracted from pictures of a car will be labeled as class 1
-all feature vectors extracted from pictures of a bike will be labeled as class 2
-all feature vectors extracted from pictures of a truck will be labeled as class 3
-all feature vectors extracted from pictures of a ball will be labeled as class 4
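The labeling step amounts to stacking every descriptor of every photo into one matrix and giving each row the class of the photo it came from. A sketch, assuming the SURF descriptors have already been extracted into one `(k, d)` array per photo (extraction itself is not shown; random arrays stand in for it), with hypothetical names `CLASS_INDEX` and `build_dataset` and 0-based class indices:

```python
import numpy as np

# 0-based class indices (class 1 in the text -> index 0 here).
CLASS_INDEX = {"car": 0, "bike": 1, "truck": 2, "ball": 3}

def build_dataset(descriptors_by_class):
    """Stack all descriptors and label each one with the class of its photo.

    descriptors_by_class maps a class name to a list of (k_i, d) arrays,
    one array per photo (k_i = number of keypoints found in that photo).
    """
    rows, labels = [], []
    for name, per_photo in descriptors_by_class.items():
        for desc in per_photo:
            rows.append(desc)
            labels.append(np.full(len(desc), CLASS_INDEX[name]))
    return np.vstack(rows).astype(np.float32), np.concatenate(labels)

# Fake descriptors: 3 photos per class, 5-9 keypoints each, 64 values per keypoint.
rng = np.random.default_rng(1)
fake = {name: [rng.random((rng.integers(5, 10), 64)) for _ in range(3)]
        for name in CLASS_INDEX}
X, y = build_dataset(fake)
```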
In this example you will have 4 neurons in the output layer and 128 or 64 neurons in the input layer.
-In recognition mode, the output of the neural net is the class whose neuron has the highest value among these 4 neurons.
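Picking the neuron with the highest value is an argmax over each output row. A small sketch with made-up output values (`recognize` is a hypothetical name):

```python
import numpy as np

CLASS_NAMES = ["car", "bike", "truck", "ball"]

def recognize(outputs):
    """For each output row, return the class whose neuron has the highest value."""
    return [CLASS_NAMES[i] for i in np.argmax(outputs, axis=1)]

# Made-up network outputs for two descriptors.
outputs = np.array([[0.10, 0.80, 0.20, 0.05],
                    [0.70, 0.10, 0.30, 0.20]])
print(recognize(outputs))  # ['bike', 'car']
```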
It is necessary to normalize all features in the data set to the interval [0,1] before starting the training phase, because the output of the neural net is the probability that the input vector belongs to one object class in the data set.
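One common way to do this is per-feature min-max scaling. The sketch below computes the minimum and range on the training descriptors and reuses them for the validation and test data, clipping anything outside the training range. Fitting on the training split only is a design choice assumed here, not something the answer prescribes; `fit_minmax` and `apply_minmax` are hypothetical names.

```python
import numpy as np

def fit_minmax(X_train):
    """Per-feature minimum and range, computed on the training descriptors."""
    lo = X_train.min(axis=0)
    span = X_train.max(axis=0) - lo
    span[span == 0] = 1.0   # avoid division by zero for constant features
    return lo, span

def apply_minmax(X, lo, span):
    """Scale each feature to [0, 1]; values outside the training range are clipped."""
    return np.clip((X - lo) / span, 0.0, 1.0)
```

Usage: `lo, span = fit_minmax(X_train)`, then `apply_minmax(X_val, lo, span)` and likewise for the test data, so that all splits are scaled the same way.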
The data set used to train the network has to be split as follows:
-70% of the data is used for training
-15% of the data is used to validate the network architecture (the number of neurons in the hidden layer)
-15% of the data is used to test the final network
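A sketch of the 70/15/15 split: shuffle the sample indices, then cut them at 70% and 85% (integer arithmetic avoids floating-point rounding surprises). The function name `split_70_15_15` and the fixed seed are assumptions for illustration.

```python
import numpy as np

def split_70_15_15(X, y, seed=0):
    """Shuffle the samples, then cut them into 70% train, 15% validation, 15% test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = len(X) * 70 // 100
    n_val = len(X) * 15 // 100
    train, val, test = np.split(idx, [n_train, n_train + n_val])
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])
```

This splits individual descriptors. Because descriptors from the same photo are usually similar, splitting by photo instead (keeping all of a photo's descriptors in one split) gives a stricter estimate of how the network handles unseen images.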
When training the neural network, the stop criterion is the recognition rate: stop when it is near 85-90%.
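The stop criterion can be written as a loop that trains one epoch at a time and checks the recognition rate after each epoch. In this sketch the rate is assumed to be measured on held-out (validation) descriptors, and `train_one_epoch` and `measure_rate` are hypothetical callbacks the caller supplies; `max_epochs` is an added safety limit in case the target is never reached.

```python
import numpy as np

def accuracy(outputs, labels):
    """Recognition rate: fraction of rows whose highest output neuron matches the label."""
    return float(np.mean(np.argmax(outputs, axis=1) == labels))

def train_until_rate(train_one_epoch, measure_rate, target=0.85, max_epochs=1000):
    """Train epoch by epoch until the recognition rate reaches `target`.

    train_one_epoch(): one backpropagation pass over the training split.
    measure_rate(): current recognition rate, e.g. accuracy() on validation data.
    Returns the number of epochs run and the last measured rate.
    """
    rate = measure_rate()
    epoch = 0
    while rate < target and epoch < max_epochs:
        train_one_epoch()
        epoch += 1
        rate = measure_rate()
    return epoch, rate
```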
Why use a neural net and not SVMs? SVMs work fine, but they cannot produce the best class-separation map in nonlinear classification problems like this one, or when you have many different object classes; this shortcoming shows up in the results of the recognition phase.
I recommend reading some neural network theory to understand how they work:
http://link.springer.com/chapter/10.1007%2F11578079_10
OpenCV has a machine learning class for neural nets: the MLP in its ml module.
Hope this helps.