Number of hidden layers, units per hidden layer, and epochs until a neural network starts behaving acceptably on training data
I am trying to solve this Kaggle problem using a neural network, built with the Pybrain Python library.

It's a classical supervised learning problem. In the following code, the 'data' variable is a numpy array of shape 892 x 8: 7 fields are my features and 1 field is my output value, which can be '0' or '1'.

from pybrain.datasets import ClassificationDataSet
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.structure import SigmoidLayer, TanhLayer  # layer classes used below
from pybrain.tools.shortcuts import buildNetwork

# 'data' is an 892 x 8 numpy array: column 0 is the 0/1 label,
# columns 1-7 are the features
dataset = ClassificationDataSet(7, 1)
for row in data:
    dataset.appendLinked(row[1:], row[0])

net = buildNetwork(7, 9, 7, 1, bias=True,
                   hiddenclass=SigmoidLayer, outclass=TanhLayer)
trainer = BackpropTrainer(net, learningrate=0.04, momentum=0.96,
                          weightdecay=0.02, verbose=True)
trainer.trainOnDataset(dataset, 8000)
trainer.testOnData(verbose=True)

After training the neural network, when I test it on the training data, it always gives nearly the same output for every input. For example:

Testing on data:
out:     [  0.075]
correct: [  1.000]
error:  0.42767858
out:     [  0.075]
correct: [  0.000]
error:  0.00283875
out:     [  0.075]
correct: [  1.000]
error:  0.42744569
out:     [  0.077]
correct: [  1.000]
error:  0.42616996
out:     [  0.076]
correct: [  0.000]
error:  0.00291185
out:     [  0.076]
correct: [  1.000]
error:  0.42664586
out:     [  0.075]
correct: [  1.000]
error:  0.42800026
out:     [  0.076]
correct: [  1.000]
error:  0.42719380
out:     [  0.076]
correct: [  0.000]
error:  0.00286796
out:     [  0.076]
correct: [  0.000]
error:  0.00286642
out:     [  0.076]
correct: [  1.000]
error:  0.42696969
out:     [  0.076]
correct: [  0.000]
error:  0.00292401
out:     [  0.074]
correct: [  0.000]
error:  0.00274975
out:     [  0.076]
correct: [  0.000]
error:  0.00286129

I have tried altering the learning rate, weight decay, momentum, number of hidden units, number of hidden layers, class of the hidden layers, and class of the output layer to resolve this, but in every case it gives the same output for every input from the training data.

I think I should run it for more than 8000 epochs, because when I built a neural network for XOR, it took at least 700 iterations before the errors dropped to nano scale. The training data size for XOR was only 4, whereas here it is 892. So I ran 8000 iterations on 10% of the original data (making the training data size 89), and even then it gave the same output for every input in the training data. And since I want to classify the input into '0' or '1', if I use Softmax as the output layer class, it always gives '1' as output.

No matter which configuration (number of hidden units, class of the output layer, learning rate, class of the hidden layer, momentum) I used for XOR, it more or less started converging in every case.

Is it possible that there is some configuration that will finally yield lower error rates, or at least some configuration so that it won't give the same output for all inputs in the training data?

I ran it for 80,000 iterations (training data size is 89). Output sample:

Testing on data:
out:     [  0.340]
correct: [  0.000]
error:  0.05772102
out:     [  0.399]
correct: [  0.000]
error:  0.07954010
out:     [  0.478]
correct: [  1.000]
error:  0.13600274
out:     [  0.347]
correct: [  0.000]
error:  0.06013008
out:     [  0.500]
correct: [  0.000]
error:  0.12497886
out:     [  0.468]
correct: [  1.000]
error:  0.14177601
out:     [  0.377]
correct: [  0.000]
error:  0.07112816
out:     [  0.349]
correct: [  0.000]
error:  0.06100758
out:     [  0.380]
correct: [  1.000]
error:  0.19237095
out:     [  0.362]
correct: [  0.000]
error:  0.06557341
out:     [  0.335]
correct: [  0.000]
error:  0.05607577
out:     [  0.381]
correct: [  0.000]
error:  0.07247926
out:     [  0.355]
correct: [  1.000]
error:  0.20832669
out:     [  0.382]
correct: [  1.000]
error:  0.19116165
out:     [  0.440]
correct: [  0.000]
error:  0.09663233
out:     [  0.336]
correct: [  0.000]
error:  0.05632861

Average error: 0.112558819082

('Max error:', 0.21803000849096299, 'Median error:', 0.096632332865968451)

It's giving all outputs within the range (0.33, 0.5).

Oratorical answered 8/10, 2012 at 5:41 Comment(14)
Why doesn't the reported error match |out-correct|? Also, have you tried training for much more than 8000 iterations? What's the highest number of hidden nodes you have tried? – Ulrika
I have tried 2 hidden layers with 11 nodes in each; it was not working up to 8000 iterations. Right now I am using 2 hidden layers with 9 and 7 nodes and running 80,000 iterations; it's been running for around 3 hours. I will report results once it's completed, although looking at the total error I don't really think it'll be any better. I'm sorry, I have no idea why the reported error doesn't match |out-correct|. – Oratorical
I have updated the question with the results of running it for 80,000 iterations. – Oratorical
Seems like a clear improvement. I get the impression, though, that the weights start really low, possibly at zero, and that would mean the labels in your dataset should be -1 and 1 instead of 0 and 1. Another thing you could do is print the average error after every x training iterations, to see how it evolves. – Ulrika
What is the activation function for your neurons? It seems you have a binary switch 0/1, while you should use sigmoid or hypertanh. Please describe your data. AFAICT your output is a probability: a fuzzy value that can be anything between 0 and 1, so your error values will become more adequate. Also, the learning time makes me think you are doing something wrong. Please mention the size of the data. – Ludovika
@Ulrika I am trying that. Meanwhile, is it normal to run for such a large number of iterations? It took me more than 3 hours. And don't I need to change any other configuration, like the number of hidden nodes or the class of the layers? – Oratorical
@Ludovika It's the Titanic survival dataset. I have training data for 892 persons. My feature vector has 7 dimensions (sex, age, class, ticket price, etc.) and the output is binary: whether the person survived or not. SigmoidLayer is the class of my hidden layers and the output layer class is TanhLayer. I am almost sure I am doing something wrong. The size of my data is 892, but I am using only 10% of it so that it iterates faster, and I have iterated 80,000 times. The results are in the question. – Oratorical
What is your learning rate? It seems too low. How did you initialize the neuron weights? They should be random, roughly on the same scale as the rate. Did you run data normalization, and which one? 892 vectors with 7 elements should learn in minutes (though I'm not sure how Python affects the speed; I always used compiled code). 80,000 epochs seems excessive. – Ludovika
@Ludovika The learning rate is 0.04. How much should it be? The weights were initialized randomly. I haven't run any data normalization, but the feature values are within bounds. – Oratorical
@Ulrika When I changed the labels from (0, 1) to (-1, 1) and ran it for 5000 iterations, the average error actually increased: Average error: 0.462304033276 ('Max error:', 1.3235938749920506, 'Median error:', 0.11632301859632561) – Oratorical
Try starting with a learning rate of 0.2-0.4, decreasing over time proportionally to the current epoch (LR * (N - i)/N). What are the bounds? I suppose they differ per feature? Then normalize them into the same range. Use either Sigmoid or Tanh layers throughout the net. If you use Sigmoid, leave the labels as 0 and 1; if you use Tanh, use [-1, +1] as outputs. – Ludovika
@Ludovika Yes, the bounds differ per feature: pClass is 1, 2 or 3; sex is 0 or 1; the upper bound on age is 90; the numbers of siblings aboard and of parents/children aboard are both bounded by 10; place of embarkation can be 0, 1 or 2. – Oratorical
Are you saying that you always get identical output despite varying training rates and numbers of hidden-layer nodes? If I saw that, I would be extremely suspicious that I had a configuration mistake (e.g., that my changes weren't being picked up properly). – Signalize
@LarryOBrien Yes; the maximum difference between outputs was 1.7 when I iterated 80,000 times. With 20,000 iterations the difference between outputs is hardly 0.2. – Oratorical
There is yet another neural network metric you did not mention: the number of adaptable weights. I'm starting the answer with this because it's related to the number of hidden layers and the number of units in them.

For good generalization, the number of weights must be much less than Np/Ny, where Np is the number of training patterns and Ny is the number of net outputs. Exactly what "much" means is debatable; I suggest a factor of several times, say 10. For approximately 1000 patterns and 1 output, your task would therefore allow about 100 weights.

It does not make sense to use 2 hidden layers; 1 is sufficient for most tasks involving non-linearity. In your case the additional hidden layer only makes a difference by hurting overall performance. So if 1 hidden layer is used, the number of neurons in it can be approximated as the number of weights divided by the number of inputs, i.e. 100/7 ≈ 14.
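To make the weight-count heuristic concrete, here is a small sketch (the helper name is mine, not from any library) that counts the adaptable weights of a fully connected feed-forward net; note that with biases even a single 14-unit hidden layer slightly exceeds the 100-weight budget, while the question's 7-9-7-1 net exceeds it by half again:

```python
# Hypothetical helper: count adaptable weights in a fully connected
# feed-forward net. layer_sizes lists the units per layer, e.g. [7, 14, 1].
def count_weights(layer_sizes, bias=True):
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out      # connection weights between layers
        if bias:
            total += n_out         # one bias weight per unit
    return total

print(count_weights([7, 14, 1]))    # single hidden layer of 14 units -> 127
print(count_weights([7, 9, 7, 1]))  # the two-hidden-layer net from the question -> 150
```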

I suggest using the same activation function in all neurons, either hypertanh or sigmoid everywhere. Your output values are actually already normalized for sigmoid. In any case, you can improve NN performance by normalizing the input data to fit into [0, 1] in all dimensions. Of course, normalize each feature on its own.
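The per-feature normalization could be sketched like this (assuming the 892 x 8 layout from the question, label in column 0 and features in columns 1-7; the function name is illustrative):

```python
import numpy as np

# Sketch: min-max normalize each feature column independently into [0, 1],
# leaving the label column (index 0) untouched.
def normalize_features(data):
    data = np.asarray(data, dtype=float).copy()
    feats = data[:, 1:]
    lo, hi = feats.min(axis=0), feats.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    data[:, 1:] = (feats - lo) / span
    return data
```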

If you can do it with the Pybrain lib, start learning with a greater learning rate and then decrease it smoothly, proportionally to the current step (LR * (N - i)/N), where i is the current step, N is the step limit, and LR is the initial learning rate.
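The schedule itself is just a linear decay; here is a minimal sketch (whether the trainer's rate can be changed mid-run depends on the library; one workaround is to train epoch by epoch and rebuild the trainer with the new rate each time):

```python
# Sketch of the decaying schedule LR * (N - i) / N: linear decay from the
# initial rate down to 0 over total_steps.
def decayed_lr(initial_lr, step, total_steps):
    return initial_lr * (total_steps - step) / float(total_steps)

schedule = [decayed_lr(0.3, i, 20000) for i in (0, 5000, 10000, 15000)]
# schedule == [0.3, 0.225, 0.15, 0.075]
```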

As @Junuxx suggested, output the current error every M steps (if possible), just to make sure your program works as expected. Stop learning when the difference between errors in successive steps falls below a threshold. For a first rough estimate while choosing the NN parameters, set the threshold to 0.1-0.01 (there is no need for "nano scale").
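A sketch of that monitor-and-stop loop, assuming a trainer object whose train() method runs one epoch and returns the average error (Pybrain's BackpropTrainer works roughly like this, if I recall its API correctly; the function name and defaults here are mine):

```python
# Sketch: train epoch by epoch, print the error every M epochs, and stop
# once successive-epoch errors differ by less than `threshold`.
def train_until_converged(trainer, max_epochs=10000, M=100, threshold=0.01):
    prev_error = float('inf')
    for epoch in range(max_epochs):
        error = trainer.train()        # one epoch; returns average error
        if epoch % M == 0:
            print(epoch, error)        # sanity check that learning progresses
        if abs(prev_error - error) < threshold:
            break                      # converged to within the threshold
        prev_error = error
    return error
```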

The fact that running a network on 89 patterns for 80,000 steps gives the results you have is strange. Please double-check that you pass correct data to the NN, and examine what the error values you provided mean; possibly either the errors or the outputs displayed are taken from the wrong place. I think 10,000 steps should be far more than enough to get acceptable results for 89 patterns.

As for this specific task, I think a SOM net could be another option (possibly better suited than BP).

As a side note, I'm not familiar with Pybrain, but I have coded NNs in C++ and other languages, and your timings look far too long.

Ludovika answered 8/10, 2012 at 10:40 Comment(7)
I started with a learning rate of 0.2 and multiplied it by 0.999 after every iteration for 20,000 iterations; the outputs were still the same. I have checked the data again; it's correct. – Oratorical
@JackSmith Well, then it could be something Pybrain-specific, which is outside my scope. You should still try to understand what the errors are. Try pure BP without softmax. Did you try input data normalization? – Ludovika
I was trying without Softmax earlier; with Softmax, every output is '1'. And normalization didn't help. – Oratorical
Please provide an excerpt of the input data (the actual vectors you pass into the net) in your question. Try another NN tool if possible, for example neuroph.sourceforge.net. I hope someone can tell what is wrong with the Pybrain setup for this task and how to interpret your error values. – Ludovika
Data slice: array([['1', '2', '0', '55', '0', '0', '16', '1'], ['0', '3', '1', '2', '4', '1', '29.125', '2'], ['1', '2', '1', '28.0', '0', '0', '13', '1'], ['0', '3', '0', '31', '1', '0', '18', '1'], ['1', '3', '0', '28.0', '0', '0', '7.225', '0'], ['0', '2', '1', '35', '0', '0', '26', '1'], The output value is at index 0 and the rest is my feature vector. – Oratorical
@JackSmith Pardon, I don't see any normalization there. All values should be in the range [0, 1]. – Ludovika
Normalization wasn't helping. I tried to make it work by first running principal component analysis, which reduced my feature vector from 7 to 5 dimensions, and then running the same neural network on it. The average error decreased to 0.07, and the output is no longer the same for every input. I'm playing with various parameters to lower the error further. – Oratorical
