I've built an MLP with Google's TensorFlow library. The network runs, but it refuses to learn properly: it always converges to an output of nearly 1.0, no matter what the input actually is.
The complete code can be seen here.
Any ideas?
The input and output (batch size 4) are as follows:
input_data = [[0., 0.], [0., 1.], [1., 0.], [1., 1.]] # XOR input
output_data = [[0.], [1.], [1.], [0.]] # XOR output
n_input = tf.placeholder(tf.float32, shape=[None, 2], name="n_input")
n_output = tf.placeholder(tf.float32, shape=[None, 1], name="n_output")
Hidden layer configuration:
# hidden layer's bias neuron
b_hidden = tf.Variable(0.1, name="hidden_bias")
# hidden layer's weight matrix initialized with a uniform distribution
W_hidden = tf.Variable(tf.random_uniform([2, hidden_nodes], -1.0, 1.0), name="hidden_weights")
# calc hidden layer's activation
hidden = tf.sigmoid(tf.matmul(n_input, W_hidden) + b_hidden)
Output layer configuration:
W_output = tf.Variable(tf.random_uniform([hidden_nodes, 1], -1.0, 1.0), name="output_weights") # output layer's weight matrix
output = tf.sigmoid(tf.matmul(hidden, W_output)) # calc output layer's activation
My learning methods look like this:
loss = tf.reduce_mean(cross_entropy)  # take the mean of the cross entropy
optimizer = tf.train.GradientDescentOptimizer(0.01)  # gradient descent with learning rate 0.01
train = optimizer.minimize(loss)  # let the optimizer minimize the loss
I tried both setups for cross entropy:
cross_entropy = -tf.reduce_sum(n_output * tf.log(output))
and
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(n_output, output)
where n_output is the original output as described in output_data, and output is the value predicted/calculated by my network.
The training inside the for-loop (for n epochs) goes like this:
cvalues = sess.run([train, loss, W_hidden, b_hidden, W_output],
                   feed_dict={n_input: input_data, n_output: output_data})
I am saving the outcome to cvalues for debug printing of loss, W_hidden, ...
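For context, the surrounding loop looks roughly like this (n_epochs and the print interval are placeholders here; the exact version is in the linked code, and it assumes the graph definitions from above):

n_epochs = 2000
sess = tf.Session()
sess.run(tf.initialize_all_variables())  # old-style variable init, matching the API level used above

for epoch in range(n_epochs):
    # one training step on the full XOR batch, also fetching values for debug printing
    cvalues = sess.run([train, loss, W_hidden, b_hidden, W_output],
                       feed_dict={n_input: input_data, n_output: output_data})
    if epoch % 400 == 0:
        print("step: {}".format(epoch))
        print("loss: {}".format(cvalues[1]))
        print("b_hidden: {}".format(cvalues[3]))
        print("W_hidden: {}".format(cvalues[2]))
        print("W_output: {}".format(cvalues[4]))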
No matter what I've tried, when I test my network to validate the output, it always produces something like this:
(...)
step: 2000
loss: 0.0137040186673
b_hidden: 1.3272010088
W_hidden: [[ 0.23195425 0.53248233 -0.21644847 -0.54775208 0.52298909]
[ 0.73933059 0.51440752 -0.08397482 -0.62724304 -0.53347367]]
W_output: [[ 1.65939867]
[ 0.78912479]
[ 1.4831928 ]
[ 1.28612828]
[ 1.12486529]]
(--- finished with 2000 epochs ---)
(Test input for validation:)
input: [0.0, 0.0] | output: [[ 0.99339396]]
input: [0.0, 1.0] | output: [[ 0.99289012]]
input: [1.0, 0.0] | output: [[ 0.99346077]]
input: [1.0, 1.0] | output: [[ 0.99261558]]
So it is not learning properly; it always converges to nearly 1.0 no matter which input is fed.
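For reference, the validation itself is just a forward pass through the trained network, roughly like this (the exact test code is in the link above):

# run each test input through the trained network and print the prediction
for test_input in input_data:
    result = sess.run(output, feed_dict={n_input: [test_input]})
    print("input: {} | output: {}".format(test_input, result))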
Comments:

Your b_hidden variable is a scalar - is that intentional? I think you should create it as b_hidden = tf.Variable(tf.constant(0.1, shape=[hidden_nodes]), name="hidden_bias"), which might help. Another thing to try would be adding a b_output bias term to your output layer. – Yoshi

b_hidden should indeed be a vector and not a scalar... however, the network still converges to nearly 1.0 for every input, with or without a hidden bias, as a scalar or a vector, and with or without a bias for the output layer. I really think I am missing some error in the learning method or the network architecture :/ – Skull
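For reference, the changes suggested above would look roughly like this in the code from the question (the shapes assume hidden_nodes as defined earlier; this only illustrates the suggestion, it is not a confirmed fix):

# hidden layer's bias as a vector (one bias value per hidden node)
b_hidden = tf.Variable(tf.constant(0.1, shape=[hidden_nodes]), name="hidden_bias")
hidden = tf.sigmoid(tf.matmul(n_input, W_hidden) + b_hidden)

# additional bias term for the output layer
b_output = tf.Variable(tf.constant(0.1, shape=[1]), name="output_bias")
output = tf.sigmoid(tf.matmul(hidden, W_output) + b_output)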