Neural network XOR gate not learning

Asked 25/7, 2016 at 6:39 Answered 4/8, 2016 at 12:54

python numpy machine-learning neural-network artificial-intelligence

I'm trying to build an XOR gate using a 2-perceptron network, but for some reason the network is not learning. When I plot the error over training, it settles at a fixed level and oscillates around it.

I have not added any bias to the network yet.

import numpy as np

def S(x):
    return 1/(1+np.exp(-x))

win = np.random.randn(2,2)
wout = np.random.randn(2,1)
eta = 0.15

# win = [[1,1], [2,2]]
# wout = [[1],[2]]

obj = [[0,0],[1,0],[0,1],[1,1]]
target = [0,1,1,0]

epoch = int(10000)
emajor = ""

for r in range(0,epoch):
    for xy in range(len(target)):
        tar = target[xy]
        fdata = obj[xy]

        fdata = S(np.dot(1,fdata))

        hnw = np.dot(fdata,win)

        hnw = S(np.dot(fdata,win))

        out = np.dot(hnw,wout)

        out = S(out)

        diff = tar-out

        E = 0.5 * np.power(diff,2)
        emajor += str(E[0]) + ",\n"

        delta_out = (out-tar)*(out*(1-out))
        nindelta_out = delta_out * eta

        wout_change = np.dot(nindelta_out[0], hnw)

        for x in range(len(wout_change)):
            change = wout_change[x]
            wout[x] -= change

        delta_in = np.dot(hnw,(1-hnw)) * np.dot(delta_out[0], wout)
        nindelta_in = eta * delta_in

        for x in range(len(nindelta_in)):
            midway = np.dot(nindelta_in[x][0], fdata)
            for y in range(len(win)):
                win[y][x] -= midway[y]



f = open('xor.csv','w')
f.write(emajor) # python will convert \n to os.linesep
f.close() # you can omit in most cases as the destructor will call it

The graph shows how the error changes with the number of learning rounds. Is this correct? The red line shows how I expected the error to change.

Is there anything wrong in my code? I can't figure out what is causing the error. Any help is much appreciated.

Thanks in advance

Supersession answered 25/7, 2016 at 6:39 Comment(1)

You might be interested in my blog article: XOR tutorial with TensorFlow – Mention 26/7, 2016 at 6:19

Here is a network with one hidden layer and backpropagation that can be customized to run experiments with relu, sigmoid and other activations. In my experiments the network performed better and converged sooner with relu, while with sigmoid the loss value fluctuated. This happens because "the gradient of sigmoids becomes increasingly small as the absolute value of x increases".

import numpy as np
import matplotlib.pyplot as plt
from operator import xor

class neuralNetwork():
    def __init__(self):
        # Define hyperparameters
        self.noOfInputLayers = 2
        self.noOfOutputLayers = 1
        self.noOfHiddenLayerNeurons = 2

        # Define weights
        self.W1 = np.random.rand(self.noOfInputLayers,self.noOfHiddenLayerNeurons)
        self.W2 = np.random.rand(self.noOfHiddenLayerNeurons,self.noOfOutputLayers)

    def relu(self,z):
        return np.maximum(0,z)

    def sigmoid(self,z):
        return 1/(1+np.exp(-z))

    def forward (self,X):
        self.z2 = np.dot(X,self.W1)
        self.a2 = self.relu(self.z2)
        self.z3 = np.dot(self.a2,self.W2)
        yHat = self.relu(self.z3)
        return yHat

    def costFunction(self, X, y):
        #Compute cost for given X,y, use weights already stored in class.
        self.yHat = self.forward(X)
        J = 0.5*sum((y-self.yHat)**2)
        return J

    def costFunctionPrime(self,X,y):
        # Compute derivative with respect to W1 and W2
        delta3 = np.multiply(-(y-self.yHat),self.sigmoid(self.z3))
        djw2 = np.dot(self.a2.T, delta3)
        delta2 = np.dot(delta3,self.W2.T)*self.sigmoid(self.z2)
        djw1 = np.dot(X.T,delta2)

        return djw1,djw2


if __name__ == "__main__":

    EPOCHS = 6000
    SCALAR = 0.01

    nn= neuralNetwork()    
    COST_LIST = []

    inputs = [ np.array([[0,0]]), np.array([[0,1]]), np.array([[1,0]]), np.array([[1,1]])]

    for epoch in range(1,EPOCHS):
        cost = 0
        for i in inputs:
            X = i #inputs
            y = xor(X[0][0],X[0][1])
            cost += nn.costFunction(X,y)[0]
            djw1,djw2 = nn.costFunctionPrime(X,y)
            nn.W1 = nn.W1 - SCALAR*djw1
            nn.W2 = nn.W2 - SCALAR*djw2
        COST_LIST.append(cost)

    plt.plot(np.arange(1,EPOCHS),COST_LIST)
    plt.ylim(0,1)
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.title(str('Epochs: '+str(EPOCHS)+', Scalar: '+str(SCALAR)))
    plt.show()

    inputs = [ np.array([[0,0]]), np.array([[0,1]]), np.array([[1,0]]), np.array([[1,1]])]
    # Print each input pair, its XOR target and the network's prediction
    print("X\ty\ty_hat")
    for inp in inputs:
        x1, x2 = int(inp[0][0]), int(inp[0][1])
        print((x1, x2), "\t", xor(x1, x2), "\t", round(float(nn.forward(inp)[0][0]), 4))

End Result:

X       y       y_hat
(0, 0)  0       0.0
(0, 1)  1       0.9997
(1, 0)  1       0.9997
(1, 1)  0       0.0005

The weights obtained after training were:

nn.W1

[ [-0.81781753  0.71323677]
  [ 0.48803631 -0.71286155] ]

nn.W2

[ [ 2.04849235]
  [ 1.40170791] ]
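To illustrate the quoted point about sigmoid gradients, here is a minimal sketch that compares the derivative of the sigmoid with the derivative of relu at a few inputs. The helper names `sigmoid_grad` and `relu_grad` are my own and are not part of the class above. Note also that `costFunctionPrime` above multiplies by `self.sigmoid(z)`, while `forward` uses `relu`, so these helpers are what you would swap in if you wanted the backward pass to match the forward activation exactly.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_grad(z):
    # Derivative of the sigmoid: s(z) * (1 - s(z)); its maximum is 0.25 at z = 0
    s = sigmoid(z)
    return s * (1 - s)

def relu_grad(z):
    # Derivative of relu: 1 for positive inputs, 0 otherwise
    return (np.asarray(z) > 0).astype(float)

z = np.array([0.0, 2.0, 5.0, 10.0])
sig_g = sigmoid_grad(z)   # shrinks towards 0 as |z| grows
relu_g = relu_grad(z)     # stays at 1 for every positive z

print(sig_g)
print(relu_g)
```

The sigmoid derivative never exceeds 0.25 and decays quickly away from zero, so the weight updates get small once a unit saturates. The relu derivative does not decay for positive inputs.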

I found the following YouTube series extremely helpful for understanding neural nets: Neural networks demystified

I know only a little about this topic, and only so much can be explained in one answer. If you want a deeper understanding of neural nets, I suggest going through the following link: cs231n: Modelling one neuron

Vicious answered 4/8, 2016 at 12:54 Comment(0)

The error recorded for each epoch should be the sum of the squared errors over all targets, i.e. one total per epoch rather than one value per training example. The changes are marked with ***** in the code:

import numpy as np
def S(x):
    return 1/(1+np.exp(-x))
win = np.random.randn(2,2)
wout = np.random.randn(2,1)
eta = 0.15
# win = [[1,1], [2,2]]
# wout = [[1],[2]]
obj = [[0,0],[1,0],[0,1],[1,1]]
target = [0,1,1,0]    
epoch = int(10000)
emajor = ""

for r in range(0,epoch):

    # ***** initialize final error *****
    finalError = 0

    for xy in range(len(target)):
        tar = target[xy]
        fdata = obj[xy]

        fdata = S(np.dot(1,fdata))

        hnw = np.dot(fdata,win)

        hnw = S(np.dot(fdata,win))

        out = np.dot(hnw,wout)

        out = S(out)

        diff = tar-out

        E = 0.5 * np.power(diff,2)

        # ***** sum all errors *****
        finalError += E

        delta_out = (out-tar)*(out*(1-out))
        nindelta_out = delta_out * eta

        wout_change = np.dot(nindelta_out[0], hnw)

        for x in range(len(wout_change)):
            change = wout_change[x]
            wout[x] -= change

        delta_in = np.dot(hnw,(1-hnw)) * np.dot(delta_out[0], wout)
        nindelta_in = eta * delta_in

        for x in range(len(nindelta_in)):
            midway = np.dot(nindelta_in[x][0], fdata)
            for y in range(len(win)):
                win[y][x] -= midway[y]

    # ***** Save final error *****
    emajor += str(finalError[0]) + ",\n"


f = open('xor.csv','w')
f.write(emajor) # python will convert \n to os.linesep
f.close() # you can omit in most cases as the destructor will call it
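The per-epoch sum used above can also be written as a small helper, which makes it easier to check by hand. This is only a sketch: `epoch_error` is a hypothetical name, and the outputs in the usage line are made-up numbers, not results from the network.

```python
import numpy as np

def epoch_error(outputs, targets):
    # Sum of 0.5 * (target - output)^2 over all patterns in one epoch
    outputs = np.asarray(outputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return float(np.sum(0.5 * (targets - outputs) ** 2))

# Made-up outputs for the four XOR patterns, for illustration only
print(epoch_error([0.1, 0.9, 0.8, 0.2], [0, 1, 1, 0]))
```

With these numbers the four squared differences add up to 0.1, so the epoch error is about 0.05.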

Vicious answered 25/7, 2016 at 8:35 Comment(3)

Hey, thanks for the answer, but when I plot the error graph it is different on every run. Why is that? Is that possible? – Supersession 25/7, 2016 at 10:50

Yes, that's because the initial weights are randomized, so they change every time the program is started. For more info, here is a good link for a better understanding of backprop: mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example – Vicious 25/7, 2016 at 11:17

Thanks, yes, I have read that post many times to get an idea of how this works. Don't you think the line delta_in = np.dot(hnw,(1-hnw)) * np.dot(delta_out[0], wout) is incorrect? I calculated it manually and the output from this line is not the desired one. Maybe I'm using numpy.dot in the wrong way here, don't you think? – Supersession 25/7, 2016 at 11:20
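The concern about numpy.dot in the last comment can be checked directly. The sketch below reuses the names from the question (`S`, `win`, `wout`, `hnw`, `delta_out`) and compares the line as written with an element-wise version. The fixed seed and the `elementwise` expression are my own assumptions about what was intended, not code from the thread.

```python
import numpy as np

def S(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)           # fixed seed, only so the run is repeatable
win = rng.standard_normal((2, 2))
wout = rng.standard_normal((2, 1))

fdata = np.array([1.0, 0.0])             # one raw input pattern
tar = 1.0                                # its XOR target
hnw = S(np.dot(fdata, win))              # hidden activations, shape (2,)
out = S(np.dot(hnw, wout))               # network output, shape (1,)
delta_out = (out - tar) * out * (1 - out)

# As written in the question: np.dot of two 1-D vectors is an inner product,
# so the per-unit derivatives hnw * (1 - hnw) are summed into one scalar.
as_written = np.dot(hnw, 1 - hnw) * np.dot(delta_out[0], wout)

# Element-wise alternative: one derivative and one delta per hidden unit.
elementwise = hnw * (1 - hnw) * wout[:, 0] * delta_out[0]

print(np.dot(hnw, 1 - hnw))
print(as_written.shape, elementwise.shape)
```

With the inner product, both hidden units are scaled by the same summed derivative, which is not what the chain rule gives for each unit. It may also be worth noting that the question's loop updates wout before computing delta_in, so the hidden-layer delta is computed from the already-updated output weights.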
