python divide by zero encountered in log - logistic regression

I'm trying to implement a multiclass logistic regression classifier that distinguishes between k different classes.

This is my code.

import numpy as np
from scipy.special import expit


def cost(X,y,theta,regTerm):
    (m,n) = X.shape
    J = (np.dot(-(y.T),np.log(expit(np.dot(X,theta))))-np.dot((np.ones((m,1))-y).T,np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1))))) / m + (regTerm / (2 * m)) * np.linalg.norm(theta[1:])
    return J

def gradient(X,y,theta,regTerm):
    (m,n) = X.shape
    grad = np.dot(((expit(np.dot(X,theta))).reshape(m,1) - y).T,X)/m + (np.concatenate(([0],theta[1:].T),axis=0)).reshape(1,n)
    return np.asarray(grad)

def train(X,y,regTerm,learnRate,epsilon,k):
    (m,n) = X.shape
    theta = np.zeros((k,n))
    for i in range(0,k):
        previousCost = 0;
        currentCost = cost(X,y,theta[i,:],regTerm)
        while(np.abs(currentCost-previousCost) > epsilon):
            print(theta[i,:])
            theta[i,:] = theta[i,:] - learnRate*gradient(X,y,theta[i,:],regTerm)
            print(theta[i,:])
            previousCost = currentCost
            currentCost = cost(X,y,theta[i,:],regTerm)
    return theta

trX = np.load('trX.npy')
trY = np.load('trY.npy')
theta = train(trX,trY,2,0.1,0.1,4)

I can verify that cost and gradient are returning values of the right dimensions (cost returns a scalar, and gradient returns a 1 by n row vector), but I get the warning

RuntimeWarning: divide by zero encountered in log
  J = (np.dot(-(y.T),np.log(expit(np.dot(X,theta))))-np.dot((np.ones((m,1))-y).T,np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1))))) / m + (regTerm / (2 * m)) * np.linalg.norm(theta[1:])

Why is this happening, and how can I avoid it?

Rodriques answered 30/6, 2016 at 13:57 Comment(2)
Looking at your formula, my best guess would be that m is 0 so (regTerm / (2 * m)) and the / m will become a divide by 0.Obscure
m is not 0, it is 4104 in my case.Rodriques

You can clean up the formula by using broadcasting, the * operator for elementwise products of vectors, and the @ operator for matrix multiplication, and by breaking it up as suggested in the comments.

Here is your cost function:

def cost(X, y, theta, regTerm):
    m = X.shape[0]  # number of training examples (equivalently y.shape[0], or p.shape[0] after the next line)
    p = expit(X @ theta)
    log_loss = -np.average(y*np.log(p) + (1-y)*np.log(1-p))
    J = log_loss + regTerm * np.linalg.norm(theta[1:]) / (2*m)
    return J

You can clean up your gradient function along the same lines.

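Along the same lines, here is a sketch of one possible cleanup of the gradient. It assumes y is a flat vector of 0/1 labels and the squared L2 penalty mentioned below, whose gradient contribution is regTerm * theta[1:] / m (the original gradient line scales its regularization term differently):

import numpy as np
from scipy.special import expit

def gradient(X, y, theta, regTerm):
    m = X.shape[0]
    y = y.ravel()                        # assume 0/1 labels, flattened to 1-D
    p = expit(X @ theta)
    grad = X.T @ (p - y) / m             # gradient of the (unregularized) log loss
    grad[1:] += regTerm * theta[1:] / m  # L2 term; the bias weight theta[0] is not penalized
    return grad
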
By the way, are you sure you want np.linalg.norm(theta[1:])? If you're trying to do L2 regularization, the term should be np.linalg.norm(theta[1:]) ** 2.

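For completeness, a sketch of the cost with that squared penalty (same imports and names as the function above; only the ** 2 differs):

def cost(X, y, theta, regTerm):
    m = X.shape[0]
    p = expit(X @ theta)
    log_loss = -np.average(y*np.log(p) + (1-y)*np.log(1-p))
    # Squared L2 penalty; the bias weight theta[0] is left out
    return log_loss + regTerm * np.linalg.norm(theta[1:]) ** 2 / (2*m)
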
Garden answered 30/6, 2016 at 15:7 Comment(0)

The proper solution here is to add a small epsilon to the argument of the log function. What worked for me was:

epsilon = 1e-5    

def cost(X, y, theta):
    m = X.shape[0]
    yp = expit(X @ theta)
    cost = - np.average(y * np.log(yp + epsilon) + (1 - y) * np.log(1 - yp + epsilon))
    return cost
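For example, with this cost and epsilon in scope, a quick check on a tiny made-up input (the values below are purely illustrative) shows the saturated case no longer triggers the warning:

import numpy as np
from scipy.special import expit

X = np.array([[50.0], [-50.0]])
y = np.array([1.0, 0.0])
theta = np.array([1.0])

print(expit(X @ theta))   # the first entry rounds to exactly 1.0 in double precision
print(cost(X, y, theta))  # finite, and no "divide by zero encountered in log" warning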
Lovesick answered 12/8, 2018 at 22:3 Comment(1)
Can you explain why this helps solve OP's problem?Simarouba

Cause:

This is happening because, in some cases, the value of the sigmoid expit(np.dot(X, theta)) becomes exactly equal to 1 (typically on rows where y[i] is 1 and the model is very confident).

Cost function:

J = (np.dot(-(y.T),np.log(expit(np.dot(X,theta))))-np.dot((np.ones((m,1))-y).T,np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1))))) / m + (regTerm / (2 * m)) * np.linalg.norm(theta[1:])

Now, consider the following part in the above code snippet:

np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1)))

Here, you are computing 1 minus the sigmoid output; when that output is exactly 1, this becomes log(1 - 1) = log(0), which is undefined.

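You can reproduce that saturation in isolation (a minimal illustration, not the OP's data):

import numpy as np
from scipy.special import expit

z = 40.0
print(expit(z))              # prints 1.0: exp(-40) is too small for double precision to resolve next to 1
print(np.log(1 - expit(z)))  # log(0.0) -> -inf, with "RuntimeWarning: divide by zero encountered in log"
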
Sulphide answered 20/12, 2019 at 6:13 Comment(0)

I'm guessing your data has negative values in it. You can't log a negative.

import numpy as np
np.log(2)
> 0.69314718055994529
np.log(-2)
> nan

There are a lot of different ways to transform your data that should help, if this is the case.

Carburetor answered 30/6, 2016 at 14:1 Comment(6)
No. I just checked. No negative values in the data anywhere.Rodriques
In that case, you're doing both subtraction and division inside that enormous line that the error comes up on, so that's where I would look next. Also it would make it much more comprehensible to split that line apart into steps I think. It would also make it easier to debug.Carburetor
For example, calculate the denominator separately and assign it to a variable, then divide on that line by the variable. But before that put some intermediate output that prints the denominator for you, so you can see if it is indeed zero. If it is you can split it apart more to track down how it's getting there.Carburetor
That's the thing. The only fraction I have is the /m bit, and the denominator is not zero there. Inside the log I don't divide by anything.Rodriques
Split it apart and look. Obviously something is doing what isn't expected. It's good practice to use intermediate output on this sort of thing anyway, to make sure every step is doing what you expect.Carburetor
@OriaGruber: Until you split up the code and/or print the values involved, you lack sufficient evidence to claim that the values are valid. Do the basic debugging; make it a habit. You're going to need the skills as you progress in programming.Calysta
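The kind of intermediate-output debugging suggested in these comments could look something like this sketch, which splits the original one-line cost into named steps and prints the range of the sigmoid output (variable names are only illustrative):

import numpy as np
from scipy.special import expit

def cost_debug(X, y, theta, regTerm):
    m = X.shape[0]
    p = expit(np.dot(X, theta)).reshape((m, 1))   # predicted probabilities
    print("p min/max:", p.min(), p.max())         # a 0.0 or 1.0 here means log(0) below
    pos_term = -np.dot(y.T, np.log(p))            # contribution of the y == 1 labels
    neg_term = -np.dot((1 - y).T, np.log(1 - p))  # contribution of the y == 0 labels
    reg = (regTerm / (2 * m)) * np.linalg.norm(theta[1:])
    return (pos_term + neg_term) / m + reg
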
def cost(X, y, theta):
    yp = expit(X @ theta)
    cost = - np.average(y * np.log(yp) + (1 - y) * np.log(1 - yp))
    return cost

The warning originates from np.log(yp) when yp == 0 and from np.log(1 - yp) when yp == 1. One option is to filter out these values and not pass them into np.log. The other option is to add a small constant to prevent the value from being exactly 0 (as suggested in one of the answers above).

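A minimal sketch of the first option, masking out the saturated predictions so they never reach np.log (this assumes y is a 1-D 0/1 array and that at least one prediction is strictly between 0 and 1):

import numpy as np
from scipy.special import expit

def cost_filtered(X, y, theta):
    yp = expit(X @ theta)
    keep = (yp > 0) & (yp < 1)   # drop rows where log(yp) or log(1 - yp) would blow up
    return -np.average(y[keep] * np.log(yp[keep]) + (1 - y[keep]) * np.log(1 - yp[keep]))
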
Biotype answered 10/6, 2019 at 12:54 Comment(0)
I
0

Add a small epsilon value to the argument of the log so that it is never exactly zero and the warning goes away. I am not sure whether it will give accurate results or not, though.

Impostor answered 16/4, 2021 at 10:41 Comment(0)
S
0

You can update the code like this; the problem occurs when the sigmoid output h is exactly 0 or exactly 1:

epsilon = 1e-16
def costFunction(theta, X, y):
    m = len(y)
    h = expit(X.dot(theta))
    pos_1 = np.where(h == 1)
    pos_0 = np.where(h == 0)
    h[pos_1] = h[pos_1] - epsilon
    h[pos_0] = h[pos_0] + epsilon
    J = (-1/m) * (np.log(h).T.dot(y) + np.log(1-h).T.dot(1-y))
    if np.isnan(J):
        return np.inf
    return J
Stairhead answered 5/8 at 13:44 Comment(0)
