Simple Linear Regression in Python

I am trying to implement this algorithm to find the intercept and slope for a single variable:

[Image: ALGORITHM OF THE LINEAR REGRESSION. The image itself is not preserved here; judging from the code below, the update rule is w0 ← w0 + 2η·Σ(yᵢ − ŷᵢ) and w1 ← w1 + 2η·Σ(yᵢ − ŷᵢ)·xᵢ, with prediction ŷᵢ = w0 + w1·xᵢ and learning rate η.]

Here is my Python code to update the intercept and slope. But it is not converging: the RSS increases with each iteration rather than decreasing, and after some iterations it becomes infinite. I cannot find any error in my implementation of the algorithm. How can I solve this problem? I have attached the CSV file too. Here is the code.

import pandas as pd
import numpy as np

#Defining gradient_decend
#This function takes the X values, the Y values and a vector of w0 (intercept) and w1 (slope)
#INPUT FEATURES=X(sq.feet of house size)
#TARGET VALUE=Y (Price of House)
#W=np.array([w0,w1]).reshape(2,1)
#W=[w0,
#    w1]

def gradient_decend(X,Y,W):
    intercept=W[0][0]
    slope=W[1][0]

    #Here I will get a list like this:
    #gd=[sum(y-(intercept+slope*x)),
    #    sum((y-(intercept+slope*x))*x)]
    #where y is the observed value and (intercept+slope*x) is the prediction
    gd=[sum(y-(intercept+slope*x) for x,y in zip(X,Y)),
        sum(((y-(intercept+slope*x))*x) for x,y in zip(X,Y))]
    return np.array(gd).reshape(2,1)

#Defining Residual Sum of Squares (RSS)
def RSS(X,Y,W):
    return sum((y-(W[0][0]+W[1][0]*x))**2 for x,y in zip(X,Y))


#Reading Training Data
training_data=pd.read_csv("kc_house_train_data.csv")

#Defining fixed parameters
#Learning Rate
n=0.0001
iteration=1500
#Intercept
w0=0
#Slope
w1=0

#Creating 2,1 vector of w0,w1 parameters
W=np.array([w0,w1]).reshape(2,1)

#Running gradient descent
for i in range(iteration):
    W = W + (2*n) * gradient_decend(training_data["sqft_living"], training_data["price"], W)
    print(RSS(training_data["sqft_living"], training_data["price"], W))

Here is the CSV file.

Wickiup answered 10/1, 2016 at 12:45 Comment(1)
;P it's from the University of Washington machine learning class, I took it too, it was very fun and enlightening. I suggest you use the forum on Coursera, where you can get very good answers from mentors, volunteers and fellow students. coursera.org/learn/ml-regression/discussions – Petro

I have solved my own problem!

Here is how I solved it.

import numpy as np
import pandas as pd
import math
from sys import stdout

#This function takes the pandas dataframe, the input features list and the target column name
def get_numpy_data(data, features, output):

    #Adding a constant column with value 1 in the dataframe.
    data['constant'] = 1    
    #Adding the name of the constant column in the feature list.
    features = ['constant'] + features
    #Creating the feature matrix (selecting columns and converting to a numpy array).
    #(Originally .as_matrix(); that method was removed in pandas 1.0.)
    features_matrix=data[features].to_numpy()
    #Target column is converted to the numpy array
    output_array=np.array(data[output])
    return(features_matrix, output_array)

def predict_outcome(feature_matrix, weights):
    weights=np.array(weights)
    predictions = np.dot(feature_matrix, weights)
    return predictions

def errors(output,predictions):
    errors=predictions-output
    return errors

def feature_derivative(errors, feature):
    derivative=2*np.dot(feature,errors)
    return derivative


def regression_gradient_descent(feature_matrix, output, initial_weights, step_size, tolerance):
    converged = False
    #Initial weights are converted to a numpy array
    weights = np.array(initial_weights)
    while not converged:
        # compute the predictions based on feature_matrix and weights:
        predictions=predict_outcome(feature_matrix,weights)
        # compute the errors as predictions - output:
        error=errors(output,predictions)
        gradient_sum_squares = 0 # initialize the gradient
        # while not converged, update each weight individually:
        for i in range(len(weights)):
            # Recall that feature_matrix[:, i] is the feature column associated with weights[i]
            feature=feature_matrix[:, i]
            # compute the derivative for weight[i]:
            #predict=predict_outcome(feature,weights[i])
            #err=errors(output,predict)
            deriv=feature_derivative(error,feature)
            # add the squared derivative to the gradient magnitude
            gradient_sum_squares=gradient_sum_squares+(deriv**2)
            # update the weight based on step size and derivative:
            weights[i]=weights[i] - (step_size*deriv)

        gradient_magnitude = math.sqrt(gradient_sum_squares)
        stdout.write("\r%d" % int(gradient_magnitude))
        stdout.flush()
        if gradient_magnitude < tolerance:
            converged = True
    return(weights)


#Example of Implementation
#Importing Training and Testing Data
# train_data=pd.read_csv("kc_house_train_data.csv")
# test_data=pd.read_csv("kc_house_test_data.csv")

# simple_features = ['sqft_living', 'sqft_living15']
# my_output= 'price'
# (simple_feature_matrix, output) = get_numpy_data(train_data, simple_features, my_output)
# initial_weights = np.array([-100000., 1., 1.])
# step_size = 7e-12
# tolerance = 2.5e7
# simple_weights = regression_gradient_descent(simple_feature_matrix, output,initial_weights, step_size,tolerance)
# print(simple_weights)
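
For completeness, here is one possible way to evaluate the learned weights on the held-out test set, reusing get_numpy_data and predict_outcome from above (a sketch only, assuming kc_house_test_data.csv has the same columns):

# (test_feature_matrix, test_output) = get_numpy_data(test_data, simple_features, my_output)
# test_predictions = predict_outcome(test_feature_matrix, simple_weights)
# #Residual sum of squares on the test set
# test_residuals = test_output - test_predictions
# test_RSS = (test_residuals ** 2).sum()
# print(test_RSS)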
Wickiup answered 12/2, 2016 at 18:37 Comment(0)

Firstly, I find that when writing machine learning code, it's best NOT to use complex list comprehensions, because anything you can iterate over

  • is easier to read when written with normal loops and indentation, and/or
  • can be done with numpy broadcasting (see the short sketch below)

Also, using proper variable names can help you better understand the code. Using Xs, Ys and Ws as shorthand is nice only if you're good at math. Personally, I don't use them in code, especially when writing in Python. From import this: explicit is better than implicit.

My rule of thumb is to remember that if I write code I can't read 1 week later, it's bad code.
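
For example, here is the same set of predictions computed with a plain loop and with a single numpy dot product (a toy illustration with made-up numbers, not code from the question):

import numpy as np

rows = [[1., 2.], [1., 3.], [1., 5.]]   # N=3 datapoints, D=2 features (a constant and, say, sqft)
weights = [0.5, 2.0]                    # one weight per feature

# explicit loop: one scalar prediction per row
predictions_loop = []
for row in rows:
    pred = 0.0
    for feature_value, weight in zip(row, weights):
        pred += feature_value * weight
    predictions_loop.append(pred)

# numpy broadcasting / dot product: the same result in one line
predictions_dot = np.dot(np.array(rows), np.array(weights))

print(predictions_loop)   # [4.5, 6.5, 10.5]
print(predictions_dot)    # [ 4.5  6.5 10.5]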


First, let's decide what is the input parameters for gradient descent, you will need:

  • feature_matrix (The X matrix, type: numpy.array, a matrix of N * D size, where N is the no. of rows/datapoints and D is the no. of columns/features)
  • output (The Y vector, type: numpy.array, a vector of size N)
  • initial_weights (type: numpy.array, a vector of size D).

Additionally, to check for convergence you will need:

  • step_size (the magnitude of change when iterating through to change the weights; type: float, usually a small number)
  • tolerance (the criterion for breaking out of the iterations: when the gradient magnitude is smaller than tolerance, assume that your weights have converged; type: float, usually a small number but much bigger than the step size).

Now to the code.

def regression_gradient_descent(feature_matrix, output, initial_weights, step_size, tolerance):
    converged = False # Set a boolean to check for convergence
    weights = np.array(initial_weights) # make sure it's a numpy array

    while not converged:
        # compute the predictions based on feature_matrix and weights.
        # iterate through the row and find the single scalar predicted
        # value for each weight * column.
        # hint: a dot product can solve this easily
        predictions = [??? for row in feature_matrix]
        # compute the errors as predictions - output
        errors = predictions - output
        gradient_sum_squares = 0 # initialize the gradient sum of squares
        # while we haven't reached the tolerance yet, update each feature's weight
        for i in range(len(weights)): # loop over each weight
            # Recall that feature_matrix[:, i] is the feature column associated with weights[i]
            # compute the derivative for weight[i]:
            # Hint: the derivative is = 2 * dot product of feature_column  and errors.
            derivative = 2 * ????
            # add the squared value of the derivative to the gradient magnitude (for assessing convergence)
            gradient_sum_squares += (derivative * derivative)
            # subtract the step size times the derivative from the current weight
            weights[i] -= (step_size * derivative)

        # compute the square-root of the gradient sum of squares to get the gradient magnitude:
        gradient_magnitude = ???
        # Then check whether the magnitude is lower than the tolerance.
        if ???:
            converged = True
    # Once the while loop breaks, return the weights.
    return(weights)

I hope the extended pseudo-code helps you better understand gradient descent. I won't fill in the ??? so as not to spoil your homework (a completed version appears in the asker's own answer above).


Note that your RSS code is also unreadable and unmaintainable. It's easier to just do:

>>> import numpy as np
>>> prediction = np.array([1,2,3])
>>> output = np.array([1,1,5])
>>> residual = output - prediction
>>> RSS = sum(residual * residual)
>>> RSS
5

Going through numpy basics will go a long way to machine learning and matrix-vector manipulation without going nuts with iterations: http://docs.scipy.org/doc/numpy-1.10.1/user/basics.html

Petro answered 10/1, 2016 at 13:17 Comment(1)
You can easily change the tolerance check to a fixed number of iterations (a for loop); you just need to change how you control the outer loop. But my preference is to go for tolerance-based convergence (a while loop). – Petro

It is this simple (note that the variance and covariance helpers below both omit the 1/n factor, which cancels in the slope b1):

def mean(values):
    return sum(values)/float(len(values))

def variance(values, mean):
    return sum([(x-mean)**2 for x in values])

def covariance(x, mean_x, y, mean_y):
    covar = 0.0
    for i in range(len(x)):
        covar+=(x[i]-mean_x) * (y[i]-mean_y)
    return covar
def coefficients(dataset):
    x = []
    y = []

    for line in dataset:
        xi, yi = map(float, line.split(','))

        x.append(xi)
        y.append(yi)

    dataset.close()                             

    x_mean, y_mean = mean(x), mean(y)

    b1 = covariance(x, x_mean, y, y_mean)/variance(x, x_mean)
    b0 = y_mean-b1*x_mean

    return [b0, b1]

dataset = open('trainingdata.txt')

b0, b1 = coefficients(dataset)

n = float(input())   #raw_input() in Python 2

print(b0+b1*n)

Reference: www.machinelearningmastery.com/implement-simple-linear-regression-scratch-python/
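
As a quick cross-check (my addition, not from the referenced tutorial), the same coefficients can be recovered with numpy's built-in least-squares polynomial fit:

import numpy as np

# toy data; in the code above, x and y come from trainingdata.txt
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.0])

# closed form, exactly as in coefficients() above
x_mean, y_mean = x.mean(), y.mean()
b1 = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
b0 = y_mean - b1 * x_mean

# np.polyfit with degree 1 returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

print(b0, b1)            # matches (intercept, slope) below
print(intercept, slope)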

Crooks answered 30/6, 2017 at 12:10 Comment(0)
