Simple prediction using linear regression with python

Asked 14/4, 2015 at 8:59 Answered 11/4, 2021 at 11:38

Solved python scikit-learn linear-regression

data2 = pd.DataFrame(data1['kwh'])
data2
                          kwh
date    
2012-04-12 14:56:50     1.256400
2012-04-12 15:11:55     1.430750
2012-04-12 15:27:01     1.369910
2012-04-12 15:42:06     1.359350
2012-04-12 15:57:10     1.305680
2012-04-12 16:12:10     1.287750
2012-04-12 16:27:14     1.245970
2012-04-12 16:42:19     1.282280
2012-04-12 16:57:24     1.365710
2012-04-12 17:12:28     1.320130
2012-04-12 17:27:33     1.354890
2012-04-12 17:42:37     1.343680
2012-04-12 17:57:41     1.314220
2012-04-12 18:12:44     1.311970
2012-04-12 18:27:46     1.338980
2012-04-12 18:42:51     1.357370
2012-04-12 18:57:54     1.328700
2012-04-12 19:12:58     1.308200
2012-04-12 19:28:01     1.341770
2012-04-12 19:43:04     1.278350
2012-04-12 19:58:07     1.253170
2012-04-12 20:13:10     1.420670
2012-04-12 20:28:15     1.292740
2012-04-12 20:43:15     1.322840
2012-04-12 20:58:18     1.247410
2012-04-12 21:13:20     0.568352
2012-04-12 21:28:22     0.317865
2012-04-12 21:43:24     0.233603
2012-04-12 21:58:27     0.229524
2012-04-12 22:13:29     0.236929
2012-04-12 22:28:34     0.233806
2012-04-12 22:43:38     0.235618
2012-04-12 22:58:43     0.229858
2012-04-12 23:13:43     0.235132
2012-04-12 23:28:46     0.231863
2012-04-12 23:43:55     0.237794
2012-04-12 23:59:00     0.229634
2012-04-13 00:14:02     0.234484
2012-04-13 00:29:05     0.234189
2012-04-13 00:44:09     0.237213
2012-04-13 00:59:09     0.230483
2012-04-13 01:14:10     0.234982
2012-04-13 01:29:11     0.237121
2012-04-13 01:44:16     0.230910
2012-04-13 01:59:22     0.238406
2012-04-13 02:14:21     0.250530
2012-04-13 02:29:24     0.283575
2012-04-13 02:44:24     0.302299
2012-04-13 02:59:25     0.322093
2012-04-13 03:14:30     0.327600
2012-04-13 03:29:31     0.324368
2012-04-13 03:44:31     0.301869
2012-04-13 03:59:42     0.322019
2012-04-13 04:14:43     0.325328
2012-04-13 04:29:43     0.306727
2012-04-13 04:44:46     0.299012
2012-04-13 04:59:47     0.303288
2012-04-13 05:14:48     0.326205
2012-04-13 05:29:49     0.344230
2012-04-13 05:44:50     0.353484
...

65701 rows × 1 columns

I have this DataFrame with a datetime index and one column. I want to do a simple prediction using linear regression with sklearn. I'm confused about how to set X and y (I want the x values to be the time and the y values to be kwh). I'm new to Python, so any help is valuable. Thank you.

Boggers answered 14/4, 2015 at 8:59 Comment(0)

The first thing you have to do is split your data into two arrays, X and y. Each element of X will be a date, and the corresponding element of y will be the associated kwh. Note that sklearn expects X to be a numeric 2-D array of shape (n_samples, n_features), so the dates have to be converted to numbers (for example, seconds since the first reading) before fitting.

Once you have that, use sklearn.linear_model.LinearRegression to do the regression; see its documentation for the available options.

As with every sklearn model, there are two steps. First, fit the model to your data. Then put the dates for which you want to predict the kwh in another array, X_predict, and predict the kwh using the predict method.

from sklearn.linear_model import LinearRegression

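X = []  # put your dates here as numbers, one row per sample, e.g. [[0.0], [905.0], ...]
y = []  # put your kwh values here, one per row of X

model = LinearRegression()
model.fit(X, y)

X_predict = []  # put the (numeric) dates for which you want to predict kwh here, same layout as X
y_predict = model.predict(X_predict)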
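LinearRegression cannot fit raw datetime values, so the dates have to be turned into numbers first. The following is a minimal sketch of one way to do that, assuming the index is a pandas DatetimeIndex like the one in the question. The four sample rows and the choice of "seconds since the first reading" as the feature are illustrative assumptions, not part of the original answer.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Small made-up sample shaped like the question's frame:
# a DatetimeIndex named 'date' and a single 'kwh' column
data2 = pd.DataFrame(
    {"kwh": [1.2564, 1.43075, 1.36991, 1.35935]},
    index=pd.to_datetime(
        ["2012-04-12 14:56:50", "2012-04-12 15:11:55",
         "2012-04-12 15:27:01", "2012-04-12 15:42:06"]
    ),
)
data2.index.name = "date"

# Assumption: use seconds elapsed since the first reading as the only feature
start = data2.index[0]
seconds = (data2.index - start).total_seconds()
X = seconds.to_numpy().reshape(-1, 1)  # 2-D array of shape (n_samples, 1)
y = data2["kwh"].to_numpy()

model = LinearRegression()
model.fit(X, y)

# Predict for a new timestamp by applying the same conversion
new_time = pd.Timestamp("2012-04-12 16:00:00")
X_predict = [[(new_time - start).total_seconds()]]
y_predict = model.predict(X_predict)
print(y_predict)
```

Whatever conversion is chosen, the same one has to be applied to the dates passed to predict, otherwise the model sees values on a different scale from the ones it was fitted on.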

Stanza answered 14/4, 2015 at 9:37 Comment(1)

What does predict give? What are the numbers in the resulting array? – Subdiaconate 14/11, 2016 at 15:48

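The predict() function takes a 2-dimensional array as its argument. So, to predict a value with simple linear regression, pass the input inside a 2-dimensional array. The value must be numeric and in the same form the model was trained on (here assumed to be a Unix timestamp in seconds, with pandas imported as pd):

model.predict([[pd.Timestamp('2012-04-13 05:55:30').timestamp()]])

For multiple linear regression, each row holds one value per feature:

model.predict([[pd.Timestamp('2012-04-13 05:44:50').timestamp(), 0.327433]])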
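The shape requirement can be illustrated on plain numbers. This is a small sketch on made-up data (the values and variable names are assumptions for illustration); it shows reshape(-1, 1) turning a 1-D array into the one-column 2-D layout that fit and predict expect. Passing the 1-D array directly would make scikit-learn raise a ValueError.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up numeric data lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# reshape(-1, 1) turns n values into n rows with one feature each
X = x.reshape(-1, 1)
model = LinearRegression().fit(X, y)

# One sample with one feature: a list containing one single-element row
single = model.predict([[4.0]])

# Two samples: reshape the 1-D array the same way as the training data
several = model.predict(np.array([4.0, 5.0]).reshape(-1, 1))
print(single, several)
```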

Epiphragm answered 18/3, 2018 at 21:3 Comment(0)

Linear Regression:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = pd.read_csv('Salary_Data.csv')
X = data.iloc[:, :-1].values  # all columns except the last (features)
y = data.iloc[:, 1].values    # second column (target)

# split the dataset into training and testing sets
# (sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection)
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=10, random_state=0)  # holds out 10 rows

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, Y_train)
y_pre = regressor.predict(X_test)
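The two iloc selections are what decide which columns become features and which becomes the target. A small sketch of what they return, using a hypothetical in-memory stand-in for Salary_Data.csv (the column names and values below are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for Salary_Data.csv: one feature column, then the target
data = pd.DataFrame({
    "YearsExperience": [1.1, 2.0, 3.2, 4.5],
    "Salary": [39000.0, 43500.0, 54400.0, 61100.0],
})

# Every column except the last; the slice keeps the result 2-D
X = data.iloc[:, :-1].values

# The last column only; a single integer position gives a 1-D array
y = data.iloc[:, -1].values

print(X.shape, y.shape)
```

Using the slice `:-1` rather than a single column position is what keeps X two-dimensional, which is the layout LinearRegression expects.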

Inconsiderate answered 10/3, 2019 at 11:11 Comment(1)

Could you further explain how to select the data, as this is also part of the question? – Unpromising 10/3, 2019 at 11:33

You can have a look at my code on GitHub, where I predict temperature from cricket chirps using a simple linear regression model. The code is explained with comments:

#Import the libraries required
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

#Importing the excel data (raw string so the backslashes in the Windows path are not read as escapes)
dataset = pd.read_excel(r'D:\MachineLearing\Machine Learning A-Z Template Folder\Part 2 - Regression\Section 4 - Simple Linear Regression\CricketChirpsVs.Temperature.xls')

x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values

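#Split the data into train and test dataset
#(sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection)
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=1/3,random_state=42)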

#Fitting Simple Linear regression data model to train data set
from sklearn.linear_model import LinearRegression
regressorObject=LinearRegression()
regressorObject.fit(x_train,y_train)

#predict the test set
y_pred_test_data=regressorObject.predict(x_test)


# Visualising the Training set results in a scatter plot
plt.scatter(x_train, y_train, color = 'red')
plt.plot(x_train, regressorObject.predict(x_train), color = 'blue')
plt.title('Cricket Chirps vs Temperature (Training set)')
plt.xlabel('Cricket Chirps (chirps/sec for the striped ground cricket) ')
plt.ylabel('Temperature (in degrees Fahrenheit)')
plt.show()

# Visualising the test set results in a scatter plot
plt.scatter(x_test, y_test, color = 'red')
plt.plot(x_train, regressorObject.predict(x_train), color = 'blue')
plt.title('Cricket Chirps vs Temperature (Test set)')
plt.xlabel('Cricket Chirps (chirps/sec for the striped ground cricket) ')
plt.ylabel('Temperature (in degrees Fahrenheit)')
plt.show()

For more information please visit

https://github.com/wins999/Cricket_Chirps_Vs_Temprature--Simple-Linear-Regression-in-Python-

Salamanca answered 1/5, 2019 at 10:7 Comment(0)

First, split the dataset into a training set and a test set:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state =0)

Training your Simple Linear Regression model on the Training set

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

Predicting the Test set results

y_predict = regressor.predict(X_test)
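One way to judge the test-set predictions is to compare them with y_test. The sketch below runs the same three steps on made-up data and then computes two common metrics. The synthetic line, the noise level, and the choice of mean absolute error and R² are assumptions for illustration, not part of the original answer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Made-up data: a noisy straight line y = 3x + 5
rng = np.random.default_rng(0)
X = np.arange(50, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 5.0 + rng.normal(scale=1.0, size=50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
regressor = LinearRegression().fit(X_train, y_train)
y_predict = regressor.predict(X_test)

# Average absolute error, in the same units as y
mae = mean_absolute_error(y_test, y_predict)
# Coefficient of determination; 1.0 would be a perfect fit
r2 = r2_score(y_test, y_predict)

print(f"slope={regressor.coef_[0]:.3f}, intercept={regressor.intercept_:.3f}")
print(f"MAE={mae:.3f}, R^2={r2:.3f}")
```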

Parish answered 8/7, 2020 at 10:28 Comment(0)

Just in case someone is looking for a solution without sklearn:

import numpy as np
import pandas as pd

def variance(values, mean):
    # Sum of squared deviations; the 1/n factor is left out because it cancels in the slope
    return sum([(val-mean)**2 for val in values])

def covariance(x, mean_x, y , mean_y):
    covariance = 0.0
    for r in range(len(x)):
        covariance = covariance + (x[r] - mean_x) * (y[r] - mean_y)
    return covariance

def get_coef(df):
    mean_x = sum(df['x']) / float(len(df['x']))
    mean_y = sum(df['y']) / float(len(df['y']))
    variance_x = variance(df['x'], mean_x)
    #variance_y = variance(df['y'], mean_y)
    covariance_x_y = covariance(df['x'],mean_x,df['y'],mean_y)
    m = covariance_x_y / variance_x
    c = mean_y - m * mean_x
    return m,c

def get_y(x,m,c):
    return m*x+c

Inspired by https://github.com/dhirajk100/Linear-Regression-from-Scratch-in-Python/blob/master/Linear%20Regression%20%20from%20Scratch%20Without%20Sklearn.ipynb
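A quick way to check a from-scratch fit is to compare it with NumPy's own least-squares line. The sketch below uses a compact, vectorized variant of get_coef (an illustrative rewrite of the same closed-form estimate, not the author's code) on made-up points, and cross-checks the result against np.polyfit.

```python
import numpy as np
import pandas as pd

def get_coef(df):
    # Closed-form least-squares estimate: slope = sum(dx*dy) / sum(dx^2)
    dx = df["x"] - df["x"].mean()
    dy = df["y"] - df["y"].mean()
    m = (dx * dy).sum() / (dx ** 2).sum()      # slope
    c = df["y"].mean() - m * df["x"].mean()    # intercept
    return m, c

# Made-up points lying near y = 2x + 1
df = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0, 4.0],
                   "y": [1.1, 2.9, 5.2, 7.1, 8.8]})
m, c = get_coef(df)

# Cross-check against NumPy's degree-1 least-squares fit (returns slope, then intercept)
m_np, c_np = np.polyfit(df["x"].to_numpy(), df["y"].to_numpy(), 1)
print(m, c, m_np, c_np)
```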

Ferriter answered 11/4, 2021 at 11:38 Comment(0)

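You can implement the following code.

import pandas as pd
from sklearn.linear_model import LinearRegression # to build linear regression model
from sklearn.model_selection import train_test_split # to split dataset (cross_validation was removed in scikit-learn 0.20)

data2 = pd.DataFrame(data1['kwh'])
data2 = data2.reset_index() # creates a new index (0 to 65700), so the date is now a regular column
dates = data2.iloc[:,0]   # date column (assumed to be datetime64)
# LinearRegression needs numeric 2-D features, so convert the dates to seconds since the first reading
X = (dates - dates.min()).dt.total_seconds().to_frame()
y = data2.iloc[:,-1]  # kwh column

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, train_size=0.80, random_state=20)

linearModel = LinearRegression()
linearModel.fit(Xtrain, ytrain)
ypred = linearModel.predict(Xtest)

Here ypred holds the predicted kwh values for the test set. A linear regression returns continuous values, not probabilities.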

Stun answered 5/5, 2019 at 16:29 Comment(0)
