I'm trying to find the distance between each data point and the prediction (regression) line. Ideally, the results would be stored in a new column of the DataFrame called 'Distance'.
My imports:
import os.path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.linear_model import LinearRegression
%matplotlib inline
Sample of my data:
idx  Exam Results  Hours Studied
0    93            8.232795
1    94            7.879095
2    92            6.972698
3    88            6.854017
4    91            6.043066
5    87            5.510013
6    89            5.509297
My code so far:
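# df is the DataFrame holding the data shown above
# scikit-learn expects a 2-D feature array, hence the extra axis
x = df['Hours Studied'].values[:, np.newaxis]
y = df['Exam Results'].values
model = LinearRegression()
model.fit(x, y)
# Red points are the observations, the black line is the fitted prediction
plt.scatter(x, y, color='r')
plt.plot(x, model.predict(x), color='k')
plt.show()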
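To make the goal concrete, here is a minimal sketch of the kind of result I'm after. It assumes that "distance" means the vertical gap between each observed exam result and the fitted line (the absolute residual), and it rebuilds the DataFrame from the sample rows above so that it runs on its own. The second column, 'Perpendicular Distance', is a hypothetical name I made up in case the shortest (perpendicular) distance to the line is the better measure.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Sample rows copied from the question
df = pd.DataFrame({
    'Exam Results': [93, 94, 92, 88, 91, 87, 89],
    'Hours Studied': [8.232795, 7.879095, 6.972698, 6.854017,
                      6.043066, 5.510013, 5.509297],
})

x = df[['Hours Studied']].values  # 2-D array, shape (n_samples, 1)
y = df['Exam Results'].values

model = LinearRegression()
model.fit(x, y)

# Assumption: "distance" is the vertical distance |observed - predicted|
df['Distance'] = np.abs(y - model.predict(x))

# Alternative (hypothetical column name): perpendicular distance from the
# point to the line y = slope * x + intercept, i.e. |residual| / sqrt(1 + slope^2)
slope = model.coef_[0]
df['Perpendicular Distance'] = df['Distance'] / np.sqrt(1 + slope ** 2)

print(df)
```

Dropping np.abs would keep the sign, which shows whether each point lies above or below the line.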
Any help would be greatly appreciated. Thanks