I am trying to predict the admit variable from predictors such as gre, gpa and rank, but the prediction accuracy on the test set is very low (0.66). The dataset is given below.
https://gist.github.com/abyalias/3de80ab7fb93dcecc565cee21bd9501a
The first few rows of the dataset look like:
   admit  gre   gpa  rank_2  rank_3  rank_4
0      0  380  3.61     0.0     1.0     0.0
1      1  660  3.67     0.0     1.0     0.0
2      1  800  4.00     0.0     0.0     0.0
3      1  640  3.19     0.0     0.0     1.0
4      0  520  2.93     0.0     0.0     1.0
5      1  760  3.00     1.0     0.0     0.0
6      1  560  2.98     0.0     0.0     0.0
My code:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score
# data is a pandas DataFrame holding the dataset from the gist linked above
y = data['admit']
x = data[data.columns[1:]]
xtrain, xtest, ytrain, ytest = train_test_split(x, y, random_state=2)
# modelling
clf = LogisticRegression(penalty='l2')
clf.fit(xtrain, ytrain)
ypred_train = clf.predict(xtrain)
ypred_test = clf.predict(xtest)
# checking the classification accuracy
accuracy_score(ytrain, ypred_train)
# 0.70333333333333337
accuracy_score(ytest, ypred_test)
# 0.66000000000000003
# confusion matrix (rows: true class, columns: predicted class)
confusion_matrix(ytest, ypred_test)
# array([[62, 1],
# [33, 4]])
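To put numbers on that matrix, here is a minimal sketch that recomputes the accuracy, the per-class recall, and the accuracy of always predicting the majority class. It assumes scikit-learn's convention that rows are true labels and columns are predicted labels; the variable names are only illustrative and the matrix is copied by hand from the output above.

```python
# Confusion matrix copied from the output above.
# Assumption: rows are true classes (0, 1), columns are predicted classes (0, 1).
cm = [[62, 1],
      [33, 4]]

total = sum(sum(row) for row in cm)

# Overall accuracy: correct predictions (the diagonal) over all test samples.
accuracy = (cm[0][0] + cm[1][1]) / total

# Recall per class: correct predictions for a class over its true count.
recall = [cm[i][i] / sum(cm[i]) for i in range(2)]

# Accuracy obtained by always predicting the most frequent true class.
baseline = max(sum(row) for row in cm) / total

print(f"accuracy = {accuracy:.2f}")
print(f"recall for class 0 = {recall[0]:.3f}")
print(f"recall for class 1 = {recall[1]:.3f}")
print(f"majority-class baseline = {baseline:.2f}")
```

With these numbers the accuracy is 0.66, the recall is about 0.98 for class 0 and about 0.11 for class 1, and always predicting 0 would already score 0.63 on this test split.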
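Most of the ones (admit = 1) are wrongly predicted as zeros: only 4 of the 37 admitted cases in the test set are classified correctly. How do I increase the model accuracy?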