Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample

When I predict a single value from my data, I get a reshape error, even though my x and y arrays have the same number of rows. Here is my code:

import pandas as pd
from sklearn.linear_model import LinearRegression
import numpy as np
x = np.array([2.0 , 2.4, 1.5, 3.5, 3.5, 3.5, 3.5, 3.7, 3.7])
y = np.array([196, 221, 136, 255, 244, 230, 232, 255, 267])

lr = LinearRegression()
lr.fit(x,y)

print(lr.predict(2.4))

The error is

if it contains a single sample.".format(array))
ValueError: Expected 2D array, got scalar array instead:
array=2.4.
Reshape your data either using array.reshape(-1, 1) if your data has a 
single feature or array.reshape(1, -1) if it contains a single sample.
Percheron answered 1/11, 2019 at 17:56 Comment(1)
If you want to know why fitting a model requires 2D input, see here. – Potoroo

You should reshape your X to be a 2D array, not a 1D array. Fitting a model requires a 2D array of shape (n_samples, n_features).

import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([2.0, 2.4, 1.5, 3.5, 3.5, 3.5, 3.5, 3.7, 3.7])
y = np.array([196, 221, 136, 255, 244, 230, 232, 255, 267])

lr = LinearRegression()
lr.fit(x.reshape(-1, 1), y)    # x reshaped to (9, 1): 9 samples, 1 feature

print(lr.predict([[2.4]]))     # predict() also expects a 2D array: 1 sample, 1 feature
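
For illustration (this shape check is not part of the original answer), you can see exactly what the reshape changes: the flat array is 1D with shape (9,), while reshape(-1, 1) turns it into a single-feature column of shape (9, 1), the (n_samples, n_features) layout that fit() expects.

x = np.array([2.0, 2.4, 1.5, 3.5, 3.5, 3.5, 3.5, 3.7, 3.7])
print(x.shape)                   # (9,)   -> 1D, rejected by fit()
print(x.reshape(-1, 1).shape)    # (9, 1) -> 9 samples, 1 feature
print(np.array([[2.4]]).shape)   # (1, 1) -> 1 sample, 1 feature, valid for predict()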
Devries answered 1/11, 2019 at 18:5 Comment(2)
Thanks! But what does that reshape do, and why does not using it cause an error? – Percheron
@razzOn2bull When fitting a model, your X needs to be a 2D array of shape (n_samples, n_features). .reshape(-1, 1) adds that extra dimension to the data. See this question (#18691584) for more on it. – Devries

The error is basically saying to convert the flat feature array into a column array. reshape(-1, 1) does the job; also [:, None] can be used.
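
As a quick illustration (added for clarity, not from the original answer), both spellings produce the same column shape:

a = np.array([2.0, 2.4, 1.5])    # any flat feature array
print(a.reshape(-1, 1).shape)    # (3, 1)
print(a[:, None].shape)          # (3, 1) -- same result, different syntax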

The second dimension of whatever is passed to predict() must match the second dimension of the feature array X. Since X is coerced into a 2D array, the array passed to predict() should be 2D as well.

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

x = np.array([2.0, 2.4, 1.5, 3.5, 3.5, 3.5, 3.5, 3.7, 3.7])
y = np.array([196, 221, 136, 255, 244, 230, 232, 255, 267])
X = x[:, None]                     # X.ndim is now 2: shape (9, 1)

lr = LinearRegression()
lr.fit(X, y)

prediction = lr.predict([[2.4]])   # predict() input is also 2D: 1 sample, 1 feature

If your input is a pandas column, use double brackets ([[]]) to get a 2D feature array.

df = pd.DataFrame({'feature': x, 'target': y})
lr = LinearRegression()
lr.fit(df['feature'], df['target'])            # <---- error
lr.fit(df[['feature']], df['target'])          # <---- OK
#        ^^         ^^                           <---- double brackets 
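
The reason the double brackets matter (a quick check added here for clarity, using the same df as above): single brackets return a 1D Series, while double brackets return a one-column DataFrame, which scikit-learn treats as a 2D feature matrix.

print(df['feature'].shape)       # (9,)   -> Series, 1D -> rejected by fit()
print(df[['feature']].shape)     # (9, 1) -> DataFrame, 2D -> accepted
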
Why should X be 2D?

If we look at the source code of fit() (of any model in scikit-learn), one of the first things done is to validate the input via the validate_data() method, which calls check_array() to validate X. check_array() checks, among other things, whether X is 2D. It is essential for X to be 2D because, ultimately, LinearRegression().fit() calls scipy.linalg.lstsq to solve the least squares problem, and lstsq requires X to be 2D to perform the matrix multiplication.
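
As a rough sketch of that last step (illustrative only, not taken from scikit-learn's code; the real implementation also handles validation, sample weights, and centering), the least-squares solve needs a 2D design matrix. Here the intercept is modelled explicitly as a column of ones:

import numpy as np
from scipy.linalg import lstsq

x = np.array([2.0, 2.4, 1.5, 3.5, 3.5, 3.5, 3.5, 3.7, 3.7])
y = np.array([196, 221, 136, 255, 244, 230, 232, 255, 267])

A = np.column_stack([np.ones_like(x), x])   # 2D design matrix: intercept column + feature column
coef, residues, rank, sv = lstsq(A, y)      # lstsq needs a 2D A for the matrix algebra
print(coef)                                 # [intercept, slope] -- the same solution LinearRegression finds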

For classifiers, the second dimension is needed to determine the number of features, which in turn fixes the shape of the model coefficients.

Parricide answered 17/3, 2023 at 18:14 Comment(0)
