ValueError: Expected 2D array, got scalar array instead

While practicing with a simple linear regression model, I got this error:

ValueError: Expected 2D array, got scalar array instead:
array=60.
Reshape your data either using array.reshape(-1, 1) if your data has a single 
feature or array.reshape(1, -1) if it contains a single sample.

This is my code (Python 3.7):

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
data = pd.read_csv("hw_25000.csv")


hgt = data.Height.values.reshape(-1,1)
wgt = data.Weight.values.reshape(-1,1)

regression = LinearRegression()
regression.fit(hgt,wgt)

print(regression.predict(60))
Anywheres answered 21/1, 2019 at 19:7 Comment(0)

Short answer:

regression.predict([[60]])

Long answer: regression.predict takes a 2D array of the values you want to predict on. Each row in the array is one "point" (sample) you want the model to predict for. Suppose we want to predict on the points 60, 52, and 31. Then we'd say regression.predict([[60], [52], [31]]).
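As a minimal sketch of the call above: the training data here is made up (it lies exactly on weight = 2 * height + 10, purely for illustration) and stands in for the asker's CSV, which is not available. It only shows the shape that predict expects and returns.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data, not the asker's CSV: weight = 2 * height + 10
hgt = np.array([50, 55, 60, 65, 70]).reshape(-1, 1)  # shape (5, 1)
wgt = 2 * hgt + 10                                    # shape (5, 1)

regression = LinearRegression()
regression.fit(hgt, wgt)

# One row per point to predict, one column per feature
points = np.array([60, 52, 31]).reshape(-1, 1)        # shape (3, 1)
predictions = regression.predict(points)

print(points.shape)       # (3, 1)
print(predictions.shape)  # (3, 1), because the target was 2D as well
```

Writing the list literal [[60], [52], [31]] directly is equivalent to the reshape(-1, 1) call used here.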

The reason we need a 2D array is that linear regression is not limited to a single input feature. For example, we could fit a model on two features, which amounts to doing linear regression in 3D space. Suppose we want to predict "z" for a given data point (x, y). Then we'd say regression.predict([[x, y]]).

Taking this example further, we could predict "z" for several (x, y) points at once. For example, to predict the "z" values for the points (0, 2), (3, 7), and (10, 8), we would say regression.predict([[0, 2], [3, 7], [10, 8]]). This is why regression.predict takes a 2D array: one row per point, one column per feature.
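A small sketch of the two-feature case, under stated assumptions: the training points and the relationship z = 1 + 2x + 3y are invented for illustration, and the name model is hypothetical. The model has to be fit on two columns before it can predict on two-column rows.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: each row is one (x, y) point
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]], dtype=float)
# Assumed relationship for illustration only: z = 1 + 2x + 3y
z = 1 + 2 * X[:, 0] + 3 * X[:, 1]

model = LinearRegression().fit(X, z)

# Three points to predict on: shape (3, 2) -> three samples, two features
query = [[0, 2], [3, 7], [10, 8]]
z_pred = model.predict(query)

print(z_pred.shape)  # (3,), one prediction per row because z was 1D
print(z_pred)
```

Because the made-up data is exactly linear, the predictions should come out very close to 7, 28, and 45.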

Doggone answered 21/1, 2019 at 19:36 Comment(1)
Comment from Freelance: that was an amazing response

The ValueError is fairly clear: predict expects a 2D array, but you passed a scalar.

import numpy as np
from sklearn.linear_model import LinearRegression

# Random stand-in data, reshaped to one column (n_samples, 1)
hgt = np.random.randint(50, 70, 10).reshape(-1, 1)
wgt = np.random.randint(90, 120, 10).reshape(-1, 1)

regression = LinearRegression()
regression.fit(hgt, wgt)

# Wrap the scalar in a 2D list: one sample, one feature
regression.predict([[60]])

You get something like this (the exact number changes between runs because the training data is random):

array([[105.10013717]])
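The reshape calls suggested in the error message can also be applied to a value you already have as a NumPy scalar or 1D array. The following is only a sketch of the resulting shapes, using NumPy alone; the variable names are made up.

```python
import numpy as np

value = np.array(60)                 # 0-d "scalar array", as in the error message
print(value.shape)                   # ()

as_feature = value.reshape(-1, 1)    # single feature: shape (1, 1)
as_sample = value.reshape(1, -1)     # single sample: also shape (1, 1) here

# For several values of one feature, reshape(-1, 1) gives one row per value
batch = np.array([60, 52, 31]).reshape(-1, 1)

print(as_feature.shape, as_sample.shape, batch.shape)  # (1, 1) (1, 1) (3, 1)
```

For a single value the two reshapes give the same result; they only differ once there is more than one value.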
Driver answered 21/1, 2019 at 19:24 Comment(0)
