Why does the fit method in sklearn's LinearRegression only accept 2D arrays for the x-values, but 1D arrays for the y-values?

Basically just the title. It struck me as odd when I was dipping my toes into the sklearn library. Is there an explanation for this?

Caulfield answered 22/4, 2020 at 14:37 Comment(5)
I guess that is for the sake of generality. In regression, X can be multidimensional, and if it has only one dimension it can still be represented as a 2D array. Y usually consists of a single signal and can therefore be a 1D array. Enforcing a single format avoids writing 'if' branches to handle other formats and lowers the risk of unexpected errors deeper in the code.Drillstock
This might be a newbie question, but why can't Y be a multidimensional array?Caulfield
This would mean that you are doing multi-target regression, i.e. predicting several signals at the same time from the same training set. In this case Y would be a multidimensional array; sklearn actually has an implementation for it, and you can see there that Y is a multidimensional array. However, this is quite unusual, as most regression problems wouldn't benefit from it. The implementation in sklearn is just a wrapper that trains multiple single-target regressors.Drillstock
As a matter of fact, though, LinearRegression does accept a 2D target array, in which case it fits one linear model per target column (multi-target regression)Mandola
Yes, the main reason I asked this question was because I was confused as to why the x-values needed to be a 2D array, but like @A Co said, it's just for the sake of consistency.Caulfield

This is simply a design choice for the fit methods of ML models in scikit-learn, afaik. It's mostly to stay consistent with the documented input shape, (n_samples, n_features):

X : {array-like, sparse matrix} of shape (n_samples, n_features)
     Training vector, where n_samples is the number of samples and
     n_features is the number of features.

This is also made clear right at the top of the check_array docstring, the validation step where the error is raised:

Input validation on an array, list, sparse matrix or similar.
By default, the input is checked to be a non-empty 2D array containing
only finite values. If the dtype of the array is object, attempt
converting to float, raising on failure.
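
To see that validation step in isolation, here is a minimal sketch that calls check_array directly on a 1D array. The sample values are made up for illustration, and the ensure_2d=False call is only there to show that the 2D requirement is a default rather than a hard limit.

```python
import numpy as np
from sklearn.utils import check_array

x = np.array([0.0, 1.0, 2.0])  # illustrative 1D data, shape (3,)

# With the default ensure_2d=True, a 1D array is rejected with a ValueError.
try:
    check_array(x)
    rejected = False
except ValueError:
    rejected = True

# The same data passes once it is reshaped to (n_samples, n_features).
X = check_array(x.reshape(-1, 1))

# Relaxing the check is possible, which is roughly how 1D targets get through.
x_relaxed = check_array(x, ensure_2d=False)

print(rejected, X.shape, x_relaxed.shape)
```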

LinearRegression does actually accept 2D target arrays though, in which case it fits one linear model per target column (multi-target regression).
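
A short sketch of both cases, using made-up data: a single feature reshaped into the (n_samples, n_features) layout, fitted first against a 1D target and then against a 2D target with two columns. The variable names are just for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])  # one feature, stored as a 1D array
y = 2.0 * x + 1.0                         # single target, shape (5,)

# reshape(-1, 1) turns the 1D feature into the expected (n_samples, 1) layout.
X = x.reshape(-1, 1)

# 1D target: one coefficient per feature.
single = LinearRegression().fit(X, y)
single_coef_shape = single.coef_.shape

# 2D target: each column of Y is a separate target fitted from the same X.
Y = np.column_stack([y, -3.0 * x + 5.0])  # shape (5, 2)
multi = LinearRegression().fit(X, Y)
multi_coef_shape = multi.coef_.shape       # one row of coefficients per target

print(single_coef_shape, multi_coef_shape)
print(multi.predict(X).shape)
```

With a 2D target, coef_ gains a leading axis for the targets and predict returns one column per target.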

Mandola answered 22/4, 2020 at 14:50 Comment(0)
