Basically just the title. It struck me as odd when I was dipping my toes into the sklearn
library. Is there an explanation for this?
Why does the fit method in sklearn's LinearRegression only accept 2D arrays for the x-values, but 1D arrays for the y-values?
I guess that is for generalization purposes. In regression, X can be multidimensional, and if it has only one dimension it can still be represented as a 2D array. Y consists of only one signal and can therefore be a 1D array. Enforcing a single format avoids writing 'if' branches to handle other formats and lowers the risk of unexpected errors deeper in the code. –
Drillstock
This might be a newbie question, but why can't Y be a multidimensional array? –
Caulfield
This would mean that you are doing multi-target regression, i.e. predicting several signals at the same time from the same training set. In this case Y would be a multidimensional array; sklearn actually has an implementation for it, and you can see there that Y is a multidimensional array. However, this is quite unusual, as most regression problems wouldn't benefit from it. The implementation in sklearn is just a wrapper that trains multiple single-target regressors. –
Drillstock
As a matter of fact, though, LinearRegression does accept a 2D target array, in which case it fits a multi-target (multi-output) linear regression –
Mandola
Yes, the main reason I asked this question was because I was confused as to why the x-values needed to be a 2D array, but like @A Co said, it's just for the sake of consistency. –
Caulfield
This is just the way it is by design for the fit methods of ML models in scikit-learn, afaik. It's mostly to stay consistent with the specification of the input shape, (n_samples, n_features):
X : {array-like, sparse matrix} of shape (n_samples, n_features) Training vector, where n_samples is the number of samples and n_features is the number of features.
This is also made clear right at the top of check_array, the validation step where the error is raised:
Input validation on an array, list, sparse matrix or similar. By default, the input is checked to be a non-empty 2D array containing only finite values. If the dtype of the array is object, attempt converting to float, raising on failure.
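To make the validation step concrete, here is a minimal sketch that calls check_array directly. The sample values and variable names are made up for illustration, and the exact wording of the error message can differ between scikit-learn versions, so the sketch only records it rather than relying on it.

```python
import numpy as np
from sklearn.utils import check_array

# Hypothetical 1D feature vector with 4 samples.
x_1d = np.array([1.0, 2.0, 3.0, 4.0])

# With the default ensure_2d=True, a 1D input is rejected.
try:
    check_array(x_1d)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Reshaping to (n_samples, 1) says "4 samples, 1 feature" and passes validation.
x_2d = check_array(x_1d.reshape(-1, 1))
print(error_message)
print(x_2d.shape)
```

Reshaping with reshape(-1, 1) is the usual way to turn a single feature into the (n_samples, n_features) layout that fit expects.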
LinearRegression does actually accept 2D target arrays though, in which case it fits a multi-target (multi-output) linear regression, with one set of coefficients per target column.
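As a rough illustration of both cases, the sketch below fits LinearRegression once with a 1D target and once with a 2D target. The data are invented (two exact linear relations chosen so the fitted coefficients are easy to check), and the variable names are only for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical single feature, reshaped to (n_samples, n_features) = (4, 1).
x = np.array([0.0, 1.0, 2.0, 3.0])
X = x.reshape(-1, 1)

# Single target: y has shape (n_samples,).
y_single = 2.0 * x + 1.0
model_single = LinearRegression().fit(X, y_single)
print(model_single.coef_.shape)  # one coefficient per feature

# Two targets: Y has shape (n_samples, n_targets) = (4, 2).
Y_multi = np.column_stack([2.0 * x + 1.0, -1.0 * x + 5.0])
model_multi = LinearRegression().fit(X, Y_multi)
print(model_multi.coef_.shape)       # (n_targets, n_features)
print(model_multi.intercept_.shape)  # one intercept per target
```

So X always keeps the (n_samples, n_features) layout, while the shape of y decides whether one or several targets are fitted.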