ModuleNotFoundError: No module named 'sklearn.preprocessing._data'

My question is similar to this.

I also use pickle to save and load a model. I get the error below during pickle.load():

import pickle
from sklearn.preprocessing import StandardScaler

# SAVE
scaler = StandardScaler().fit(X_train)
X_trainScale = scaler.transform(X_train)
pickle.dump(scaler, open('scaler.scl','wb'))

# =================
# LOAD
sclr = pickle.load(open('scaler.scl','rb'))  # => ModuleNotFoundError: No module named 'sklearn.preprocessing._data'
X_testScale = sclr.transform(X_test)

ModuleNotFoundError: No module named 'sklearn.preprocessing._data'

It looks like a sklearn version issue. My sklearn version is 0.20.3, Python version is 3.7.3.

But I am using Python in an Anaconda .zip file. Is it possible to solve this without updating the version of sklearn?
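To confirm that it really is a version mismatch, a minimal check (my sketch, not from the original post) is to print the versions on both the machine that dumped scaler.scl and the machine that loads it, and compare:

import sys

import sklearn

# Run this on the dumping machine and on the loading machine and compare:
# the ModuleNotFoundError above is a symptom of the two scikit-learn
# versions being different.
print("Python:", sys.version)
print("scikit-learn:", sklearn.__version__)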

Scuff answered 19/2, 2020 at 16:36 Comment(1)
Show the error message in full.Kaliningrad

I had exactly the same error message with StandardScaler using Anaconda.

Fixed it by running:

conda update --all

I think the issue was caused by running the pickle dump that created the scaler file on a machine with a newer version of scikit-learn, and then trying to run the pickle load on a machine with an older version of scikit-learn. It gave the error when loading on the machine with the older scikit-learn, but no error when loading on the machine with the newer scikit-learn (both Windows machines). Perhaps this is because more recent versions use a different naming convention, with a leading underscore, for the internal modules (as mentioned above)?

Anaconda would not let me update the scikit-learn library on its own, because it claimed something required the older version (for some reason I could not understand). Perhaps another library was using it? So I had to fix it by updating all the libraries at the same time, which worked.
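One way to make such mismatches visible up front (my own suggestion, not part of this answer): write the scikit-learn version to a small sidecar file next to the pickle when dumping, and compare it before loading on the other machine. The file name scaler.scl comes from the question; scaler.scl.meta.json and the toy data are made up for the sketch.

import json
import pickle

import numpy as np
import sklearn
from sklearn.preprocessing import StandardScaler

# When dumping: fit the scaler (toy data here) and record which
# scikit-learn wrote the pickle in a small sidecar file.
scaler = StandardScaler().fit(np.arange(10.0).reshape(-1, 1))
with open('scaler.scl', 'wb') as f:
    pickle.dump(scaler, f)
with open('scaler.scl.meta.json', 'w') as f:
    json.dump({'sklearn_version': sklearn.__version__}, f)

# When loading (possibly on another machine): compare versions before
# unpickling, so a failing load points at a concrete version to install.
with open('scaler.scl.meta.json') as f:
    wanted = json.load(f)['sklearn_version']
if wanted != sklearn.__version__:
    print(f'Pickle written with scikit-learn {wanted}, but '
          f'{sklearn.__version__} is installed; the load may fail.')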

Embellishment answered 29/3, 2020 at 15:31 Comment(3)
That only helps if you installed with conda, if you used pip see @Yuchao Jiang belowOperetta
Or the other way round: pickled in an older version and unpickled in a newer one - this was my case. [Rant warning] This breaking change in the sklearn API should have been fixed as a bug or saved for a major version change...Achievement
Note that updating all the packages can be dangerous sometimes, especially when you collaborate with other people. A simple pip install scikit-learn==<version> should already do the trick.Corporation

Upgrade to a compatible version of scikit-learn with: pip install -U scikit-learn

Airglow answered 10/11, 2020 at 0:47 Comment(0)

This very specific problem occurs when there is a scikit-learn version mismatch: for example, trying to deserialize with scikit-learn >= 0.22.X an object that was dumped with a scikit-learn version < 0.22.X (or the other way round). scikit-learn changed its internal module names between those versions; check the release notes on their website for more information.
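To see the rename concretely (a sketch added here, not part of the original answer): pickle records the defining module path of every class it serializes, and that path changed in 0.22.

from sklearn.preprocessing import StandardScaler

# pickle stores the class's defining module path inside the file.
# On scikit-learn < 0.22 this prints 'sklearn.preprocessing.data';
# on scikit-learn >= 0.22 it prints 'sklearn.preprocessing._data'.
# A pickle written on one side therefore references a module name
# that does not exist on the other side.
print(StandardScaler.__module__)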

Likewise answered 25/1, 2021 at 14:16 Comment(1)
Sadly this affects ML model objects such as XGBoost that some of us put into production - these algorithm libraries became so dependent on sklearn that they are now effectively one single multi-module package. And you have to pin the historical version of not just the modeling library, but also of scikit-learn.Achievement

I was facing a similar issue after updating scikit-learn. In my case, the culprit was QuantileTransformer. Changing

from sklearn.preprocessing.data import QuantileTransformer

to

from sklearn.preprocessing import QuantileTransformer

worked for me.

Goalie answered 1/1, 2021 at 14:51 Comment(0)

Install an older version of sklearn: pip install "scikit-learn==0.19.0"

Ginzburg answered 2/3, 2021 at 7:45 Comment(0)

from sklearn.preprocessing._data import StandardScaler  # the private module where StandardScaler lives in scikit-learn >= 0.22

Sociolinguistics answered 15/5, 2022 at 20:18 Comment(0)

This is a hack. But we are in a hacky-space.

Apparently "sklearn.preprocessing.label is used at or less than 0.21.X, and in contrast, sklearn.preprocessing._label is used at or higher than 0.22.X."

So if you are using sklearn 0.22.X or later (through at least sklearn 1.3.0), it will be difficult to load a pickle created before then.

So how might you be able to do it?

import sys
import sklearn.preprocessing

# Terrible hack: register the old '.label' name as an alias of '._label',
# i.e. tell Python that the '.label' submodule is already imported, so the
# pickle does not fail when it tries to import that submodule.
sys.modules['sklearn.preprocessing.label'] = sys.modules['sklearn.preprocessing._label']

Then, hopefully, your pickle load will succeed. You will still get this big warning:

/home/.../.local/lib/python3.10/site-packages/sklearn/base.py:347: InconsistentVersionWarning: Trying to unpickle estimator LabelEncoder from version 0.20.0 when using version 1.3.0. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(

But, at least in my case, it worked. I was even able to successfully train a classifier with the loaded data.

What next?

Clearly, this is not a long-term solution. But hopefully you can get the data you need and perhaps rework it into a cleaner, modern scikit-learn format somehow.
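For the error in the question itself (a pickle written with scikit-learn >= 0.22 being loaded on 0.20.3), the analogous alias would point the missing private name at the old public module. This is an untested sketch in the same hacky spirit, assuming scikit-learn < 0.22 on the loading machine; even when the import resolves, attribute differences between versions can still break the loaded object.

import pickle
import sys

import sklearn.preprocessing.data  # public module name in scikit-learn < 0.22

# Same trick in the other direction: tell Python that the private
# 'sklearn.preprocessing._data' module the pickle asks for is the old
# public 'sklearn.preprocessing.data' module shipped by this version.
sys.modules['sklearn.preprocessing._data'] = sklearn.preprocessing.data

with open('scaler.scl', 'rb') as f:
    sclr = pickle.load(f)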

Oruntha answered 1/3 at 15:29 Comment(0)
