Scikit-learn - Cannot load MNIST Original dataset using fetch_openml in Python
S

6

12

I'm trying to load the MNIST Original dataset in Python. The sklearn.datasets.fetch_openml function doesn't seem to work for this.

Here is the code I'm using:

from sklearn.datasets import fetch_openml
dataset = fetch_openml("MNIST Original") 

I get this error:

  File "generateClassifier.py", line 11, in <module>
    dataset = fetch_openml("MNIST Original")
  File "/home/inglorion/.local/lib/python3.5/site-packages/sklearn/datasets/openml.py", line 526, in fetch_openml
    data_info = _get_data_info_by_name(name, version, data_home)
  File "/home/inglorion/.local/lib/python3.5/site-packages/sklearn/datasets/openml.py", line 302, in _get_data_info_by_name
    data_home)
  File "/home/inglorion/.local/lib/python3.5/site-packages/sklearn/datasets/openml.py", line 169, in _get_json_content_from_openml_api
    raise error
  File "/home/inglorion/.local/lib/python3.5/site-packages/sklearn/datasets/openml.py", line 164, in _get_json_content_from_openml_api
    return _load_json()
  File "/home/inglorion/.local/lib/python3.5/site-packages/sklearn/datasets/openml.py", line 52, in wrapper
    return f()
  File "/home/inglorion/.local/lib/python3.5/site-packages/sklearn/datasets/openml.py", line 160, in _load_json
    with closing(_open_openml_url(url, data_home)) as response:
  File "/home/inglorion/.local/lib/python3.5/site-packages/sklearn/datasets/openml.py", line 109, in _open_openml_url
    with closing(urlopen(req)) as fsrc:
  File "/usr/lib/python3.5/urllib/request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.5/urllib/request.py", line 472, in open
    response = meth(req, response)
  File "/usr/lib/python3.5/urllib/request.py", line 582, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.5/urllib/request.py", line 510, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.5/urllib/request.py", line 444, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.5/urllib/request.py", line 590, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request

How can I fix this? Alternatively, is there any other way to load the MNIST dataset into Python?

I'm using version 0.20.2 of scikit-learn.

I'm relatively new to programming in general, so I would appreciate it if I could get a simple answer. Thanks!

Snug answered 25/1, 2019 at 12:6 Comment(7)
Try fetch_mldata('MNIST original') - Astaire
That didn't work for me either; I think fetch_mldata is soon to be deprecated anyway. - Snug
You're right. Try 'MNIST original', not MNIST Original - Astaire
I did, and got the same error. - Snug
Check out the answers on this thread: #47325421 - Coahuila
@DrBrwts: I did, but it seems to be a problem unrelated to what I'm facing, as I got a totally different error... - Snug
from sklearn import datasets; data = datasets.load_digits() I hope that's enough of the original MNIST data for you - Astaire
A
22

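Try the dataset's OpenML name instead:

from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784')

"MNIST Original" was the name used on mldata.org; OpenML lists the dataset as mnist_784. I found it by searching https://www.openml.org/; the dataset page is https://www.openml.org/d/554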
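
For anyone who wants the classic train/test layout after loading, here is a minimal sketch. It assumes scikit-learn 0.20 or later and a working network connection. `load_mnist` and `split_train_test` are hypothetical helper names, not part of scikit-learn, and the 60,000/10,000 positional split assumes the OpenML copy keeps the conventional MNIST ordering, which is worth checking before you rely on it.

```python
def split_train_test(X, y, n_train=60000):
    """Split by position: the first n_train rows train, the rest test."""
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])


def load_mnist(name="mnist_784", version=1):
    """Fetch MNIST from OpenML; labels arrive as strings, so cast to int."""
    # Imported lazily so split_train_test can be used without scikit-learn.
    from sklearn.datasets import fetch_openml

    # The first call downloads the data; later calls reuse the local cache.
    X, y = fetch_openml(name, version=version, return_X_y=True)
    return X, y.astype(int)
```

Calling `X, y = load_mnist()` and then `split_train_test(X, y)` gives two `(features, labels)` pairs. The split helper only slices, so it should behave the same on lists, NumPy arrays, and DataFrames.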

Alexandros answered 5/3, 2019 at 15:40 Comment(0)
C
5

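fetch_openml() downloads the dataset from openml.org (the older fetch_mldata() used mldata.org, which is no longer reliably reachable), so it fails whenever the server cannot be reached. An alternative is to download the dataset manually. You can download it from Kaggle (mnist data) and run the following code:

from scipy.io import loadmat
# Adjust the path to wherever you saved the downloaded .mat file
mnist = loadmat("../input/mnist-original.mat")
# Transpose so that each row of mnist_data is one image
mnist_data = mnist["data"].T
# "label" is stored as a 1 x N array; take its single row
mnist_label = mnist["label"][0]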
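
If the file path is wrong, loadmat's error can be hard to read, so it may help to check for the file first. The sketch below wraps the steps above in a function; `load_mnist_mat` is a hypothetical helper name, and the "data" and "label" keys are taken from the snippet above rather than verified against any particular download.

```python
from pathlib import Path


def load_mnist_mat(path):
    """Load a manually downloaded MNIST .mat file as (data, labels)."""
    mat_path = Path(path)
    if not mat_path.is_file():
        raise FileNotFoundError("MNIST file not found: %s" % mat_path)

    # Imported lazily so the path check works even without SciPy installed.
    from scipy.io import loadmat

    mnist = loadmat(str(mat_path))
    data = mnist["data"].T      # one image per row after transposing
    labels = mnist["label"][0]  # stored as a 1 x N array
    return data, labels
```

You would call it with the location of your own download, for example `load_mnist_mat("path/to/mnist-original.mat")` (a placeholder path).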
Control answered 23/5, 2021 at 5:33 Comment(0)
L
3

You can use:

from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', version=1)
Lanell answered 22/3, 2020 at 12:21 Comment(0)
W
2

fetch_mldata has been deprecated since scikit-learn v0.20 (and was removed in v0.22); use fetch_openml instead.

Check the scikit-learn version

import sklearn
print(sklearn.__version__)

Import Dataset

from sklearn.datasets import fetch_openml
X, y = fetch_openml('mnist_784', version=1, return_X_y=True)
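
If a script has to run on machines with different scikit-learn versions, the version check can be automated. This is only a sketch: `version_tuple` and `has_fetch_openml` are hypothetical helper names, and the code looks only at the leading major and minor numbers of the version string.

```python
import re


def version_tuple(version):
    """Return the leading (major, minor) numbers of a version string."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def has_fetch_openml(version):
    """fetch_openml was added in scikit-learn 0.20."""
    return version_tuple(version) >= (0, 20)
```

In practice you would pass it `sklearn.__version__` and upgrade scikit-learn if it returns False.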

Wiley answered 28/2, 2020 at 10:41 Comment(0)
G
0

I was facing a similar problem. Updating scikit-learn fixed it for me.

I just ran the following command:

conda update scikit-learn

Then, to verify the version, you can do something like this:

import sklearn

print('scikit-learn version: {}.'.format(sklearn.__version__))

Do not forget to restart the kernel after updating scikit-learn.

Gastongastralgia answered 5/4, 2021 at 1:5 Comment(0)
I
0
mnist = fetch_openml('mnist_784')
Iveyivie answered 6/9, 2023 at 3:2 Comment(1)
Welcome to Stack Overflow! Please read How to Answer and edit your answer to contain an explanation as to why this code would actually solve the problem at hand. Always remember that you're not only solving the problem, but are also educating the OP and any future readers of this post. - Spit
