MNIST data download from sklearn datasets gives Timeout error

I am new to ML and trying to download MNIST data. The code I am using is:

from sklearn.datasets import fetch_mldata
mnist = fetch_mldata('MNIST original')

But it gives an error:

TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

Can anyone tell me what needs to be done to fix this issue?

Preponderance asked 1/11, 2018 at 7:37

Here is the GitHub issue, along with some workarounds people have suggested:

https://github.com/scikit-learn/scikit-learn/issues/8588

The easiest one is to download the MNIST .mat file from this link:

download MNIST.mat

After downloading, put the file in the ~/scikit_learn_data/mldata folder (create the folder if it doesn't exist). For fetch_mldata('MNIST original'), the file must be named mnist-original.mat, so rename it if needed. Once the file is available locally, scikit-learn won't try to download it and uses the local copy instead.
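A minimal sketch of where that file has to go, assuming the defaults of the old fetch_mldata: the data home is the SCIKIT_LEARN_DATA environment variable or ~/scikit_learn_data, and the dataset name is lowercased with spaces turned into dashes, so 'MNIST original' is looked up as mldata/mnist-original.mat. The helpers mldata_path and prepare_mldata_dir are hypothetical, not scikit-learn functions.

```python
import os
import re


def mldata_path(dataname, data_home=None):
    """Hypothetical helper: path where the old fetch_mldata looked for a cached .mat file."""
    if data_home is None:
        # Same default as scikit-learn's get_data_home().
        data_home = os.environ.get("SCIKIT_LEARN_DATA",
                                   os.path.join("~", "scikit_learn_data"))
    data_home = os.path.expanduser(data_home)
    # 'MNIST original' -> 'mnist-original' (lowercase, spaces to dashes,
    # parentheses and dots dropped).
    name = re.sub(r"[().]", "", dataname.lower().replace(" ", "-"))
    return os.path.join(data_home, "mldata", name + ".mat")


def prepare_mldata_dir(dataname, data_home=None):
    """Hypothetical helper: create the mldata folder and report whether the file is already there."""
    path = mldata_path(dataname, data_home)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    return path, os.path.exists(path)
```

For example, prepare_mldata_dir('MNIST original') returns the full path to copy the downloaded file to; once it is there, fetch_mldata('MNIST original') should read it instead of contacting mldata.org.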

Mcgaha answered 1/11, 2018 at 8:07
Thanks @Mcgaha for the lightning-fast solution. It saved me a lot of time. - Preponderance
You're welcome ^^ - Mcgaha

Since fetch_mldata has been deprecated (and was removed entirely in scikit-learn 0.22), we have to move to fetch_openml. Make sure your scikit-learn is version 0.20.0 or later, since that is the release that added fetch_openml.

  1. OpenML currently has 5 different datasets related to MNIST. Here is one example from the scikit-learn documentation that uses the mnist_784 dataset:
from sklearn.datasets import fetch_openml
# Load data from https://www.openml.org/d/554
X, y = fetch_openml('mnist_784', version=1, return_X_y=True)
  2. Or, if you don't need a very large dataset, you can use load_digits, a much smaller set of 8x8 digit images (1,797 samples) that ships with scikit-learn:
from sklearn.datasets import load_digits
mnist = load_digits()
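For comparison, a short sketch of what load_digits gives you: 1,797 images of 8x8 pixels, each flattened into 64 features, bundled with scikit-learn so no download is needed. The variable names here are only illustrative.

```python
from sklearn.datasets import load_digits

digits = load_digits()
X, y = digits.data, digits.target   # X: (1797, 64) pixel values, y: labels 0-9
images = digits.images              # the same pixels as (1797, 8, 8) arrays

# Each row of X is an 8x8 image flattened, so it can be reshaped back.
first_image = X[0].reshape(8, 8)
```

Note that these images are much coarser than the 28x28 MNIST images, so code written for mnist_784 (such as reshape(28, 28)) has to be adapted.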

Note that if you are following the book Hands-On Machine Learning with Scikit-Learn and TensorFlow and use the mnist_784 dataset, you may notice that the code

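import matplotlib
import matplotlib.pyplot as plt

# X must be a NumPy array here; recent scikit-learn versions return a pandas
# DataFrame unless you pass as_frame=False to fetch_openml
some_digit = X[36000]
some_digit_image = some_digit.reshape(28, 28)
plt.imshow(some_digit_image, cmap=matplotlib.cm.binary, interpolation="nearest")
plt.axis('off')
plt.show()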

returns a picture of a 9 instead of a 5. Either mnist_784 and mnist original are two different subsets of the NIST data, or, more likely, the samples are stored in a different order: mnist original had its training and test sets each sorted by label, while mnist_784 keeps the original MNIST order.
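If you want the indices to line up with the book, one workaround is to re-sort mnist_784 the way mnist original was stored. This is a sketch under the assumption that mnist original used the usual 60,000/10,000 train/test split with each part sorted by label (ties kept in their original order); the function names are hypothetical. It also assumes scikit-learn 0.22 or later for as_frame=False, so that X and y come back as NumPy arrays, and converts the string labels fetch_openml returns into integers.

```python
import numpy as np


def sort_by_target(X, y, n_train=60000):
    """Reorder (X, y) so the train and test parts are each sorted by label.

    A stable sort keeps samples with the same label in their original order.
    """
    y = np.asarray(y).astype(np.int64)  # fetch_openml returns labels as strings
    train_idx = np.argsort(y[:n_train], kind="stable")
    test_idx = n_train + np.argsort(y[n_train:], kind="stable")
    order = np.concatenate([train_idx, test_idx])
    return X[order], y[order]


def load_mnist_sorted():
    """Download mnist_784 (needs network access) and sort it like 'MNIST original'."""
    from sklearn.datasets import fetch_openml
    X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
    return sort_by_target(X, y)
```

With X, y = load_mnist_sorted(), the book's examples that index into X should line up again, provided the assumption about the original ordering holds.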

PS: I ran into an SSL error when trying to load the data; in my case, updating OpenSSL resolved the problem.
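To see which OpenSSL build your Python interpreter is actually linked against (and whether an update took effect), you can print it from the standard ssl module. This only reports the version; it does not fix anything by itself.

```python
import ssl

# Version string of the OpenSSL library this Python interpreter uses,
# e.g. "OpenSSL 1.1.1 ..." (some macOS builds report LibreSSL instead).
print(ssl.OPENSSL_VERSION)
print(ssl.OPENSSL_VERSION_INFO)  # the same version as a tuple of integers
```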

Midis answered 3/2, 2019 at 9:47

Although I am not sure why you're getting the error, you can try the following possible fixes.

  1. Sometimes the data gets corrupted during the first download. In that case you need to clear the cache by deleting the files in the scikit-learn data home directory. To find this directory, you can use:

    from sklearn.datasets import get_data_home
    print(get_data_home())
    

Then clear that directory and download the data again.

  2. If the problem still persists, the following links describe other things you can try:

https://github.com/ageron/handson-ml/issues/143

https://github.com/scikit-learn/scikit-learn/issues/8588

https://github.com/ageron/handson-ml/issues/8
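For the first suggestion above, scikit-learn also provides clear_data_home(), which deletes the whole data home directory (every cached dataset, not just MNIST), so the next fetch starts from scratch. A minimal sketch; the wrapper name reset_dataset_cache is hypothetical:

```python
from sklearn.datasets import clear_data_home, get_data_home


def reset_dataset_cache(data_home=None):
    """Hypothetical wrapper: delete scikit-learn's dataset cache and return its path."""
    # Resolve the cache location (SCIKIT_LEARN_DATA or ~/scikit_learn_data by default).
    path = get_data_home(data_home)
    # Remove the directory and everything cached in it; the next fetch_* call
    # will download the data again.
    clear_data_home(path)
    return path
```

Calling reset_dataset_cache() with no argument clears the default location printed by get_data_home().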

If you still face the problem, please post the full traceback so I can help identify the cause.

Thanks!!

Lituus answered 1/11, 2018 at 8:29

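If your scikit-learn version is older than 0.20, you cannot switch to fetch_openml, and fetch_mldata itself no longer works because mldata.org is no longer available. You need to upgrade scikit-learn (for example to version 0.23) and use fetch_openml instead; fetch_mldata was removed entirely in 0.22.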
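A small sketch for deciding which loader an installed version supports, based on the version numbers mentioned in this thread (fetch_openml from 0.20 on, fetch_mldata only before 0.22). The helper name mnist_loader_for is hypothetical.

```python
import re


def mnist_loader_for(version):
    """Return the name of the MNIST loader to use for a scikit-learn version string."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    if major_minor >= (0, 20):
        return "fetch_openml"  # available from 0.20 onwards
    # Older versions only have fetch_mldata, whose server is gone: upgrade instead.
    return "fetch_mldata"
```

For example, mnist_loader_for(sklearn.__version__) after import sklearn tells you whether the fetch_openml code from the earlier answer will run on your installation.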

Celsacelsius answered 28/7, 2020 at 16:44

© 2022 - 2024 — McMap. All rights reserved.