How to download MNIST images as PNGs

Asked 7/3, 2019 at 17:17 Answered 12/4, 2023 at 8:07

python database mnist

I want to download the MNIST images to my computer as PNG files.

I found this page: http://yann.lecun.com/exdb/mnist/

After I pressed: train-images-idx3-ubyte.gz: training set images (9912422 bytes)

Please let me know if you have any ideas or suggestions. Thank you!

Commodious asked 7/3, 2019 at 17:17

This is far too broad/vague, and possibly off-topic. Please see How to Ask, help center. – Tosspot 10/3, 2020 at 19:31

You need to unzip these particular files in order to use them. A better way of doing it would be:

Download via:

curl -O http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz

Download to a particular path:

curl -o target/path/filename URL

Unzip the downloaded gzip archives:

gunzip t*-ubyte.gz
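If `gunzip` isn't available (e.g. on Windows), the decompression step can also be done from Python. A minimal sketch using only the standard library; `gunzip_keep` is a hypothetical helper name, and like `gunzip --keep` it leaves the original `.gz` file in place:

```python
import gzip
import shutil
from pathlib import Path


def gunzip_keep(gz_path):
    """Decompress gz_path next to itself (like `gunzip --keep`) and return the new path."""
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError("expected a .gz file")
    # 'train-images-idx3-ubyte.gz' -> 'train-images-idx3-ubyte'
    out_path = gz_path.with_suffix("")
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams in chunks, so large files are fine
    return out_path
```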

For further processing of data see the documentation

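import gzip

import matplotlib.pyplot as plt
import numpy as np

image_size = 28
num_images = 5

with gzip.open('train-images-idx3-ubyte.gz', 'rb') as f:
    f.read(16)  # skip the 16-byte IDX header (magic number, image count, rows, cols)
    buf = f.read(image_size * image_size * num_images)

data = np.frombuffer(buf, dtype=np.uint8).astype(np.float32)
data = data.reshape(num_images, image_size, image_size, 1)
image = np.asarray(data[2]).squeeze()  # third image as a 28x28 array
plt.imshow(image, cmap='gray')
plt.show()  # needed when running as a script, otherwise no window appears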
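The snippet above skips the 16-byte header without looking at it. A sketch of reading the header instead, so the image count and dimensions come from the file itself. `read_idx` is a hypothetical helper; the layout it assumes (two zero bytes, type code 0x08 for unsigned bytes, a dimension count, then one big-endian 32-bit size per dimension) is the IDX format described on the MNIST page:

```python
import gzip
import math
import struct


def read_idx(path):
    """Read an unsigned-byte IDX file (plain or .gz) and return (dims, raw_bytes)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as f:
        zero1, zero2, dtype, ndim = struct.unpack(">BBBB", f.read(4))
        if (zero1, zero2, dtype) != (0, 0, 0x08):
            raise ValueError("not an unsigned-byte IDX file")
        # One big-endian uint32 per dimension, e.g. (60000, 28, 28) for training images.
        dims = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
        data = f.read()
    expected = math.prod(dims)
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    return dims, data
```

With NumPy, `np.frombuffer(data, dtype=np.uint8).reshape(dims)` would then give an array of shape (60000, 28, 28) for the training images.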

For extracting image see here
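Since the goal is PNG files on disk rather than a plot, here is a stdlib-only sketch that writes one image as an 8-bit grayscale PNG using `zlib` and `struct`. `write_gray_png` is a hypothetical name; libraries such as pypng or Pillow do the same job with less code:

```python
import struct
import zlib


def _chunk(kind, payload):
    # Each PNG chunk: 4-byte length, 4-byte type, data, CRC32 over type + data.
    crc = zlib.crc32(kind + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)


def write_gray_png(filename, pixels, rows, cols):
    """Write `pixels` (rows*cols values 0-255, row-major) as an 8-bit grayscale PNG."""
    if len(pixels) != rows * cols:
        raise ValueError("pixel count does not match rows * cols")
    # Every scanline is prefixed with filter type 0 ("None").
    raw = b"".join(b"\x00" + bytes(pixels[r * cols:(r + 1) * cols]) for r in range(rows))
    # width, height, bit depth 8, color type 0 (grayscale), compression, filter, interlace
    ihdr = struct.pack(">IIBBBBB", cols, rows, 8, 0, 0, 0, 0)
    with open(filename, "wb") as f:
        f.write(b"\x89PNG\r\n\x1a\n")
        f.write(_chunk(b"IHDR", ihdr))
        f.write(_chunk(b"IDAT", zlib.compress(raw)))
        f.write(_chunk(b"IEND", b""))
```

Assuming `pixels` holds the raw bytes read after the 16-byte header, image `i` would be `pixels[i*784:(i+1)*784]` (784 = 28 * 28), written with `write_gray_png(f"{i}.png", ..., 28, 28)`.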

Update

Try this link to simply download and expand .gz files

Pannikin answered 7/3, 2019 at 17:39

Thank you for the quick response. After I unzip the .gz file, how do I get image files? The link for the documentation only shows how to get them into NumPy arrays and CSV files. – Commodious 7/3, 2019 at 17:42

I will give it a try. What does the attached script do? – Commodious 7/3, 2019 at 19:13

I ran the script, and got this error: Traceback (most recent call last): File "mnist.py", line 10, in <module> new.read(16) NameError: name 'new' is not defined – Commodious 7/3, 2019 at 19:18

Edited, please check now. The attached script will display an image read from the .gz file. – Pannikin 8/3, 2019 at 2:31

Everything worked (just needed to add plt.show()) BUT I don't see how the link for downloading the image was helpful for downloading the image. Can you provide some code on how to download the image to a directory? I appreciate all of the help! – Commodious 8/3, 2019 at 6:33

That calls for a different question @NikolasIoannou. However, you can use curl -o target/path/filename URL to download in a particular path. – Pannikin 8/3, 2019 at 6:44

The question is titled: How to download MNIST images locally. Does that not mean download images to a local directory? Anyway, do you have any tips on how to do that? – Commodious 8/3, 2019 at 6:45

Updated my answer @NikolasIoannou – Pannikin 8/3, 2019 at 6:47

While I appreciate the help, I am not sure that we are talking about the same thing. What I want to do is download the images in the MNIST dataset onto my computer. I want to store all the png photos of integers in the MNIST dataset on my computer. – Commodious 8/3, 2019 at 6:50

Because you have contributed so much, just edit your answer (because I already posted that), and I will give you the credit! – Commodious 8/3, 2019 at 15:50

Thanks for the insight @NikolasIoannou. Updated my answer. Always happy to help contributors!! – Pannikin 8/3, 2019 at 16:34


Good mass extraction example

https://github.com/myleott/mnist_png had been previously mentioned on a now deleted link-only answer by the OP user11141180. Here are some more details.

https://github.com/myleott/mnist_png/blob/400fe88faba05ae79bbc2107071144e6f1ea2720/convert_mnist_to_png.py contains a good PNG extraction example, licensed under GPL 2.0. Should be easy to adapt to other output formats with a library like Pillow.
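As an illustration of how small such an adaptation can be, a sketch that writes binary PGM (P5) instead of PNG, needing only the standard library. `write_pgm` is a hypothetical name and not part of the linked script:

```python
def write_pgm(filename, pixels, rows, cols):
    """Write rows*cols 8-bit pixels (row-major) as a binary P5 PGM image."""
    if len(pixels) != rows * cols:
        raise ValueError("pixel count does not match rows * cols")
    # PGM header: magic, width then height, max gray value.
    header = f"P5\n{cols} {rows}\n255\n".encode("ascii")
    with open(filename, "wb") as f:
        f.write(header + bytes(pixels))
```

With Pillow installed, the rough equivalent for any supported format would be `Image.frombytes("L", (cols, rows), bytes(pixels)).save(filename)`, where the file extension picks the format.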

They also have a pre-extracted archive at: https://github.com/myleott/mnist_png/blob/master/mnist_png.tar.gz?raw=true

Usage:

wget \
 http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz \
 http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz \
 http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz \
 http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
gunzip --keep *-ubyte.gz
python3 -m pip install pypng==0.20220715.0
./convert_mnist_to_png.py . out

And now out/ contains files such as:

out/training/0/1.png
out/training/0/21.png
out/training/1/3.png
out/training/1/6.png
out/testing/0/10.png
out/testing/0/13.png
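To sanity-check the output, a short sketch that counts the generated files per split and label, assuming the `out/<split>/<label>/<index>.png` layout shown above (`count_pngs` is a hypothetical helper):

```python
from collections import Counter
from pathlib import Path


def count_pngs(out_dir):
    """Count PNGs per (split, label) in an <split>/<label>/<index>.png tree."""
    counts = Counter()
    for png_path in Path(out_dir).glob("*/*/*.png"):
        split, label = png_path.parent.parent.name, png_path.parent.name
        counts[(split, label)] += 1
    return counts
```

For the full dataset the totals should come to 60000 for training and 10000 for testing.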

convert_mnist_to_png.py

#!/usr/bin/env python

import os
import struct
import sys

from array import array
from os import path

import png

# source: http://abel.ee.ucla.edu/cvxopt/_downloads/mnist.py
def read(dataset="training", path="."):
    # compare strings with ==, not "is" (identity only works by accident)
    if dataset == "training":
        fname_img = os.path.join(path, 'train-images-idx3-ubyte')
        fname_lbl = os.path.join(path, 'train-labels-idx1-ubyte')
    elif dataset == "testing":
        fname_img = os.path.join(path, 't10k-images-idx3-ubyte')
        fname_lbl = os.path.join(path, 't10k-labels-idx1-ubyte')
    else:
        raise ValueError("dataset must be 'testing' or 'training'")

    flbl = open(fname_lbl, 'rb')
    magic_nr, size = struct.unpack(">II", flbl.read(8))
    lbl = array("b", flbl.read())
    flbl.close()

    fimg = open(fname_img, 'rb')
    magic_nr, size, rows, cols = struct.unpack(">IIII", fimg.read(16))
    img = array("B", fimg.read())
    fimg.close()

    return lbl, img, size, rows, cols

def write_dataset(labels, data, size, rows, cols, output_dir):
    # create output directories
    output_dirs = [
        path.join(output_dir, str(i))
        for i in range(10)
    ]
    for output_subdir in output_dirs:
        os.makedirs(output_subdir, exist_ok=True)

    # write data
    for (i, label) in enumerate(labels):
        output_filename = path.join(output_dirs[label], str(i) + ".png")
        print("writing " + output_filename)
        with open(output_filename, "wb") as h:
            w = png.Writer(cols, rows, greyscale=True)
            data_i = [
                data[ (i*rows*cols + j*cols) : (i*rows*cols + (j+1)*cols) ]
                for j in range(rows)
            ]
            w.write(h, data_i)

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: {0} <input_path> <output_path>".format(sys.argv[0]))
        sys.exit(1)

    input_path = sys.argv[1]
    output_path = sys.argv[2]

    for dataset in ["training", "testing"]:
        labels, data, size, rows, cols = read(dataset, input_path)
        write_dataset(labels, data, size, rows, cols,
                      path.join(output_path, dataset))

Inspecting the generated PNGs with:

identify out/testing/0/10.png

gives:

out/testing/0/10.png PNG 28x28 28x28+0+0 8-bit Gray 256c 272B 0.000u 0:00.000

so they appear to be 8-bit grayscale, and therefore should faithfully represent the original data.
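If ImageMagick isn't installed, the same properties can be read straight from the PNG header. A minimal sketch (`png_info` is a hypothetical helper); per the PNG specification IHDR is always the first chunk, and color type 0 means grayscale:

```python
import struct


def png_info(filename):
    """Return (width, height, bit_depth, color_type) from a PNG's IHDR chunk."""
    with open(filename, "rb") as f:
        # 8-byte signature + 4-byte length + 4-byte type + first 10 bytes of IHDR data
        head = f.read(26)
    if head[:8] != b"\x89PNG\r\n\x1a\n" or head[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    return struct.unpack(">IIBB", head[16:26])
```

For the file above, this would be expected to return `(28, 28, 8, 0)`.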

Tested on Ubuntu 22.10.

Loggerhead answered 12/4, 2023 at 8:07

ModuleNotFoundError: No module named 'pypng' pip3 install pypng – Trembly 13/4, 2023 at 17:40

@AmarnathR works for me on Ubuntu 22.10, retested on clean virtualenv with python3 -m pip install pypng==0.20220715.0. Also visible at: pypi.org/project/pypng/0.20220715.0 – Loggerhead 13/4, 2023 at 18:16
